Pandas: binning column values according to the index
I currently have a DataFrame that contains the age of the population and the frequency of those ages, for example:
     freq
27    103
28     43
29     13
...   ...
78     20
79     13
Age is the index of the DataFrame. I would like to do some Pandas magic so that I get a binned DataFrame like this:
          freq
(20, 30]   308
(30, 40]   111
(40, 50]    85
(50, 60]    58
(60, 70]    63
(70, 80]   101
Thus, the index now consists of age intervals rather than individual ages, and the frequencies are summed accordingly. How can I do this?
+3
jerry maks
1 answer
You can use groupby after using cut to bin the index of the DataFrame. For example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'freq': [2, 3, 5, 7, 11, 13]},
...                   index=[22, 29, 30, 31, 25, 42])
>>> df
    freq
22     2
29     3
30     5
31     7
25    11
42    13
Then:
>>> df.groupby(pd.cut(df.index, np.arange(20, 60, 10))).sum()
          freq
(20, 30]    21
(30, 40]     7
(40, 50]    13
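np.arange(20, 60, 10) defines the bin edges to be used (here 20, 30, 40 and 50); you can adjust them according to the min / max values of the index, since it is the index (the ages) that is being cut, not the "freq" column. Each interval is open on the left and closed on the right, so age 30 falls into (20, 30].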
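If you would rather not hard-code the edges, one option is to derive them from the smallest and largest age in the index. The following is only a sketch: the sample data, the `width` variable and the rounding scheme are my own assumptions, not something taken from the question. Passing `observed=False` keeps empty intervals in the result with a sum of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: ages as the index, frequencies as the only column.
df = pd.DataFrame({"freq": [103, 43, 13, 20, 13]},
                  index=[27, 28, 29, 78, 79])

width = 10  # assumed bin width

# Intervals are right-closed, (a, b], so the first edge must lie strictly
# below the smallest age and the last edge at or above the largest age.
start = (df.index.min() - 1) // width * width   # 27 -> 20
stop = -(-df.index.max() // width) * width      # 79 -> 80 (ceiling division)
bins = np.arange(start, stop + width, width)    # 20, 30, ..., 80

# observed=False keeps intervals that contain no ages.
binned = df.groupby(pd.cut(df.index, bins), observed=False).sum()
print(binned)
```

With this sample, the two outer intervals collect all the frequencies and the four intervals in between are empty.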
+6
Alex Riley