How can I compute the variance via np.var on a groupby object only for groups with count > 10?
I have a groupby object created from a DataFrame. I need to calculate the variance for each category in which the number of observations is >= 10.
Source data:
megretrans_new = megretrans_otr.groupby(['new'])['amount']  # this is how the object was created
I tried this, but the code is wrong:
megretrans_new.agg(['count' > 10, np.var])
Example:
Creating a demo DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "new": np.random.choice(list("abcde"), 100, p=[.3, .25, .2, .2, .05]),
    "amount": np.random.rand(100) * 1000})
In [48]: df
Out[48]:
new amount
0 a 469.617984
1 b 87.851712
2 a 795.669208
3 a 954.550734
4 b 34.985337
.. .. ...
95 a 361.697281
96 d 245.245859
97 d 963.222224
98 b 545.422079
99 a 630.812729
[100 rows x 2 columns]
In [51]: df["new"].value_counts()
Out[51]:
a 30
c 25
b 22
d 19
e 4
Name: new, dtype: int64
Solution 1:
res = df.groupby("new").filter(lambda x: len(x) >= 10).groupby("new")["amount"].var()
Solution 2:
res = df.groupby("new")["amount"].agg(["var", "count"]).query("count >= 10")["var"]
Result:
In [53]: res
Out[53]:
new
a 75210.670184
b 84411.914567
c 72483.512171
d 101631.615241
Name: amount, dtype: float64
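As a quick sanity check, here is a minimal self-contained sketch showing that both solutions return the same groups and values. The seed, the `rng` generator, and the names `grouped`, `res1`, `res2` are assumptions made for the illustration; `grouped` stands in for an already-built groupby object like `megretrans_new` from the question.

```python
import numpy as np
import pandas as pd

# Demo data shaped like the frame above; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "new": rng.choice(list("abcde"), 100, p=[.3, .25, .2, .2, .05]),
    "amount": rng.random(100) * 1000,
})

# Solution 1: drop the small groups first, then compute the variance.
res1 = (df.groupby("new")
          .filter(lambda x: len(x) >= 10)
          .groupby("new")["amount"]
          .var())

# Solution 2 applied to an existing groupby object:
# compute variance and count together, then keep only the large groups.
grouped = df.groupby("new")["amount"]
res2 = grouped.agg(["var", "count"]).query("count >= 10")["var"]

# Both results cover the same groups and hold the same values.
print(list(res1.index) == list(res2.index))
print(np.allclose(res1.to_numpy(), res2.to_numpy()))
```

Solution 2 is probably the more convenient one when only the groupby object is at hand, since it does not need the original DataFrame.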
UPD: it is worth mentioning that Pandas calculates the variance with ddof=1 (delta degrees of freedom) by default, whereas np.var() defaults to ddof=0. Thus, to get the same values in Pandas as in np.var(), you must explicitly specify ddof=0:
In [72]: df.loc[df["new"]=="a", "amount"].var()
Out[72]: 75210.67018445666
In [73]: np.var(df.loc[df["new"]=="a", "amount"])
Out[73]: 72703.64784497478
In [74]: df.loc[df["new"]=="a", "amount"].var(ddof=0)
Out[74]: 72703.64784497478
And vice versa, to get the same value in np.var() as in Pandas:
In [75]: np.var(df.loc[df["new"]=="a", "amount"], ddof=1)
Out[75]: 75210.67018445666
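The same ddof choice can be made per group. The sketch below is an illustration under assumptions (the demo data, the seed, and the variable names are not from the original post): it passes ddof=0 to the groupby `var()` and compares it with calling np.var on each group's raw values through an explicit lambda.

```python
import numpy as np
import pandas as pd

# Small demo frame; the seed and sizes are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "new": rng.choice(list("abc"), 60),
    "amount": rng.random(60) * 1000,
})

grouped = df.groupby("new")["amount"]

# Per-group variance with ddof=0, matching the np.var() default.
var_ddof0 = grouped.var(ddof=0)

# The same quantity, computed by calling np.var on each group's NumPy values.
var_np = grouped.agg(lambda s: np.var(s.to_numpy()))

# Per-group variance with ddof=1, the Pandas default.
var_ddof1 = grouped.var()

# For a group of size n: var(ddof=1) == var(ddof=0) * n / (n - 1).
n = grouped.size()
print(np.allclose(var_ddof0.to_numpy(), var_np.to_numpy()))
print(np.allclose(var_ddof1.to_numpy(), (var_ddof0 * n / (n - 1)).to_numpy()))
```

The lambda works on `s.to_numpy()` so that NumPy's own implementation and default are used regardless of how a given Pandas version treats `np.var` passed directly to `agg`.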
Author: MaxU, 2020-08-18 14:17:22