I want to fill the NaN values in the gender column with the mode using pandas, but my approach doesn't work:
# change gender to string datatype
df['gender'] = df['gender'].map(str)
# Replace empty gender (73) with the most common gender
mode = df['gender'].mode()
df['gender'].fillna(mode, inplace=True)
df['gender'].value_counts()
Output:
M 4417
F 1504
nan 73
Name: gender, dtype: int64
Tested with data:
df = pd.read_pickle('gender.pkl')
print (df)
gender
0 M
1 M
2 M
3 M
4 M
...
114746 M
114747 M
114748 M
114749 M
114750 F
print (df['gender'].isna().sum())
785
print (df['gender'].value_counts())
M 85893
F 28073
Name: gender, dtype: int64
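You need to select the first value of the mode with Series.iat, because mode() returns a Series (there can be several modes) rather than a scalar: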
mode = df['gender'].mode().iat[0]
df['gender'].fillna(mode, inplace=True)
print (df['gender'].isna().sum())
0
print (df['gender'].value_counts())
M 86678
F 28073
Name: gender, dtype: int64
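
To illustrate why the scalar matters, here is a minimal sketch on hypothetical toy data (not the gender.pkl file above). When fillna receives a Series, pandas aligns it on index labels, and the Series returned by mode() here only carries the label 0, so NaNs at other labels stay untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real 'gender' column.
df = pd.DataFrame({"gender": ["M", "F", "M", np.nan, "M", np.nan]})

# mode() returns a Series; here it holds one value under index label 0.
mode_series = df["gender"].mode()

# Passing a Series makes fillna align on index labels, so only a NaN
# at label 0 could be filled. The NaNs at labels 3 and 5 remain.
still_missing = df["gender"].fillna(mode_series).isna().sum()

# Extracting the scalar fills every NaN with the same value.
mode_value = mode_series.iat[0]
filled = df["gender"].fillna(mode_value)
remaining = filled.isna().sum()

print(still_missing, remaining)
```

The sketch assigns the result of fillna instead of using inplace=True on a selected column, which is an assumption about style rather than part of the original answer.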
Removing
df['gender'] = df['gender'].map(str)
does not solve the problem; it only masked it. I get 73 from df["gender"].isnull().sum()
after running df['gender'] = df['gender'].fillna(mode)
@ShadowWalker - one idea: how does
df['gender'] = df['gender'].replace(['nan', 'NaN'], np.nan)
work instead of df['gender'] = df['gender'].map(str)?
No change, the values are still null.
@ShadowWalker - the answer was edited. You need
.iat[0]
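
As a rough sketch of what the comments above describe (toy data, assumed for illustration only): map(str) turns each real NaN into the literal string 'nan', so isna() reports nothing missing and fillna has nothing to fill. Converting those strings back to NaN and then filling with the scalar mode handles both issues.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one real missing value.
s = pd.Series(["M", "F", np.nan, "M"], name="gender", dtype=object)

# map(str) converts NaN into the literal string 'nan'.
as_text = s.map(str)
hidden = as_text.isna().sum()            # no real NaN left to fill
literal_nans = (as_text == "nan").sum()  # the value is now a string

# Turn the placeholder strings back into real missing values,
# then fill with the scalar mode taken via .iat[0].
restored = as_text.replace(["nan", "NaN"], np.nan)
fixed = restored.fillna(restored.mode().iat[0])

print(hidden, literal_nans, fixed.tolist())
```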