Reproduced from serverfault.com.

Pandas code to get the count of each value

Posted on 2020-12-01 06:17:48

Here is a sample of my data (the real dataset is large). The "counts" value ranges from 1 to 3000+, and is sometimes even higher.

The sample data looks like this:

ID                 counts
41 44 17 16 19 52       6
17 30 16 19             4
52 41 44 30 17 16       6
41 44 52 41 41 41       6
17 17 17 17 41          5

I tried splitting the "ID" column into multiple columns and then counting the values:

  import pandas as pd
  data = pd.read_csv("data.csv")  # placeholder file name; the original post does not give one
  split_data = data.ID.apply(lambda x: pd.Series(str(x).split(" ")))  # one column per ID token

As I mentioned, I'm dealing with big data, so this method is not efficient enough, and I'm having trouble getting the "ID" counts.

I want to count how many times each ID occurs in a row and put that count in a column named after the ID.

Expected output:

ID                 counts  16  17  19  30  41  44  52
41 44 17 16 19 52       6   1   1   1   0   1   1   1
17 30 16 19             4   1   1   1   1   0   0   0
52 41 44 30 17 16       6   1   1   0   1   1   1   1
41 44 52 41 41 41       6   0   0   0   0   4   1   1
17 17 17 17 41          5   0   4   0   0   1   0   0

If you have any ideas, please let me know.

Thank you

Questioner: dev_user
Answered by jezrael, 2020-12-01 14:35:11

Use collections.Counter inside a list comprehension to count the space-separated values in each row:

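from collections import Counter

import pandas as pd

# count the space-separated tokens of each row; keys are cast to int
L = [{int(k): v for k, v in Counter(x.split()).items()} for x in df['ID']]
# one column per distinct value; values absent from a row become 0
df1 = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
df = df.join(df1)
print(df)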
                  ID  counts  16  17  19  30  41  44  52
0  41 44 17 16 19 52       6   1   1   1   0   1   1   1
1        17 30 16 19       4   1   1   1   1   0   0   0
2  52 41 44 30 17 16       6   1   1   0   1   1   1   1
3  41 44 52 41 41 41       6   0   0   0   0   4   1   1
4     17 17 17 17 41       5   0   4   0   0   1   0   0
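
To try the Counter approach end to end, here is a minimal self-contained sketch. The frame construction simply retypes the five sample rows from the question, and the names `rows`, `per_id`, `result` and the final sanity check are additions for illustration, not part of the answer above.

```python
from collections import Counter

import pandas as pd

# Sample rows retyped from the question; `counts` is the number of tokens per row.
df = pd.DataFrame({
    "ID": [
        "41 44 17 16 19 52",
        "17 30 16 19",
        "52 41 44 30 17 16",
        "41 44 52 41 41 41",
        "17 17 17 17 41",
    ],
    "counts": [6, 4, 6, 6, 5],
})

# One dict of {id: occurrences} per row, with the ids converted to int.
rows = [{int(k): v for k, v in Counter(x.split()).items()} for x in df["ID"]]

# Ids missing from a row become NaN when the dicts are aligned into columns,
# so fill them with 0 and go back to integers before sorting the columns.
per_id = (
    pd.DataFrame(rows, index=df.index)
    .fillna(0)
    .astype(int)
    .sort_index(axis=1)
)

result = df.join(per_id)

# Sanity check (an addition): the per-id counts of each row add up to `counts`.
assert (per_id.sum(axis=1) == result["counts"]).all()
```

The check on the last line is a cheap way to confirm on real data that no token was dropped or counted twice.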

Another idea, though I expect it to be slower:

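# one row per token, keeping the original ID string alongside it
df1 = df.assign(a = df['ID'].str.split()).explode('a')
# count tokens per ID string, then attach the counts by matching on 'ID'
df1 = df.join(pd.crosstab(df1['ID'], df1['a']), on='ID')
print(df1)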
                  ID  counts  16  17  19  30  41  44  52
0  41 44 17 16 19 52       6   1   1   1   0   1   1   1
1        17 30 16 19       4   1   1   1   1   0   0   0
2  52 41 44 30 17 16       6   1   1   0   1   1   1   1
3  41 44 52 41 41 41       6   0   0   0   0   4   1   1
4     17 17 17 17 41       5   0   4   0   0   1   0   0
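
One caveat with the crosstab version: it groups by the ID string, so if the same ID string appears in more than one row, the tokens of those rows are pooled and each of them receives the combined counts. The sample has no repeated strings, but a large dataset may. A possible variant, sketched below under that assumption, cross-tabulates against the row label instead. The small frame with a repeated ID is hypothetical and only there to show the effect.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 2 carry the same ID string.
df = pd.DataFrame({
    "ID": ["17 17 41", "16 19", "17 17 41"],
    "counts": [3, 2, 3],
})

# One token per line; explode repeats the original row label in the index.
# Casting to int makes the resulting column labels sort numerically.
tokens = df["ID"].str.split().explode().astype(int)

# Cross-tabulate row label against token, so each source row keeps its own counts.
per_id = pd.crosstab(tokens.index, tokens)

# The crosstab index holds the original row labels, so a plain index join lines up.
result = df.join(per_id)
print(result)
```

Keying on the row label also avoids carrying the full ID string through the exploded frame, which may matter when rows hold thousands of tokens.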