Pandas code to get the count of each value

Posted on 2020-12-01 06:17:48

Here is a sample of my data (I'm dealing with big data). The "counts" value ranges from 1 to over 3000.

The sample data looks like this:

    ID                   counts
    41 44 17 16 19 52    6
    17 30 16 19          4
    52 41 44 30 17 16    6
    41 44 52 41 41 41    6
    17 17 17 17 41       5

I tried splitting the "ID" column into multiple columns and then counting the values:

    import pandas as pd

    data = pd.read_csv(csv_file)  # csv_file: path to the input CSV
    split_data = data.ID.apply(lambda x: pd.Series(str(x).split(" ")))  # one column per token

As I mentioned, I'm dealing with big data, so this method is not very efficient, and I'm having trouble getting the per-ID counts.

I want to count how many times each ID appears in every row and add those counts as new columns next to the "ID" column.

Expected output:

    ID                   counts  16  17  19  30  41  44  52
    41 44 17 16 19 52    6       1   1   1   0   1   1   1
    17 30 16 19          4       1   1   1   1   0   0   0
    52 41 44 30 17 16    6       1   1   0   1   1   1   1
    41 44 52 41 41 41    6       0   0   0   0   4   1   1
    17 17 17 17 41       5       0   4   0   0   1   0   0

If you have any ideas, please let me know.

Thank you

Questioner: dev_user

    from collections import Counter

    L = [{int(k): v for k, v in Counter(x.split()).items()} for x in df['ID']]
    df1 = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
    df = df.join(df1)
    print(df)

                      ID  counts  16  17  19  30  41  44  52
    0  41 44 17 16 19 52       6   1   1   1   0   1   1   1
    1        17 30 16 19       4   1   1   1   1   0   0   0
    2  52 41 44 30 17 16       6   1   1   0   1   1   1   1
    3  41 44 52 41 41 41       6   0   0   0   0   4   1   1
    4     17 17 17 17 41       5   0   4   0   0   1   0   0
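
For anyone who wants to try the Counter approach without the original CSV, here is a minimal self-contained sketch. The DataFrame is built by hand from the sample rows in the question, and the names `rows`, `wide`, and `result` are only illustrative; they do not come from the answer.

```python
from collections import Counter

import pandas as pd

# Sample rows copied from the question (assumption: the real data is read from a CSV).
df = pd.DataFrame({
    "ID": [
        "41 44 17 16 19 52",
        "17 30 16 19",
        "52 41 44 30 17 16",
        "41 44 52 41 41 41",
        "17 17 17 17 41",
    ],
    "counts": [6, 4, 6, 6, 5],
})

# One dict per row: token (converted to int) -> number of occurrences in that row.
rows = [{int(k): v for k, v in Counter(ids.split()).items()} for ids in df["ID"]]

# Tokens missing from a row show up as NaN, so fill with 0 and restore the int dtype.
wide = (
    pd.DataFrame(rows, index=df.index)
    .fillna(0)
    .astype(int)
    .sort_index(axis=1)  # order the new columns by ID value
)

result = df.join(wide)
print(result)
```

As a quick sanity check, the per-ID columns of each row should add up to that row's "counts" value.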

    df1 = df.assign(a=df['ID'].str.split()).explode('a')
    df1 = df.join(pd.crosstab(df1['ID'], df1['a']), on='ID')
    print(df1)

                      ID  counts  16  17  19  30  41  44  52
    0  41 44 17 16 19 52       6   1   1   1   0   1   1   1
    1        17 30 16 19       4   1   1   1   1   0   0   0
    2  52 41 44 30 17 16       6   1   1   0   1   1   1   1
    3  41 44 52 41 41 41       6   0   0   0   0   4   1   1
    4     17 17 17 17 41       5   0   4   0   0   1   0   0
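
A self-contained sketch of the explode plus crosstab idea follows. Note that it is a variation, not the answer's exact code: it cross-tabulates on the original row label instead of the ID string, so that two rows with the same ID string would stay separate rather than have their counts added together. The names `long_form`, `wide`, and `result` are hypothetical, and the DataFrame is again built by hand from the question's sample rows.

```python
import pandas as pd

# Sample rows copied from the question (assumption: the real data is read from a CSV).
df = pd.DataFrame({
    "ID": [
        "41 44 17 16 19 52",
        "17 30 16 19",
        "52 41 44 30 17 16",
        "41 44 52 41 41 41",
        "17 17 17 17 41",
    ],
    "counts": [6, 4, 6, 6, 5],
})

# Split each ID string into a list and explode it to one token per line.
# reset_index() turns the repeated row label into an ordinary column named "index".
long_form = df["ID"].str.split().explode().rename("token").reset_index()

# Count how often each token occurs for each original row label.
wide = pd.crosstab(long_form["index"], long_form["token"])

# Tokens are strings at this point; convert the column labels to int and sort them.
wide.columns = wide.columns.astype(int)
wide = wide.sort_index(axis=1)

# Row labels in `wide` match df's index, so a plain join lines the counts up.
result = df.join(wide)
print(result)
```

This builds a long intermediate table with one line per token, so on very large inputs it will likely use more memory than the Counter version.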
