Warm tip: This article is reproduced from serverfault.com, please click

Display a column when a desired value is missing while grouping in Pandas dataframe

发布于 2020-11-27 23:26:27

Good day, I have a dataframe with region, customer and some deliveries. There is this column used as type of purchase and the first and last purchase are marked as 'first' and 'last' and sometimes we have in-between deliveries marked as "delivery". I need to flag the customers and region that don't have any in-between deliveries at all, as a column in the desired output. Marking an in-between delivery on a row ain't hard but the whole group customer-region needs to be marked.

    import pandas as pd  
    data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
           ['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
           ['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 20],
           ['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
           ['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
           ['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
           ] 
      
    # Create the pandas DataFrame 
    df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price']) 
      
    # print dataframe. 
    df

Print:

   region customer purchaseType  price
0      NY        A        FIRST     10
1      NY        A     DELIVERY     20
2      NY        A     DELIVERY     30
3      NY        A         LAST     25
4      NY        B        FIRST     15
5      NY        B     DELIVERY     10
6      NY        B         LAST     20
7      FL        A        FIRST     15
8      FL        A     DELIVERY     10
9      FL        A     DELIVERY     12
10     FL        A     DELIVERY     25
11     FL        A         LAST     20
12     FL        C        FIRST     15
13     FL        C         LAST     10
14     FL        D        FIRST     10
15     FL        D     DELIVERY     20
16     FL        D         LAST     30
17     FL        E        FIRST     20
18     FL        E         LAST     20

Desired output:

   region customer purchaseType  price noDeliveryFlag
0      NY        A        FIRST     10              0
1      NY        A     DELIVERY     20              0
2      NY        A     DELIVERY     30              0
3      NY        A         LAST     25              0
4      NY        B        FIRST     15              0
5      NY        B     DELIVERY     10              0
6      NY        B         LAST     20              0
7      FL        A        FIRST     15              0
8      FL        A     DELIVERY     10              0
9      FL        A     DELIVERY     12              0
10     FL        A     DELIVERY     25              0
11     FL        A         LAST     20              0
12     FL        C        FIRST     15              1
13     FL        C         LAST     10              1
14     FL        D        FIRST     10              0
15     FL        D     DELIVERY     20              0
16     FL        D         LAST     30              0
17     FL        E        FIRST     20              1
18     FL        E         LAST     20              1

Thank you so much!

Questioner
Mauro Del Nook
Viewed
0
Mauro Del Nook 2020-11-28 08:02:46

I think I figured this out

df['noDeliveryFlag'] = df['purchaseType'] != 'DELIVERY'
df['noDeliveryFlag'] = df.groupby(['region','customer'])['noDeliveryFlag'].transform('min').astype(int)
print(df)

If someone has got a more efficient way I appreciate it.