Good day, I have a dataframe with region, customer and some deliveries. There is this column used as type of purchase and the first and last purchase are marked as 'first' and 'last' and sometimes we have in-between deliveries marked as "delivery". I need to flag the customers and region that don't have any in-between deliveries at all, as a column in the desired output. Marking an in-between delivery on a row ain't hard but the whole group customer-region needs to be marked.
import pandas as pd
data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 20],
['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price'])
# print dataframe.
df
Print:
region customer purchaseType price
0 NY A FIRST 10
1 NY A DELIVERY 20
2 NY A DELIVERY 30
3 NY A LAST 25
4 NY B FIRST 15
5 NY B DELIVERY 10
6 NY B LAST 20
7 FL A FIRST 15
8 FL A DELIVERY 10
9 FL A DELIVERY 12
10 FL A DELIVERY 25
11 FL A LAST 20
12 FL C FIRST 15
13 FL C LAST 10
14 FL D FIRST 10
15 FL D DELIVERY 20
16 FL D LAST 30
17 FL E FIRST 20
18 FL E LAST 20
Desired output:
region customer purchaseType price noDeliveryFlag
0 NY A FIRST 10 0
1 NY A DELIVERY 20 0
2 NY A DELIVERY 30 0
3 NY A LAST 25 0
4 NY B FIRST 15 0
5 NY B DELIVERY 10 0
6 NY B LAST 20 0
7 FL A FIRST 15 0
8 FL A DELIVERY 10 0
9 FL A DELIVERY 12 0
10 FL A DELIVERY 25 0
11 FL A LAST 20 0
12 FL C FIRST 15 1
13 FL C LAST 10 1
14 FL D FIRST 10 0
15 FL D DELIVERY 20 0
16 FL D LAST 30 0
17 FL E FIRST 20 1
18 FL E LAST 20 1
Thank you so much!
I think I figured this out
df['noDeliveryFlag'] = df['purchaseType'] != 'DELIVERY'
df['noDeliveryFlag'] = df.groupby(['region','customer'])['noDeliveryFlag'].transform('min').astype(int)
print(df)
If someone has got a more efficient way I appreciate it.
looks very efficient indeed!