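Good morning. I have a DataFrame with regions, customers and some deliveries. The purchaseType column holds the type of purchase: the first and last purchases are labeled 'FIRST' and 'LAST', and sometimes there are deliveries in between, labeled 'DELIVERY'. I need to flag the customer/region combinations that have no deliveries in between, on every one of their rows, as in the noDeliveryFlag column of the desired output. Flagging the individual in-between deliveries row by row is not hard, but I need the flag to cover the whole customer group.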
import pandas as pd
data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 20],
['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price'])
# Print the DataFrame
print(df)
Prints:
region customer purchaseType price
0 NY A FIRST 10
1 NY A DELIVERY 20
2 NY A DELIVERY 30
3 NY A LAST 25
4 NY B FIRST 15
5 NY B DELIVERY 10
6 NY B LAST 20
7 FL A FIRST 15
8 FL A DELIVERY 10
9 FL A DELIVERY 12
10 FL A DELIVERY 25
11 FL A LAST 20
12 FL C FIRST 15
13 FL C LAST 10
14 FL D FIRST 10
15 FL D DELIVERY 20
16 FL D LAST 30
17 FL E FIRST 20
18 FL E LAST 20
Desired output:
region customer purchaseType price noDeliveryFlag
0 NY A FIRST 10 0
1 NY A DELIVERY 20 0
2 NY A DELIVERY 30 0
3 NY A LAST 25 0
4 NY B FIRST 15 0
5 NY B DELIVERY 10 0
6 NY B LAST 20 0
7 FL A FIRST 15 0
8 FL A DELIVERY 10 0
9 FL A DELIVERY 12 0
10 FL A DELIVERY 25 0
11 FL A LAST 20 0
12 FL C FIRST 15 1
13 FL C LAST 10 1
14 FL D FIRST 10 0
15 FL D DELIVERY 20 0
16 FL D LAST 30 0
17 FL E FIRST 20 1
18 FL E LAST 20 1
Thanks a lot!
I think I figured it out:
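# True on rows that are not deliveries
df['noDeliveryFlag'] = df['purchaseType'] != 'DELIVERY'
# Per group, min over booleans is True only if every row is True (no DELIVERY rows); transform repeats it on each row
df['noDeliveryFlag'] = df.groupby(['region','customer'])['noDeliveryFlag'].transform('min').astype(int)
print(df)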
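Why the 'min' trick works: False sorts before True, so the minimum of a boolean group is True only when every row in it is True, which makes 'min' act as a logical "all". transform then repeats that single group result on every row of the group, which is what turns a row-level check into a group-level flag. A minimal sketch on a small hypothetical two-group frame (not the question's full data), showing that transform('all') gives the same answer:

```python
import pandas as pd

# Hypothetical two-group example: FL/C has no delivery, FL/D has one
df = pd.DataFrame({
    'region': ['FL', 'FL', 'FL', 'FL', 'FL'],
    'customer': ['C', 'C', 'D', 'D', 'D'],
    'purchaseType': ['FIRST', 'LAST', 'FIRST', 'DELIVERY', 'LAST'],
})

# Row-level mask: True where the row is NOT a delivery
not_delivery = df['purchaseType'] != 'DELIVERY'
keys = [df['region'], df['customer']]

# min over booleans == "all rows True"; transform broadcasts it to each row
via_min = not_delivery.groupby(keys).transform('min')
# Same idea spelled out explicitly with 'all'
via_all = not_delivery.groupby(keys).transform('all')

df['noDeliveryFlag'] = via_min.astype(int)
print(df)
```

Spelling it as 'all' may read more clearly to someone unfamiliar with the boolean ordering; both express the same per-group condition.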
If anyone has a more efficient approach, I'd appreciate it.
That looks quite efficient already!
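On the request for an alternative: one variant is to ask each group "does it contain any delivery?" directly, without first writing a temporary column into df. This is a sketch, not a benchmarked improvement; for data this small the difference is negligible, and on large frames it would need timing to confirm. It uses the question's data and reproduces the desired noDeliveryFlag column:

```python
import pandas as pd

data = [['NY', 'A', 'FIRST', 10], ['NY', 'A', 'DELIVERY', 20], ['NY', 'A', 'DELIVERY', 30], ['NY', 'A', 'LAST', 25],
        ['NY', 'B', 'FIRST', 15], ['NY', 'B', 'DELIVERY', 10], ['NY', 'B', 'LAST', 20],
        ['FL', 'A', 'FIRST', 15], ['FL', 'A', 'DELIVERY', 10], ['FL', 'A', 'DELIVERY', 12], ['FL', 'A', 'DELIVERY', 25], ['FL', 'A', 'LAST', 20],
        ['FL', 'C', 'FIRST', 15], ['FL', 'C', 'LAST', 10],
        ['FL', 'D', 'FIRST', 10], ['FL', 'D', 'DELIVERY', 20], ['FL', 'D', 'LAST', 30],
        ['FL', 'E', 'FIRST', 20], ['FL', 'E', 'LAST', 20]]
df = pd.DataFrame(data, columns=['region', 'customer', 'purchaseType', 'price'])

# True on rows that are deliveries
is_delivery = df['purchaseType'].eq('DELIVERY')

# Per (region, customer) group: does it contain at least one delivery?
# transform('any') broadcasts that answer back to every row of the group.
has_delivery = is_delivery.groupby([df['region'], df['customer']]).transform('any')

# Flag the groups that have no delivery at all
df['noDeliveryFlag'] = (~has_delivery).astype(int)
print(df)
```

Grouping the boolean Series by the key columns keeps df free of an intermediate column; transform preserves the original row order, so the flags line up with df as-is.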