Warm tip: This article is reproduced from serverfault.com, please click

sql-每天唯一的累积客户

(sql - Unique cumulative customers by each day)

发布于 2020-12-01 23:43:33

任务:按每个拒绝原因和每天获取唯一的累积客户总数。

    Input data sample:
    +---------+--------------+------------+------+
    | Cust_Id |  Decline_Dt  |  Reason    | Days |
    +---------+--------------+------------+------+
    |   A     |  08-09-2020  |  Reason_1  |   0  |
    |   A     |  08-09-2020  |  Reason_1  |   1  | 
    |   A     |  08-09-2020  |  Reason_1  |   2  |
    |   A     |  08-09-2020  |  Reason_1  |   4  |
    |   B     |  08-09-2020  |  Reason_1  |   0  |
    |   B     |  08-09-2020  |  Reason_1  |   2  |
    |   B     |  08-09-2020  |  Reason_1  |   3  |
    |   C     |  08-09-2020  |  Reason_1  |   1  |
    +---------+--------------+------------+------+   
1) Decline_dt - The date on which the payment was declined. (Ignore it for this task)
2) Days - Indicates the # of days after the payment decline happened, the customer interacted with IVR channel. 
3) Reason - Indicates the payment decline reason 
    
    --Expected Output:
    +---------------+-----------+---------------+----------------------------+
    |   Reason      |   Days    | Unique_mtns   | total_cumulative_customers |
    +---------------+-----------+---------------+----------------------------+
    |   Reason_1    |   0       |   2           |           2                |              
    |   Reason_1    |   1       |   2           |           3                |
    |   Reason_1    |   2       |   2           |           3                | 
    |   Reason_1    |   3       |   1           |           3                | 
    |   Reason_1    |   4       |   1           |           3                | 
    +------------------------------------------------------------------------+

我的Hive查询:

select a.Reason
        , a.days
        -- , count(distinct a.cust_id) as unique_mtns
        , count(distinct a.cust_id) over (partition by Reason 
                                                order by a.days rows between unbounded preceding and current row) 
                                        as total_cumulative_customers
from table as a  
group by a.reason
        , a.days

输出(不正确):

+---------------+-----------+----------------------------+
|   Reason      |   Days    | total_cumulative_customers |
+---------------+-----------+----------------------------+
|   Reason_1    |   0       |               2            |              
|   Reason_1    |   1       |               2            |
|   Reason_1    |   2       |               2            | 
|   Reason_1    |   3       |               1            | 
|   Reason_1    |   4       |               1            | 
+--------------------------------------------------------+

理想情况下,我希望不使用group by来执行window函数。但是,如果没有分组依据,则会出现错误。使用分组依据时,我没有获得累积的客户。

Questioner
Ashish
Viewed
0
GMB 2020-12-02 15:58:05

如果我正确地遵循了你的说明,则可以使用子查询来计算每个客户/原因元组的第一天,然后进行条件汇总:

select reason, days, 
    count(distinct cust_id) as unique_mtns,
    sum(sum(case when days = min_days then 1 else 0 end)) 
        over(partition by reason order by days) as total_cumulative_customers
from (
    select reason, cust_id, 
        min(days) over(partition by reason, cust_id) as min_days
    from mytable
) t
group by reason, days