Warm tip: This article is reproduced from serverfault.com, please click

Unique cumulative customers by each day

发布于 2020-12-01 23:43:33

Task: Get the total number of unique cumulative customers by each decline reason and by each day.

    Input data sample:
    +---------+--------------+------------+------+
    | Cust_Id |  Decline_Dt  |  Reason    | Days |
    +---------+--------------+------------+------+
    |   A     |  08-09-2020  |  Reason_1  |   0  |
    |   A     |  08-09-2020  |  Reason_1  |   1  | 
    |   A     |  08-09-2020  |  Reason_1  |   2  |
    |   A     |  08-09-2020  |  Reason_1  |   4  |
    |   B     |  08-09-2020  |  Reason_1  |   0  |
    |   B     |  08-09-2020  |  Reason_1  |   2  |
    |   B     |  08-09-2020  |  Reason_1  |   3  |
    |   C     |  08-09-2020  |  Reason_1  |   1  |
    +---------+--------------+------------+------+   
1) Decline_dt - The date on which the payment was declined. (Ignore it for this task)
2) Days - Indicates the # of days after the payment decline happened, the customer interacted with IVR channel. 
3) Reason - Indicates the payment decline reason 
    
    --Expected Output:
    +---------------+-----------+---------------+----------------------------+
    |   Reason      |   Days    | Unique_mtns   | total_cumulative_customers |
    +---------------+-----------+---------------+----------------------------+
    |   Reason_1    |   0       |   2           |           2                |              
    |   Reason_1    |   1       |   2           |           3                |
    |   Reason_1    |   2       |   2           |           3                | 
    |   Reason_1    |   3       |   1           |           3                | 
    |   Reason_1    |   4       |   1           |           3                | 
    +------------------------------------------------------------------------+

My Hive query:

select a.Reason
        , a.days
        -- , count(distinct a.cust_id) as unique_mtns
        , count(distinct a.cust_id) over (partition by Reason 
                                                order by a.days rows between unbounded preceding and current row) 
                                        as total_cumulative_customers
from table as a  
group by a.reason
        , a.days

Output (Incorrect):

+---------------+-----------+----------------------------+
|   Reason      |   Days    | total_cumulative_customers |
+---------------+-----------+----------------------------+
|   Reason_1    |   0       |               2            |              
|   Reason_1    |   1       |               2            |
|   Reason_1    |   2       |               2            | 
|   Reason_1    |   3       |               1            | 
|   Reason_1    |   4       |               1            | 
+--------------------------------------------------------+

Ideally, I would expect the window function to be executed without group by. However, I get an error without group by. When I use group by, I don't get the cumulative customers.

Questioner
Ashish
Viewed
0
GMB 2020-12-02 15:58:05

If I follow you correctly, you can use a subquery to compute the first day per customer/reason tuple, and then do conditional aggregation:

select reason, days, 
    count(distinct cust_id) as unique_mtns,
    sum(sum(case when days = min_days then 1 else 0 end)) 
        over(partition by reason order by days) as total_cumulative_customers
from (
    select reason, cust_id, 
        min(days) over(partition by reason, cust_id) as min_days
    from mytable
) t
group by reason, days