Warm tip: This article is reproduced from serverfault.com, please click

R to create a tally of previous events within a sliding window time period

发布于 2020-11-28 00:14:25

Any one able to assist me with a problem using R to create a sum of previous events in a specific time period please? I apologise if I do not follow protocol this is my first question here.

I have a df of IDs and event dates. In the genuine df the events are date times but to keep things simpler I have used dates here.

I am trying to create a variable which is a tally of the number of previous events within the last 2 years for that ID number (or 750 days as Im not too concerned about leap years but that would be nice to factor in).

There is an additional issue in that some IDs will have more than one event on the same date (and time in the genuine df). I do not want to remove these as in the real df there is an additional variable which is not necessarily the same. However, in the sum I want to count events happening on the same date as one event i.e. if an ID had only 2 events but they were on the same day the result would be 0, or there may be 3 rows of dates within the previous 2 years for an ID - but as two are the same date the result is 2. I have created a outcome vector to give an example of what I have after ID 7 has examples of this.

If there were 3 previous events all on the same day the result sum would be 1 and any subsequent events in 2 years

ID <- c(10,1,11,2,2,13,4,5,6,6,13,7,7,7,8,8,9,9,9,10,1,11,2,11,12,9,13,14,7,15,7)
event.date<-c('2018-09-09','2016-06-02','2018-08-20', '2018-11-03', '2018-07-10', '2017-03-08', '2018-06-16', '2017-05-20', '2016-04-02', '2016-07-27', '2018-07-15', '2018-06-15', '2018-06-15', '2018-01-16', '2017-10-07', '2016-08-17','2018-08-01','2017-01-22','2016-08-05', '2018-08-13', '2016-11-28', '2018-11-24','2016-06-01', '2018-03-26', '2017-02-04', '2017-12-01', '2016-05-16', '2017-11-25', '2018-04-01', '2017-09-21', '2018-04-01')
df<-data.frame(ID,event.date)

df<-df%>%
  arrange(ID,event.date)

The resulting column should look something like this.

event.count <- c(0,1,0,0,1,0,0,0,1,0,1,1,2,2,0,1,0,1,2,3,0,1,0,1,2,0,0,1,1,0,0)
df$event.count<-event.count

I have tried various if else and use of lag() but cannot get what I am after

thank you.

Questioner
Lewis P-M
Viewed
0
Ben 2020-11-30 01:26:12

Here is a solution with data.table.

To subtract 2 years from the event.date, you can use lubridate and subtract years(2).

After grouping by both ID and event.date, you can subset all dates that fall in between 2 years ago and the date (incbounds of between will exclude the upper and lower bounds).

Using uniqueN will prevent duplicate dates from being counted multiple times.

library(data.table)
library(lubridate)

df$event.date <- as.Date(df$event.date)

setDT(df)[, new.event.count := uniqueN(df$event.date[df$ID == ID][
  between(df$event.date[df$ID == ID], 
          event.date - years(2), 
          event.date, 
          incbounds = FALSE)]),
          by = c("ID", "event.date")][]

Output

    ID event.date event.count new.event.count
 1:  1 2016-06-02           0               0
 2:  1 2016-11-28           1               1
 3:  2 2016-06-01           0               0
 4:  2 2018-07-10           0               0
 5:  2 2018-11-03           1               1
 6:  4 2018-06-16           0               0
 7:  5 2017-05-20           0               0
 8:  6 2016-04-02           0               0
 9:  6 2016-07-27           1               1
10:  7 2018-01-16           0               0
11:  7 2018-04-01           1               1
12:  7 2018-04-01           1               1
13:  7 2018-06-15           2               2
14:  7 2018-06-15           2               2
15:  8 2016-08-17           0               0
16:  8 2017-10-07           1               1
17:  9 2016-08-05           0               0
18:  9 2017-01-22           1               1
19:  9 2017-12-01           2               2
20:  9 2018-08-01           3               3
21: 10 2018-08-13           0               0
22: 10 2018-09-09           1               1
23: 11 2018-03-26           0               0
24: 11 2018-08-20           1               1
25: 11 2018-11-24           2               2
26: 12 2017-02-04           0               0
27: 13 2016-05-16           0               0
28: 13 2017-03-08           1               1
29: 13 2018-07-15           1               1
30: 14 2017-11-25           0               0
31: 15 2017-09-21           0               0
    ID event.date event.count new.event.count