This article is reproduced from serverfault.com.

sql - How can I back fill null values in BigQuery?

Posted on 2020-11-28 22:09:51

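I am trying to fill null values in BigQuery, similar to filling a pandas DataFrame. After reading the documentation, the last_value function looked like a good option. However, it leaves nulls in every row that comes before the first non-null value (which is reasonable, given the function's name). How can I back fill those remaining nulls, or do I just have to drop them?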

Here is a sample query:

select table_path.*, last_value(sn_6 ignore nulls) over (order by time)
from (select 1 as time, null as sn_6 union all
      select 2, 1 union all
      select 3, null union all
      select 4, null union all
      select 5, null union all
      select 6, 0 union all
      select 7, null union all
      select 8, null
     ) table_path;

Actual output:

time    sn_6    f0_
1       null   null
2         1     1
3       null    1
4       null    1
5       null    1
6         0     0
7       null    0
8       null    0

Desired output:

time    sn_6    f0_
1       null    1 <---Back fill all the gaps!
2         1     1
3       null    1
4       null    1
5       null    1
6         0     0
7       null    0
8       null    0

The real data has a timestamp column followed by 6 float columns, with nulls scattered throughout.

Questioner: Pedro Pablo Severin Honorato

Answer by Yun Zhang, 2020-11-29 06:42:51

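If you want to back fill the rows that the forward fill leaves empty, you can use the first_value function to look ahead for the first non-null value and fall back to it with coalesce, like this: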

select table_path.*, 
coalesce(
  last_value(sn_6 ignore nulls) over (order by time),
  first_value(sn_6 ignore nulls) over (order by time RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  )
from (select 1 as time, null as sn_6 union all
      select 2, 1 union all
      select 3, null union all
      select 4, null union all
      select 5, null union all
      select 6, 0 union all
      select 7, null union all
      select 8, null
     ) table_path;
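
To sanity check what the coalesce expression is meant to produce, here is a minimal pure-Python sketch of the same idea: forward fill first, then let any leading nulls take the first non-null value that follows. The function name forward_then_back_fill is made up for illustration and is not part of BigQuery or pandas; the input list mirrors the sn_6 column from the sample query, ordered by time.

```python
from typing import List, Optional


def forward_then_back_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Forward fill nulls, then back fill whatever nulls remain at the start.

    Hypothetical helper that mirrors
    coalesce(last_value(... ignore nulls), first_value(... ignore nulls))
    on a single column that is already ordered by time.
    """
    filled = list(values)

    # Forward pass: carry the most recent non-null value forward.
    last_seen = None
    for i, value in enumerate(filled):
        if value is not None:
            last_seen = value
        else:
            filled[i] = last_seen

    # Backward pass: only leading nulls are left, so give them the
    # first non-null value that follows.
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is not None:
            next_seen = filled[i]
        else:
            filled[i] = next_seen

    return filled


# Same values as the sn_6 column in the sample query (time 1..8).
sn_6 = [None, 1, None, None, None, 0, None, None]
result = forward_then_back_fill(sn_6)
print(result)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The result matches the desired output in the question. A column that contains only nulls stays all null, since there is nothing to fill from in either direction.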