SELECT
    first_name + ' ' + last_name AS name,
    country,
    birthdate,
    -- Retrieve the birthdate of the oldest voter per country
    FIRST_VALUE(birthdate)
        OVER (PARTITION BY country ORDER BY birthdate) AS oldest_voter,
    -- Retrieve the birthdate of the youngest voter per country
    LAST_VALUE(birthdate)
        OVER (PARTITION BY country ORDER BY birthdate
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS youngest_voter
FROM voters
WHERE country IN ('Spain', 'USA');
The query above produces the following data:
name                 country  birthdate   oldest_voter  youngest_voter
Caroline Griffin     Spain    1981-03-20  1981-03-20    1988-03-21
Christopher Jackson  Spain    1981-04-15  1981-03-20    1988-03-21
Raul Raji            Spain    1981-04-25  1981-03-20    1988-03-21
Karen Cai            Spain    1981-05-03  1981-03-20    1988-03-21
If we remove the window frame clause (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from LAST_VALUE(birthdate), the result changes as follows:
SELECT
    first_name + ' ' + last_name AS name,
    country,
    birthdate,
    -- Retrieve the birthdate of the oldest voter per country
    FIRST_VALUE(birthdate)
        OVER (PARTITION BY country ORDER BY birthdate) AS oldest_voter,
    -- Retrieve the birthdate of the youngest voter per country
    LAST_VALUE(birthdate)
        OVER (PARTITION BY country ORDER BY birthdate) AS youngest_voter
FROM voters
WHERE country IN ('Spain', 'USA');
name                 country  birthdate   oldest_voter  youngest_voter
Caroline Griffin     Spain    1981-03-20  1981-03-20    1981-03-20
Christopher Jackson  Spain    1981-04-15  1981-03-20    1981-04-15
Raul Raji            Spain    1981-04-25  1981-03-20    1981-04-25
Karen Cai            Spain    1981-05-03  1981-03-20    1981-05-03
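The problem is that LAST_VALUE (and FIRST_VALUE) behave a little oddly because they are analytic functions.
Analytic functions treat the window differently from regular aggregate functions.
To show this, I will take a detour and use a running total computed with SUM as a first example of the difference between aggregate and analytic functions.
Say you have the following table: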
id num_items
1 5
2 8
3 3
4 5
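If you then run SELECT SUM(num_items) AS Total FROM mytable
the result is 21, as expected. This is the typical "aggregate" version of SUM.
However, once you add an OVER clause with an ORDER BY to SUM, it becomes an analytic function.
Running SELECT SUM(num_items) OVER (ORDER BY id) AS Total FROM mytable;
gives you the following, a running total: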
Total
5
13
16
21
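With analytic functions, unless you specify otherwise with a ROWS BETWEEN clause, the window function operates only on the rows from the start of the partition up to and including the current row. When the OVER clause has an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.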
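To make the default frame visible, here is a minimal sketch that writes it out next to the implicit version. It assumes the four-row example table can be rebuilt inline with a VALUES constructor so the query runs on its own (substitute your real mytable if you have one); the column aliases are illustrative and not from the original question.

```sql
SELECT
    id,
    num_items,
    -- Implicit frame: ORDER BY alone means "start of partition up to the current row"
    SUM(num_items) OVER (ORDER BY id) AS running_total,
    -- The same frame written out explicitly
    SUM(num_items) OVER (ORDER BY id
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_default,
    -- Frame widened to the whole partition: every row sees the grand total
    SUM(num_items) OVER (ORDER BY id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS grand_total
-- Inline copy of the example table (assumption, for a self-contained query)
FROM (VALUES (1, 5), (2, 8), (3, 3), (4, 5)) AS mytable (id, num_items)
ORDER BY id;
```

The first two computed columns should both read 5, 13, 16, 21, while grand_total should read 21 on every row.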
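Now, for your example (the birthdates) without the ROWS BETWEEN clause, we can walk through the processing.
Let's start with the first row: the window holds only Caroline Griffin's row, so the last value in it is her own birthdate, 1981-03-20.
Now take the second row: the window holds the first two rows, so the last value is Christopher Jackson's birthdate, 1981-04-15.
Only on the last row does the result match what you expect. For FIRST_VALUE this is usually not a problem (as you have shown), but it is a "gotcha" for LAST_VALUE.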
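The walk-through above can be reproduced with a small sketch. It assumes only the four Spain rows shown in the question, loaded inline through a CTE named spain_voters (a hypothetical name), and adds a COUNT(*) column to show how many rows the frame holds at each step.

```sql
-- Inline sample: only the four Spain rows shown in the question (assumption)
WITH spain_voters (name, birthdate) AS (
    SELECT v.name, CAST(v.birthdate AS DATE)
    FROM (VALUES
        ('Caroline Griffin',    '1981-03-20'),
        ('Christopher Jackson', '1981-04-15'),
        ('Raul Raji',           '1981-04-25'),
        ('Karen Cai',           '1981-05-03')
    ) AS v (name, birthdate)
)
SELECT
    name,
    birthdate,
    -- Number of rows in the default frame at this row
    COUNT(*) OVER (ORDER BY birthdate) AS rows_in_frame,
    -- Default frame ends at the current row, so this echoes the row's own birthdate
    LAST_VALUE(birthdate) OVER (ORDER BY birthdate) AS last_default_frame,
    -- Full frame: the last birthdate among these four sample rows
    LAST_VALUE(birthdate) OVER (ORDER BY birthdate
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM spain_voters
ORDER BY birthdate;
```

With this sample, rows_in_frame should grow 1, 2, 3, 4 and last_default_frame should equal each row's own birthdate. Note that last_full_frame is 1981-05-03 here only because the sample stops at four rows; against the full voters table the question reports 1988-03-21.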
Update: to fix this without the ROWS BETWEEN clause, you can use FIRST_VALUE with the sort order reversed. For example, replace
LAST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate) AS youngest_voter
with
FIRST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate DESC) AS youngest_voter
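As a rough side-by-side check, the sketch below computes the youngest voter three ways on the same inline sample: LAST_VALUE with an explicit full frame, FIRST_VALUE with descending order, and a windowed MAX. The MAX variant is not part of the answer above; it is a swapped-in alternative that only works because the value wanted is the ordering column itself. The CTE name sample_voters and the column aliases are hypothetical.

```sql
-- Inline sample: the four Spain rows from the question (assumption)
WITH sample_voters (name, country, birthdate) AS (
    SELECT v.name, v.country, CAST(v.birthdate AS DATE)
    FROM (VALUES
        ('Caroline Griffin',    'Spain', '1981-03-20'),
        ('Christopher Jackson', 'Spain', '1981-04-15'),
        ('Raul Raji',           'Spain', '1981-04-25'),
        ('Karen Cai',           'Spain', '1981-05-03')
    ) AS v (name, country, birthdate)
)
SELECT
    name,
    country,
    birthdate,
    -- Fix 1: keep LAST_VALUE but widen the frame to the whole partition
    LAST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS youngest_last_value,
    -- Fix 2: reverse the sort and take the first value of the default frame
    FIRST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate DESC) AS youngest_first_value,
    -- Alternative: no ORDER BY, so the frame is the whole partition
    MAX(birthdate) OVER (PARTITION BY country) AS youngest_max
FROM sample_voters
ORDER BY country, birthdate;
```

On this sample all three columns should show 1981-05-03 on every row. The FIRST_VALUE form is safe with the default frame because the first row of the partition is always inside it.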