SELECT
first_name + ' ' + last_name AS name,
country,
birthdate,
-- Retrieve the birthdate of the oldest voter per country
FIRST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate) AS oldest_voter,
-- Retrieve the birthdate of the youngest voter per country
LAST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS youngest_voter
FROM voters
WHERE country IN ('Spain', 'USA');
The above query results in the following data:
name country birthdate oldest_vote youngest_voter
Caroline Griffin Spain 1981-03-20 1981-03-20 1988-03-21
Christopher Jackson Spain 1981-04-15 1981-03-20 1988-03-21
Raul Raji Spain 1981-04-25 1981-03-20 1988-03-21
Karen Cai Spain 1981-05-03 1981-03-20 1988-03-21
If we remove the window function clause (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) of the "LAST_VALUE(birthdate)" the result changes as below:
SELECT
first_name + ' ' + last_name AS name,
country,
birthdate,
-- Retrieve the birthdate of the oldest voter per country
FIRST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate) AS oldest_voter,
-- Retrieve the birthdate of the youngest voter per country
LAST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate) AS youngest_voter
FROM voters
WHERE country IN ('Spain', 'USA');
name country birthdate oldest_voter youngest_voter
Caroline Griffin Spain 1981-03-20 1981-03-20 1981-03-20
Christopher Jackson Spain 1981-04-15 1981-03-20 1981-04-15
Raul Raji Spain 1981-04-25 1981-03-20 1981-04-25
Karen Cai Spain 1981-05-03 1981-03-20 1981-05-03
The question is
Last_Value (and First_Value) are somewhat strange because they're analytic functions.
Analytic functions deal with windows differently than normal aggregate functions do.
To demonstrate this, I'll take a detour and use a running total using SUM as a first example of the difference between aggregate functions and analytic functions.
Say you have the following table
id num_items
1 5
2 8
3 3
4 5
If you then ran SELECT SUM(num_items) AS Total FROM mytable
the result is 21, as expected. This is the typical 'aggregate' version of the SUM function.
However, it you add ORDER BY to the SUM, it becomes an analytic function.
Running SELECT SUM(Num_items) OVER (ORDER BY id) AS Total FROM mytable;
gives you the following - a running total.
Total
5
13
16
21
With analytic functions, window functions operate on the data to the current row only unless specified otherwise with the ROWS BETWEEN clause.
Now, in your example (birthdates) without the ROWS BETWEEN clause, we can run through the processing.
Let's take the first row to start.
Let's take the second row
Only at the last row will the results be as you expect. For first_value, it is typically not a problem (as you have demonstrated) but it is a 'gotcha!' for last_value.
UPDATE: To overcome the issue, instead of specifying the ROWS BETWEEN component, you can sort it the other way and use First_Value e.g.,
LAST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate) AS youngest_voter
FIRST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate DESC) AS youngest_voter