The SQL FIRST_VALUE function is a window function that returns the first value in an ordered set of values.
It’s a tool for data analysis, allowing you to retrieve specific values from a group of rows based on a particular ordering.
Let’s dive into how FIRST_VALUE works and explore its practical applications.
The basic syntax of FIRST_VALUE is as follows:
FIRST_VALUE(expression) OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC | DESC], ...
)
Here’s what each part means:
expression: The column or expression whose first value you want to retrieve.
PARTITION BY: Optional. Divides the result set into partitions to which the function is applied.
ORDER BY: Specifies the order of rows within each partition.
Let’s start with a simple example:
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(price) OVER (
        ORDER BY price
    ) AS lowest_price
FROM products;
This query returns each product along with the lowest price across all products. The FIRST_VALUE function, ordered by price, will always return the lowest price.
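To try the query without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Laptop", "Electronics", 900.0),
        ("Mouse", "Electronics", 20.0),
        ("Desk", "Furniture", 150.0),
    ],
)

# No PARTITION BY: the whole table is one window, ordered by price.
rows = conn.execute(
    """
    SELECT product_name,
           price,
           FIRST_VALUE(price) OVER (ORDER BY price) AS lowest_price
    FROM products
    ORDER BY price
    """
).fetchall()

for row in rows:
    print(row)
```

Every row carries the same lowest_price, because all rows share a single window.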
Now, let’s use PARTITION BY to get more granular results:
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(price) OVER (
        PARTITION BY category
        ORDER BY price
    ) AS lowest_category_price
FROM products;
This query returns the lowest price within each category for every product.
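The effect of PARTITION BY is easier to see on a small data set. The sketch below runs the same query through Python's sqlite3 module (assuming SQLite 3.25 or newer); the four products are made up for illustration.

```python
import sqlite3

# Hypothetical sample data with two categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Laptop", "Electronics", 900.0),
        ("Mouse", "Electronics", 20.0),
        ("Desk", "Furniture", 150.0),
        ("Chair", "Furniture", 80.0),
    ],
)

# Each category is its own window, so FIRST_VALUE restarts per category.
rows = conn.execute(
    """
    SELECT product_name,
           category,
           price,
           FIRST_VALUE(price) OVER (
               PARTITION BY category
               ORDER BY price
           ) AS lowest_category_price
    FROM products
    ORDER BY category, price
    """
).fetchall()

for row in rows:
    print(row)
```

Electronics rows report 20.0 and Furniture rows report 80.0, since the function is evaluated separately inside each partition.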
Let’s apply FIRST_VALUE to a more complex scenario. Imagine you’re analyzing sales data and want to compare each sale to the first sale of the day for each salesperson.
WITH daily_sales AS (
    SELECT
        sale_id,
        salesperson_id,
        sale_amount,
        sale_datetime,
        DATE(sale_datetime) AS sale_date
    FROM sales
)
SELECT
    ds.sale_id,
    ds.salesperson_id,
    ds.sale_amount,
    ds.sale_datetime,
    FIRST_VALUE(ds.sale_amount) OVER (
        PARTITION BY ds.salesperson_id, ds.sale_date
        ORDER BY ds.sale_datetime
    ) AS first_sale_amount,
    ds.sale_amount - FIRST_VALUE(ds.sale_amount) OVER (
        PARTITION BY ds.salesperson_id, ds.sale_date
        ORDER BY ds.sale_datetime
    ) AS difference_from_first_sale
FROM daily_sales ds
ORDER BY ds.salesperson_id, ds.sale_datetime;
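This query does several things:
It uses a CTE (daily_sales) to extract the date from each sale’s timestamp.
It partitions the rows by salesperson and sale date, and orders each partition by sale time.
It uses FIRST_VALUE to return the amount of the first sale of the day for each salesperson.
It subtracts that amount from every sale to show the difference from the first sale.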
This analysis could help identify patterns in sales performance throughout the day.
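As a rough illustration, the sketch below runs a trimmed version of the query against five hypothetical sales using Python's sqlite3 module. It assumes SQLite 3.25 or newer and timestamps stored as ISO-8601 text, which SQLite's DATE() function can parse.

```python
import sqlite3

# Hypothetical sales: two salespeople, one of them active on two days.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER, salesperson_id INTEGER, sale_amount REAL, sale_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100.0, "2024-01-01 09:00:00"),
        (2, 10, 250.0, "2024-01-01 14:30:00"),
        (3, 10, 80.0, "2024-01-02 10:00:00"),
        (4, 20, 300.0, "2024-01-01 11:00:00"),
        (5, 20, 120.0, "2024-01-01 16:00:00"),
    ],
)

rows = conn.execute(
    """
    WITH daily_sales AS (
        SELECT sale_id, salesperson_id, sale_amount, sale_datetime,
               DATE(sale_datetime) AS sale_date
        FROM sales
    )
    SELECT ds.sale_id,
           FIRST_VALUE(ds.sale_amount) OVER (
               PARTITION BY ds.salesperson_id, ds.sale_date
               ORDER BY ds.sale_datetime
           ) AS first_sale_amount,
           ds.sale_amount - FIRST_VALUE(ds.sale_amount) OVER (
               PARTITION BY ds.salesperson_id, ds.sale_date
               ORDER BY ds.sale_datetime
           ) AS difference_from_first_sale
    FROM daily_sales ds
    ORDER BY ds.salesperson_id, ds.sale_datetime
    """
).fetchall()

for row in rows:
    print(row)
```

Sale 3 starts a new day for salesperson 10, so it becomes its own first sale and its difference is zero.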
You might wonder why we use FIRST_VALUE when we could use MIN to find the lowest value. Here’s an example that illustrates the difference:
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(product_name) OVER (
        PARTITION BY category
        ORDER BY price
    ) AS cheapest_product,
    MIN(price) OVER (
        PARTITION BY category
    ) AS lowest_price
FROM products;
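In this query, FIRST_VALUE allows us to get the name of the cheapest product in each category, while MIN only gives us the lowest price. FIRST_VALUE is more flexible, allowing you to return any column value based on the ordering of another column. If two products share the lowest price, which name is returned is not guaranteed unless you add a tiebreaker column to ORDER BY.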
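The contrast can be checked on a few hypothetical rows. This sketch uses Python's sqlite3 module (assuming SQLite 3.25 or newer) and places both functions side by side.

```python
import sqlite3

# Hypothetical sample data; prices are distinct, so there are no ties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Laptop", "Electronics", 900.0),
        ("Mouse", "Electronics", 20.0),
        ("Desk", "Furniture", 150.0),
        ("Chair", "Furniture", 80.0),
    ],
)

rows = conn.execute(
    """
    SELECT product_name,
           -- FIRST_VALUE returns a different column (the name) from the cheapest row
           FIRST_VALUE(product_name) OVER (
               PARTITION BY category
               ORDER BY price
           ) AS cheapest_product,
           -- MIN can only return the minimum of the column it is applied to
           MIN(price) OVER (PARTITION BY category) AS lowest_price
    FROM products
    ORDER BY category, price
    """
).fetchall()

for row in rows:
    print(row)
```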
SQL FIRST_VALUE with a Rapid Database Builder
When working with SQL, executing efficient queries is straightforward, but sometimes more complex operations, like retrieving the first value in a partitioned dataset using the FIRST_VALUE function, require extra attention.
This is where rapid database builders like Five come into play. Five provides a MySQL database for your application and generates an automatic UI, making it easier to interact with and manage your data.
Five allows you to create interactive forms, tables, dynamic charts, comprehensive PDF reports, and dashboards that are automatically generated based on your database schema. This makes it easier to visualize and analyze the results of queries, including those utilizing advanced window functions like FIRST_VALUE.
Additionally, Five lets you write custom JavaScript and TypeScript functions, providing the flexibility to implement custom business logic that can interact with your data. This means you can incorporate SQL features like FIRST_VALUE into your application logic.
Once your application is ready, Five makes deployment easy with just a few clicks, allowing you to deploy your MySQL-based application to a secure, scalable cloud infrastructure. This lets you focus on development while Five handles the complexities of cloud deployment and infrastructure management.
If you’re serious about using SQL efficiently, give Five a try.
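FIRST_VALUE can behave unexpectedly with NULL values. How NULLs sort depends on the database: MySQL, SQL Server, and SQLite treat NULL as the lowest value, so NULLs come first in ascending order, while PostgreSQL and Oracle place them last by default. Where NULLs sort first, here’s how to handle them: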
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(product_name) OVER (
        PARTITION BY category
        ORDER BY
            CASE WHEN price IS NULL THEN 1 ELSE 0 END,
            price
    ) AS cheapest_product
FROM products;
This query ensures that NULL prices are considered last in the ordering.
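SQLite is one of the engines that sorts NULLs first, so it is a convenient place to see both the problem and the fix. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25 or newer); the product with a missing price is hypothetical.

```python
import sqlite3

# Hypothetical sample data: "Prototype" has no price yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Prototype", "Electronics", None),
        ("Mouse", "Electronics", 20.0),
        ("Laptop", "Electronics", 900.0),
    ],
)

rows = conn.execute(
    """
    SELECT product_name,
           -- Plain ordering: in SQLite the NULL price sorts first
           FIRST_VALUE(product_name) OVER (
               PARTITION BY category
               ORDER BY price
           ) AS naive_cheapest,
           -- CASE key pushes NULL prices to the end of the ordering
           FIRST_VALUE(product_name) OVER (
               PARTITION BY category
               ORDER BY CASE WHEN price IS NULL THEN 1 ELSE 0 END, price
           ) AS cheapest_product
    FROM products
    """
).fetchall()

naive = {r[1] for r in rows}
fixed = {r[2] for r in rows}
print(naive, fixed)
```

The plain ordering reports the unpriced product as the cheapest, while the CASE-based ordering reports the cheapest product that actually has a price.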
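Some SQL implementations, such as Oracle and BigQuery, support the IGNORE NULLS option with FIRST_VALUE. MySQL does not, so there the CASE-based ordering shown earlier is the way to skip NULLs.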
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(price IGNORE NULLS) OVER (
        PARTITION BY category
        ORDER BY price
    ) AS lowest_non_null_price
FROM products;
This query returns the lowest non-NULL price for each category.
FIRST_VALUE becomes even more useful when combined with other window functions. Here’s an example that ranks products within their category and shows the price difference from the cheapest product:
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(price) OVER (
        PARTITION BY category
        ORDER BY price
    ) AS lowest_category_price,
    price - FIRST_VALUE(price) OVER (
        PARTITION BY category
        ORDER BY price
    ) AS price_difference,
    RANK() OVER (
        PARTITION BY category
        ORDER BY price
    ) AS price_rank
FROM products;
This query provides a comprehensive view of product pricing within each category.
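Here is a small sketch of the combined query using Python's sqlite3 module (assuming SQLite 3.25 or newer). The sample rows are hypothetical and include two products at the same price, to show how RANK treats ties.

```python
import sqlite3

# Hypothetical sample data; Keyboard and Webcam share a price on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Mouse", "Electronics", 20.0),
        ("Keyboard", "Electronics", 50.0),
        ("Webcam", "Electronics", 50.0),
        ("Laptop", "Electronics", 900.0),
    ],
)

rows = conn.execute(
    """
    SELECT product_name,
           price - FIRST_VALUE(price) OVER (
               PARTITION BY category
               ORDER BY price
           ) AS price_difference,
           RANK() OVER (
               PARTITION BY category
               ORDER BY price
           ) AS price_rank
    FROM products
    ORDER BY price, product_name
    """
).fetchall()

for row in rows:
    print(row)
```

The tied products both receive rank 2 and the next rank is 4, while price_difference is measured from the cheapest product in the category for every row.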
FIRST_VALUE is particularly useful for time series analysis. Let’s look at an example involving stock prices:
WITH daily_stocks AS (
    SELECT
        stock_symbol,
        DATE(price_datetime) AS price_date,
        closing_price
    FROM stock_prices
)
SELECT
    ds.stock_symbol,
    ds.price_date,
    ds.closing_price,
    -- ROWS counts rows, not calendar days: this is the price 30 rows back,
    -- which is 30 days ago only with exactly one row per stock per day
    FIRST_VALUE(ds.closing_price) OVER (
        PARTITION BY ds.stock_symbol
        ORDER BY ds.price_date
        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
    ) AS price_30_days_ago,
    ds.closing_price - FIRST_VALUE(ds.closing_price) OVER (
        PARTITION BY ds.stock_symbol
        ORDER BY ds.price_date
        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
    ) AS price_change_30_days
FROM daily_stocks ds
ORDER BY ds.stock_symbol, ds.price_date;
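This query calculates the price change over a rolling window for each stock. The ROWS BETWEEN clause defines the window frame, which here reaches back 30 rows from each current row. That equals 30 days only if the table holds exactly one row per stock per day with no gaps; weekends, holidays, or multiple rows per day change the span the frame covers. For the first 30 rows of each stock, the frame is shorter and FIRST_VALUE returns the earliest available price.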
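To see how a ROWS frame behaves when dates have gaps, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or newer). It uses a frame of 2 preceding rows instead of 30 to keep the data small; the ACME symbol and its prices are hypothetical.

```python
import sqlite3

# Hypothetical prices with a gap between 2024-01-03 and 2024-01-08.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_prices (stock_symbol TEXT, price_date TEXT, closing_price REAL)"
)
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?)",
    [
        ("ACME", "2024-01-01", 10.0),
        ("ACME", "2024-01-02", 11.0),
        ("ACME", "2024-01-03", 12.0),
        ("ACME", "2024-01-08", 15.0),
    ],
)

rows = conn.execute(
    """
    SELECT price_date,
           closing_price,
           -- The frame holds the current row plus at most 2 rows before it
           FIRST_VALUE(closing_price) OVER (
               PARTITION BY stock_symbol
               ORDER BY price_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS price_2_rows_back
    FROM stock_prices
    ORDER BY price_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first three rows all return 10.0 because fewer than two earlier rows exist or the frame starts at the first row. The last row returns 11.0, the price from 2024-01-02, which is two rows back but six calendar days earlier.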
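While FIRST_VALUE is a useful function, it can impact query performance, especially on large datasets, because the database has to partition and sort the rows for each window. Here are some tips to optimize its use:
If the same window query runs often, store its result, for example in a materialized view (available in PostgreSQL and Oracle, but not natively in MySQL):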
CREATE MATERIALIZED VIEW product_price_analysis AS
SELECT
    product_name,
    category,
    price,
    FIRST_VALUE(price) OVER (
        PARTITION BY category
        ORDER BY price
    ) AS lowest_category_price
FROM products;
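Limit the window frame to the rows you need. Note that, ordered by sale_date, FIRST_VALUE returns the earliest price in the frame, not the lowest one:
FIRST_VALUE(price) OVER (
    PARTITION BY category
    ORDER BY sale_date
    ROWS BETWEEN 365 PRECEDING AND CURRENT ROW
) AS earliest_price_in_frame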
The SQL FIRST_VALUE function is a versatile tool for data analysis. It allows you to retrieve specific values from ordered sets, enabling complex comparisons and time-series analysis.
By understanding how to use FIRST_VALUE effectively, you can unlock new insights from your data and enhance your SQL queries.
Key takeaways: