SQL’s CASE WHEN
statement combined with the COUNT
function is a great for data analysis. This combo lets you create custom categories and count occurrences within those categories in a single query.
If you’ve ever found yourself running multiple queries to get counts for different conditions, this technique will save you time and simplify your code.
The CASE WHEN COUNT
combination shines when you need to:
Let’s dive into some practical examples to see how this works in action.
Before we jump into complex scenarios, let’s review the basic syntax:
SELECT
COUNT(CASE WHEN condition THEN result END) AS count_name
FROM
table_name;
This structure allows you to count rows that meet specific conditions. The CASE WHEN
statement acts as a filter, and COUNT
tallies up the matches.
Imagine you’re working with an e-commerce database and want to analyze customer orders. You have a table called orders
with columns like order_id
, customer_id
, order_date
, and total_amount
.
Let’s say you want to count orders in different value ranges: under $50, $50-$100, and over $100.
SELECT
COUNT(CASE WHEN total_amount < 50 THEN 1 END) AS low_value_orders,
COUNT(CASE WHEN total_amount BETWEEN 50 AND 100 THEN 1 END) AS mid_value_orders,
COUNT(CASE WHEN total_amount > 100 THEN 1 END) AS high_value_orders
FROM
orders;
This query gives you a quick overview of order distribution across different value ranges. It’s much more efficient than running separate queries for each range.
Now, let’s add a time dimension to our analysis. We’ll count orders for the last 30 days, 30-60 days ago, and 60-90 days ago.
SELECT
COUNT(CASE WHEN order_date >= CURRENT_DATE - INTERVAL '30 days' THEN 1 END) AS last_30_days,
COUNT(CASE WHEN order_date BETWEEN CURRENT_DATE - INTERVAL '60 days' AND CURRENT_DATE - INTERVAL '30 days' THEN 1 END) AS days_30_to_60,
COUNT(CASE WHEN order_date BETWEEN CURRENT_DATE - INTERVAL '90 days' AND CURRENT_DATE - INTERVAL '60 days' THEN 1 END) AS days_60_to_90
FROM
orders;
This query provides insights into recent order trends without the need for multiple date-range filters.
The real power of CASE WHEN COUNT
comes into play when you combine multiple conditions and aggregations.
Let’s identify and count high-value customers based on their order history.
SELECT
COUNT(CASE WHEN total_orders > 10 AND total_spend > 1000 THEN 1 END) AS high_value_customers,
COUNT(CASE WHEN total_orders > 5 AND total_spend > 500 THEN 1 END) AS potential_high_value
FROM (
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_spend
FROM
orders
GROUP BY
customer_id
) customer_summary;
This query uses a subquery to calculate total orders and spend per customer, then applies CASE WHEN COUNT
to categorize and count customers based on these metrics.
Another practical use of CASE WHEN COUNT
is calculating percentages within your dataset.
Suppose you have a products
table with a category
column. You want to see the distribution of products across categories as percentages.
SELECT
category,
COUNT(*) AS category_count,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS category_percentage
FROM
products
GROUP BY
category;
This query uses COUNT(*)
with CASE WHEN
implicitly (the CASE WHEN
is applied by the GROUP BY
). The OVER ()
clause creates a window function that calculates the total count across all categories.
You can mix CASE WHEN COUNT
with other aggregation functions for more comprehensive analysis.
Let’s say you have a customers
table with a customer_type
column (e.g., ‘retail’, ‘wholesale’, ‘VIP’). You want to analyze both the count and total value of orders for each customer type.
SELECT
c.customer_type,
COUNT(CASE WHEN o.total_amount < 100 THEN 1 END) AS low_value_orders,
COUNT(CASE WHEN o.total_amount >= 100 AND o.total_amount < 500 THEN 1 END) AS mid_value_orders,
COUNT(CASE WHEN o.total_amount >= 500 THEN 1 END) AS high_value_orders,
SUM(CASE WHEN o.total_amount < 100 THEN o.total_amount ELSE 0 END) AS low_value_total,
SUM(CASE WHEN o.total_amount >= 100 AND o.total_amount < 500 THEN o.total_amount ELSE 0 END) AS mid_value_total,
SUM(CASE WHEN o.total_amount >= 500 THEN o.total_amount ELSE 0 END) AS high_value_total
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_type;
This query provides a detailed breakdown of order counts and values across different customer types and order value ranges.
While CASE WHEN COUNT
is incredibly useful, it’s important to consider performance, especially when working with large datasets.
CASE WHEN
conditions are properly indexed.CASE WHEN
statements.COUNT(expression)
only counts non-null values. If your CASE WHEN
might return null, use COUNT(*)
instead: COUNT(CASE WHEN condition THEN 1 ELSE NULL END)
vs.
COUNT(*) FILTER (WHERE condition)
CASE WHEN
conditions are mutually exclusive if you’re categorizing data. Overlapping conditions can lead to double-counting.The CASE WHEN COUNT
technique in SQL is a versatile tool for data analysis. It allows you to create nuanced, condition-based counts in a single query, reducing the need for multiple queries or complex joins. By using this approach, you’ll be able to write more efficient, readable SQL code and derive insights from your data more quickly.
Remember, the key to effective use of CASE WHEN COUNT
is to think about your data categorization needs upfront. What conditions are important for your analysis? How can you group these conditions efficiently? With practice, you’ll find yourself reaching for this technique regularly in your SQL toolkit.