SQL Server Partitioning: Practical Guide

Ryan Forrester

Sep 27th, 2024

Blog

SQL Server partitioning is a great for managing large databases and boosting query performance.

In this guide, we’ll dive into the nuts and bolts of partitioning, explore its benefits, and walk through real-world examples to help you implement this technique effectively.

What is SQL Server Partitioning?

At its core, SQL Server partitioning is about dividing large tables and indexes into smaller, more manageable pieces. Instead of storing all data in a single unit, partitioning spreads it across multiple filegroups based on a partitioning scheme.

Let’s look at a simple example:

-- Create a partition function
CREATE PARTITION FUNCTION [PF_OrderDate](datetime) AS RANGE RIGHT FOR VALUES
('2022-01-01', '2023-01-01', '2024-01-01')

-- Create a partition scheme
CREATE PARTITION SCHEME [PS_OrderDate] AS PARTITION [PF_OrderDate] 
TO ([FG2022], [FG2023], [FG2024], [FG2025])

-- Create a partitioned table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderDate DATETIME,
    CustomerID INT,
    TotalAmount DECIMAL(10,2)
) ON [PS_OrderDate](OrderDate)

-- Create a partition function
CREATE PARTITION FUNCTION [PF_OrderDate](datetime) AS RANGE RIGHT FOR VALUES
('2022-01-01', '2023-01-01', '2024-01-01')

-- Create a partition scheme
CREATE PARTITION SCHEME [PS_OrderDate] AS PARTITION [PF_OrderDate] 
TO ([FG2022], [FG2023], [FG2024], [FG2025])

-- Create a partitioned table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderDate DATETIME,
    CustomerID INT,
    TotalAmount DECIMAL(10,2)
) ON [PS_OrderDate](OrderDate)

In this example, we’ve created a partition function that splits data based on the OrderDate. The partition scheme then maps these logical partitions to physical filegroups. Finally, we create the Orders table using this partitioning scheme.

Why Use SQL Server Partitioning?

Improved Query Performance: By dividing data, SQL Server can quickly eliminate irrelevant partitions during query execution, significantly reducing I/O operations.
Easier Data Management: Need to archive old data? With partitioning, you can efficiently move or delete entire partitions without affecting the rest of the table.
Faster Data Loading: Bulk insert operations can be parallelized across partitions, speeding up data ingestion.
Enhanced Availability: Partitions can be backed up and restored independently, allowing for more flexible maintenance strategies.

Real-World Scenario: E-commerce Order Management

Let’s consider an e-commerce platform that processes millions of orders annually. As the business grows, querying and managing the ever-expanding Orders table becomes increasingly challenging.

Here’s how we might implement partitioning for this scenario:

-- Create filegroups for each year
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2022]
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2023]
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2024]
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2025]

-- Add files to filegroups
ALTER DATABASE ECommerceDB ADD FILE 
( NAME = N'OrderData2022', FILENAME = N'C:\Data\OrderData2022.ndf' )
TO FILEGROUP [FG2022]

-- Repeat for other years...

-- Create partition function and scheme (as shown earlier)

-- Create the partitioned Orders table
CREATE TABLE Orders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate DATETIME NOT NULL,
    CustomerID INT NOT NULL,
    TotalAmount DECIMAL(10,2) NOT NULL,
    Status VARCHAR(20) NOT NULL
) ON [PS_OrderDate](OrderDate)

-- Create a non-clustered index on CustomerID
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders(CustomerID)
ON [PS_OrderDate](OrderDate)

-- Create filegroups for each year
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2022]
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2023]
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2024]
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2025]

-- Add files to filegroups
ALTER DATABASE ECommerceDB ADD FILE 
( NAME = N'OrderData2022', FILENAME = N'C:\Data\OrderData2022.ndf' )
TO FILEGROUP [FG2022]

-- Repeat for other years...

-- Create partition function and scheme (as shown earlier)

-- Create the partitioned Orders table
CREATE TABLE Orders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate DATETIME NOT NULL,
    CustomerID INT NOT NULL,
    TotalAmount DECIMAL(10,2) NOT NULL,
    Status VARCHAR(20) NOT NULL
) ON [PS_OrderDate](OrderDate)

-- Create a non-clustered index on CustomerID
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders(CustomerID)
ON [PS_OrderDate](OrderDate)

With this setup, queries filtering on OrderDate will benefit from partition elimination. For example:

SELECT * FROM Orders 
WHERE OrderDate >= '2023-06-01' AND OrderDate < '2023-07-01'

SELECT * FROM Orders 
WHERE OrderDate >= '2023-06-01' AND OrderDate < '2023-07-01'

This query will only scan the relevant partition(s) in the FG2023 filegroup, significantly reducing I/O and improving performance.

Implementing Sliding Window Partitioning

For our e-commerce scenario, we might want to implement a sliding window strategy to maintain the most recent data online while archiving older data. Here’s how we can achieve this:

-- Step 1: Prepare the next partition
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2026]
ALTER DATABASE ECommerceDB ADD FILE 
( NAME = N'OrderData2026', FILENAME = N'C:\Data\OrderData2026.ndf' )
TO FILEGROUP [FG2026]

-- Step 2: Modify the partition scheme to include the new filegroup
ALTER PARTITION SCHEME [PS_OrderDate] NEXT USED [FG2026]

-- Step 3: Split the rightmost partition
ALTER PARTITION FUNCTION [PF_OrderDate]() 
SPLIT RANGE ('2025-01-01')

-- Step 4: Move or archive data from the leftmost partition
-- (Assuming we want to keep 3 years of data online)
CREATE TABLE ArchivedOrders (
    -- Same structure as Orders table
) ON [FG2022]

INSERT INTO ArchivedOrders
SELECT * FROM Orders
WHERE $PARTITION.[PF_OrderDate](OrderDate) = 1

-- Step 5: Remove data from the original table
DELETE FROM Orders
WHERE $PARTITION.[PF_OrderDate](OrderDate) = 1

-- Step 6: Merge the empty partition
ALTER PARTITION FUNCTION [PF_OrderDate]() 
MERGE RANGE ('2022-01-01')

-- Step 7: Remove the old filegroup (optional)
ALTER DATABASE ECommerceDB REMOVE FILE OrderData2022
ALTER DATABASE ECommerceDB REMOVE FILEGROUP [FG2022]

-- Step 1: Prepare the next partition
ALTER DATABASE ECommerceDB ADD FILEGROUP [FG2026]
ALTER DATABASE ECommerceDB ADD FILE 
( NAME = N'OrderData2026', FILENAME = N'C:\Data\OrderData2026.ndf' )
TO FILEGROUP [FG2026]

-- Step 2: Modify the partition scheme to include the new filegroup
ALTER PARTITION SCHEME [PS_OrderDate] NEXT USED [FG2026]

-- Step 3: Split the rightmost partition
ALTER PARTITION FUNCTION [PF_OrderDate]() 
SPLIT RANGE ('2025-01-01')

-- Step 4: Move or archive data from the leftmost partition
-- (Assuming we want to keep 3 years of data online)
CREATE TABLE ArchivedOrders (
    -- Same structure as Orders table
) ON [FG2022]

INSERT INTO ArchivedOrders
SELECT * FROM Orders
WHERE $PARTITION.[PF_OrderDate](OrderDate) = 1

-- Step 5: Remove data from the original table
DELETE FROM Orders
WHERE $PARTITION.[PF_OrderDate](OrderDate) = 1

-- Step 6: Merge the empty partition
ALTER PARTITION FUNCTION [PF_OrderDate]() 
MERGE RANGE ('2022-01-01')

-- Step 7: Remove the old filegroup (optional)
ALTER DATABASE ECommerceDB REMOVE FILE OrderData2022
ALTER DATABASE ECommerceDB REMOVE FILEGROUP [FG2022]

This process can be automated using SQL Server Agent jobs to run periodically, ensuring your partitioning scheme stays current with your data retention policies.

Optimizing Queries for Partitioned Tables

While partitioning can significantly improve performance, it’s crucial to write queries that take advantage of this structure. Here are some tips:

Use the partitioning column in WHERE clauses: This allows SQL Server to use partition elimination.

-- Good: Uses partition elimination
SELECT * FROM Orders WHERE OrderDate >= '2023-01-01' AND OrderDate < '2024-01-01'

-- Bad: Full table scan
SELECT * FROM Orders WHERE TotalAmount > 1000

-- Good: Uses partition elimination
SELECT * FROM Orders WHERE OrderDate >= '2023-01-01' AND OrderDate < '2024-01-01'

-- Bad: Full table scan
SELECT * FROM Orders WHERE TotalAmount > 1000

Be cautious with JOINs: If joining partitioned tables, ensure the partitioning columns are used in join conditions to maintain partition-aligned operations.
Consider partition-aligned indexes: Creating indexes that include the partitioning column can improve query performance.

CREATE NONCLUSTERED INDEX IX_Orders_Status_OrderDate ON Orders(Status, OrderDate)
ON [PS_OrderDate](OrderDate)

CREATE NONCLUSTERED INDEX IX_Orders_Status_OrderDate ON Orders(Status, OrderDate)
ON [PS_OrderDate](OrderDate)

Use partition-aware maintenance: When rebuilding indexes or updating statistics, consider doing so on a per-partition basis.

-- Rebuild index for a specific partition
ALTER INDEX IX_Orders_Status_OrderDate ON Orders 
REBUILD PARTITION = 2

-- Rebuild index for a specific partition
ALTER INDEX IX_Orders_Status_OrderDate ON Orders 
REBUILD PARTITION = 2

SQL Server Partitioning with a Rapid Database Builder

When working with SQL, executing efficient queries is straightforward, but sometimes more complex operations, like managing large datasets using SQL Server partitioning clauses, require extra attention.

This is where rapid database builders like Five come into play. Five allows you to connect to your existing SQL Server database, enabling you to build a web application on top of your current data infrastructure.

Five allows you to create interactive forms, tables, dynamic charts, comprehensive PDF reports, and dashboards that are automatically generated based on your database schema. This makes it easier to visualize and analyze the results of queries, including those that use SQL Server’s partitioning features.

Additionally, Five lets you write custom JavaScript and TypeScript functions, providing the flexibility to implement custom business logic that can interact with your data.

Once your application is ready, Five makes deployment easy with just a few clicks, allowing you to deploy your application connected to your SQL Server database to a secure, scalable cloud infrastructure. This lets you focus on development while Five handles the complexities of cloud deployment and infrastructure management.

If you’re serious about using SQL efficiently, give Five a try.

Sign up for free access to Five’s online development environment and start building your web application connected to your SQL Server database today.

Build Your Database In 3 Steps
Start Developing Today

Monitoring Partition Usage and Performance

To ensure your partitioning strategy is effective, regular monitoring is essential. Here are some useful queries:

-- Check row count per partition
SELECT 
    p.partition_number,
    p.rows,
    au.total_pages,
    au.used_pages,
    OBJECT_NAME(p.object_id) AS TableName,
    fg.name AS FileGroupName
FROM sys.partitions p
JOIN sys.allocation_units au ON p.partition_id = au.container_id
JOIN sys.filegroups fg ON au.data_space_id = fg.data_space_id
WHERE OBJECT_NAME(p.object_id) = 'Orders'
ORDER BY p.partition_number

-- Check partition elimination in query plans
SELECT qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp[@PhysicalOp="Clustered Index Scan" or @PhysicalOp="Clustered Index Seek"]/*[local-name()="RunTimePartitionSummary"]')=1

-- Check row count per partition
SELECT 
    p.partition_number,
    p.rows,
    au.total_pages,
    au.used_pages,
    OBJECT_NAME(p.object_id) AS TableName,
    fg.name AS FileGroupName
FROM sys.partitions p
JOIN sys.allocation_units au ON p.partition_id = au.container_id
JOIN sys.filegroups fg ON au.data_space_id = fg.data_space_id
WHERE OBJECT_NAME(p.object_id) = 'Orders'
ORDER BY p.partition_number

-- Check partition elimination in query plans
SELECT qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp[@PhysicalOp="Clustered Index Scan" or @PhysicalOp="Clustered Index Seek"]/*[local-name()="RunTimePartitionSummary"]')=1

These queries help you understand how your data is distributed across partitions and whether your queries are effectively using partition elimination.

Conclusion

SQL Server partitioning is a technique for managing large-scale data and optimizing query performance. By dividing data into smaller, more manageable chunks, you can significantly improve database operations, from querying to maintenance.

Remember, partitioning isn’t a one-size-fits-all solution. It’s most effective for large tables (typically over 1 million rows) and scenarios where data has a natural partitioning key (like date ranges in our e-commerce example). Always test thoroughly in a non-production environment before implementing partitioning in your production systems.

Sign up for free access to Five’s online development environment and start building your web application connected to your SQL Server database today.