Data validation is an essential part of web application development. Clean data, usually collected through forms, is an essential backbone of enterprise IT. And whether you are a developer building a form, or an end-user filling in a form: I’m sure everyone has had their fair share of form frustrations.
Practice is better than theory: let’s get hands-on with SQL’s data validation techniques. Learn how to build a web app on a SQL database and utilize many of the SQL data validation techniques by signing up for free access to Five’s database app builder.
Poorly designed forms and inadequate data validation at the source are probably the reason that in any given data science project, one-quarter of the time is spent on data cleansing. Incorrect date formats, invalid email addresses, or unclear questions: poorly captured data are a major drain on time for back-end developers, DBAs, or data and business analysts.
And for those building forms, it is rarely a joy: even though there are form libraries in React, or Form Helpers in Ruby on Rails, it is still hard and tedious work to develop good forms.
Don’t Feel Like Reading? Watch the Video Instead.
In this blog post, we will delve into the world of data validation in SQL, exploring various techniques, best practices, and examples. Whether you’re a database administrator, a data engineer, or a SQL enthusiast, this guide will equip you with the knowledge and tools needed to validate data effectively within your databases.
By the end of this blog post, you’ll have a solid understanding of data validation in SQL, along with practical insights and techniques to ensure the reliability and accuracy of your data. So, let’s dive in and unlock the power of SQL for data validation!
In today’s data-driven world, the importance of reliable and accurate data cannot be overstated. As organizations rely heavily on data for decision-making and analysis, it becomes crucial to establish robust mechanisms to validate the integrity and quality of data. This is where data validation in SQL comes into play.
Data validation refers to the process of examining and verifying data to ensure it meets specific criteria, such as data type, range, format, or any other predefined rules. While SQL (Structured Query Language) is commonly associated with querying and manipulating data, it also offers powerful features and techniques to validate data, making it a valuable tool for maintaining data integrity and accuracy.
Let’s start by looking at the most fundamental layer of data capture first: data storage and infrastructure.
The most fundamental question of any application development project is this: Where is my data stored?
Entry-level form builders, such as Google Forms store data in a web-hosted spreadsheet. This is the ideal solution for small research projects, surveys, or questionnaires in your local football club: Should Johnny or Dave be the new captain? Do we prefer pink or blue jerseys?
But spreadsheets are not a solution that can support businesses, let alone large-scale data capture.
The best place to store data is open-source, relational databases. The two most popular databases are MySQL and PostgreSQL, according to StackOverflow’s Annual Developer Survey.
Unlike spreadsheets, relational databases store data in a consistent, highly-efficient, safe, and secure manner. They are also scalable and portable, meaning they can support millions of read/write operations, and data can be moved, copied, or transferred easily from one database, storage, or IT environment to another.
Just like a spreadsheet, a database needs to be hosted (or stored) somewhere. Nowadays, this is usually on infrastructure provided by one of the three big cloud providers: AWS, Azure, or GCP. All three cloud providers offer different flavors of essentially the same service, i.e. a web-hosted and managed relational database (AWS RDS, Azure Database Server, or GCP Cloud SQL).
This setup – an open-source, relational database hosted by one of the big three cloud providers – gives your application development project a tried-and-tested, scalable application backend. In case you’re wondering, just how “tried-and-tested and scalable” this setup is, here’s a list of companies using MySQL on StackShare. The list includes Uber, Netflix, and Twitter.
And if now you are thinking: “We’re just a small business and nowhere near the scale of Netflix”, don’t underestimate how quickly data is growing over time, especially if you are in a transactional business. Data can also grow to due to unforeseen circumstances, such as the Covid-pandemic, when Excel reached breaking point at health care providers.
There are multiple ways to validate data in SQL. We will explore four data validation techniques in this blog post:
UNIQUE
or NOT NULL
.SQL constraints are a simple and effective way to put constraints on the data that your database permits end-users to store. They are the most fundamental layer of data validation in SQL. Through constraints, developers can ensure that fields are properly populated or prevent duplicate or redundant data.
PRIMARY KEY
, FOREIGN KEY
, UNIQUE
and NOT NULL
are some of the most frequently used built-in SQL constraints.
These constraints are applied to a database field inside a CREATE
or ALTER
statement. For example:
CREATE TABLE Customer (
ID int NOT NULL,
Name varchar(255) NOT NULL,
Age int,
UNIQUE (ID)
);
Note how the ID field is both NOT NULL
and UNIQUE
.
The same outcome can easily be accomplished in Five, using Five’s table wizard for MySQL.
To create a field with a NOT NULL
constraint, follow these four steps:
NULL
or NOT NULL
)Five automatically creates primary keys when a table is created. As Primary Keys are always unique and can never be empty, Five also automatically applies the UNIQUE
and NOT NULL
constraints to the PRIMARY KEY
field. For those database aficionados, Five’s Primary Keys are Globally Unique Identifiers (GUID), or a 128-bit globally unique ID.
To understand how to check the UNIQUE constraint in Five, follow these steps:
UNIQUE
constraint applied to your PRIMARY KEY
.Note that you can select multiple fields with a UNIQUE
constraint. If you choose multiple fields, the combined values in these fields must be unique: individually they may not be unique, but together they must.
As we created our NOT NULL field above, you might have noticed that Five’s Form Wizard presented us with two drop-down menus: one for data type, and one for default display type.
Data and display types define how data is stored inside the database and displayed to end users. Five supports all commonly used SQL data types, such as strings, integers, floats, etc.
Data types aren’t visible to the end user. They simply tell the database how to treat data: is something a string, an integer, or a password?
Display types, on the other hand, are visible to the end user, and can range from something as simple as a form field that accepts text to a date picker or a five-star rating input. The display type defines what the end-user can submit to the database using the application’s user interface.
In addition, developers can create custom display types in Five.
For example, if your end-user is asked to submit satisfaction with a service on a scale from 0 to 100, this can be created by enforcing minimum and maximum values of 0 and 100 respectively. If an end-user inserts any different input, a pop-up with an error message will appear. The same can be achieved by defining the accepted length of the input. Say you’re asking for a postcode and only accept four-digit postcodes. Then a display type with a required length of four characters can be created and applied to a field.
To see custom display types in action, watch this video on validating a field’s input with a custom display type.
A third way to validate data is regular expressions (RegExs). Regular Expressions are a powerful tool to describe data patterns. If an end user provides an input that does not correspond to the pattern, the input will not be accepted.
By defining a permitted input, Regular Expressions are more granular than data or display type. For example, a database field can be defined as a string, accepting any text input. But what if you would only like to accept a text input that is an email address? That’s where Regular Expressions come in.
To ensure that users can only enter email addresses, developers can define a custom display type inside Five. To do so,
The regular expression that checks for a valid email address looks like this:
^(([^<>()[\]\\.,;:\s@”]+(\.[^<>()[\]\\.,;:\s@”]+)*)|(“.+”))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$
This probably looks a bit intimidating. However, Regular Expressions can be easily found or created online with helpful tools, such as this RegEx builder.
JavaScript or TypeScript functions and OnValidate events can be used in Five to validate data.
Using a function goes beyond the more general validations described above. For example, a function could check if there is enough stock to fulfill an order: if yes, the order is processed. If no, an error occurs. Functions incorporate logic, rather than just patterns as Regular Expressions do. Through the use of functions, developers can write highly business-specific validations.
To write a function in Five, follow these steps:
To better understand how to tie your function to an OnValidate event, watch this video on how to validate a field’s input in Five:
One last advantage of relational databases is that they can come with very granular user access and permissions.
An acronym that is often used in this context is CRUD: Create, Read, Update, and Delete. These are the four basic database operations.
Inside Five, CRUD permissions can be granted on the table level, meaning a particular user role might only be able to create new records into a table, but cannot read, update or delete any data stored inside the table.
For example, say real estate agents are only allowed to create new listings in a property-for-sale database application. This can easily be accomplished using Five’s Roles.
Permissions can be even more granular than just on the table level. By assigning a User Table Key Field, users can also be given access to particular records inside a table.
For example, you are developing a real estate business with four agents. Each agent is in charge of one area of town: North, South, East, or West. Even though all data is stored in one table, each user can only be given access to their territory’s data by using the User Table Key Field.
Lastly, permissions can be on the UI / UX level. This can be accomplished by giving a particular role access to a certain Menu only. Sticking with the real estate example, you might want interested property buyers to have access to your application and browse your listings. This could be accomplished by creating a separate menu for buyers that only contains listings in it.
Learn how to build an entire web app on a SQL database by following our step-by-step app development guide.
The guide starts with the creation of a database from scratch and finishes with a web app that contains a chart, PDF reports, and forms for users to interact with your data. You will learn how to combine all these front-end elements with your back-end database.
Best of all: it’s free to follow!