Skip to content

When to allow NULLs

July 5, 2020

Some years ago I did a post on NULLs where I detailed some of the issues you can get with them. More recently I did another post on NULLs that detailed an example of an interesting issue some NULLs had caused in work when looking at an IN statement.

Now I want to talk a bit about when we should be using NULLs and when we shouldn’t, inspired again by some things I’ve seen in legacy code a few days ago. And when I say using, I mean when should we allow values to be NULL, in tables, and parameters, and anything else.

The first thing to remember is what does a NULL represent. It’s very simple, NULL means “I don’t know”. If you keep that in mind, and you’re practising some good development design where everything has a point, you should be able to tell if it’s possible to be unsure about any particular value for a row.

A quick example:

CREATE TABLE dbo.Customer
    (
          CustomerID INT IDENTITY
        , Forename VARCHAR(100)
        , Surname VARCHAR(100)
        , MiddleNames VARCHAR(300)
        , IsEmailable BIT
        , MobileNumber CHAR(11)
        , HomePhoneNumber CHAR(11)
        , DateOfBirth DATE
    )

Now, I’ve not included any NULL/NOT NULL in this example, because I want to talk through a few of these columns before I give my opinions.

  • CustomerID – This is the primary key of the table. It uniquely identifies every row. We definitely want this to not be NULL, and we are using an IDENTITY value (another option here is to use a SEQUENCE object) to make sure we always have a value.
  • MiddleNames – Not everyone has a middle name. We could set this up to not allow NULLs and force a default like blank string (maybe by setting a default in the column definition), or we could allow NULLs. Either approach seems legitimate here.
  • MobileNumber/HomePhoneNumber – Again, not everyone has a mobile, or a home phone number. NULL seems like a legitimate choice here, especially as we don’t have the empty string option because any value we use has to be 11 characters. We could default to 11 spaces, but then whatever reads this table has to code around that in pretty much exactly the same way it would with NULLs.
  • Forename/Surname – Most people have both a forename and surname (sidenote, I hate when I see FirstName and Surname, it’s either Forename/Surname or FirstName/LastName). But not everyone does, there are some cultures where it’s less normal, and even within western cultures you have individuals like Cher or Sting. What if they end up in your database for some reason? Even if you are assuming everyone will have a forename and surname, your data may not be the best quality. In these cases you need to make a judgement call, you can either reject data without forename and surname, or allow NULLs or some default value in your database.
  • DateOfBirth – Everyone definitely has one of these, but your source data may not always capture it. In this case in particular, a default value can do odd things, and you are safer with allowing NULLs or rejecting any data without a date of birth.
  • IsEmailable – This is the column from real life that triggered this post. In our table it was set to allow NULLs, but I think that is a fundamental mistake. This column is used to decide if we can send an email to someone or not, and regardless of the quality of our source data, we have to know if we will include this person in our email campaign or not. Now, we may get customers who come through without any kind of email permissions in their data set, but when we run an email campaign we either include that customer or not (usually not) so that should be written into the insert into this table, and do not allow a NULL in that column.

What it should look like:

CREATE TABLE dbo.Customer
    (
          CustomerID INT IDENTITY NOT NULL -- primary key
        , ForeName VARCHAR(100) NOT NULL -- everyone has at least 1 name
        , Surname VARCHAR(100) NULL --  not everyone has multiple names
        , MiddleNames VARCHAR(300) NULL -- not everyone has a middle name
        , IsEmailable BIT NOT NULL -- we always know if we will send an email or not
        , MobileNumber CHAR(11) NULL -- not everyone has a mobile number
        , HomePhoneNumber CHAR(11) NULL -- not everyone has a home phone number
        , DateOfBirth DATE NULL -- we don't always get this in the data and it isn't vital to this solution
    )

NULL/Default

In a couple of the columns I said you could use a default or a NULL. I prefer NULLs when you genuinely aren’t sure, because defaults can do odd things to your queries. For example, say you have a default in the DateOfBirth column of ’01-Jan-1900′ because you’re confident that’s not going to be a real value. But, for whatever reason you need to find anyone who’s over 18, so you include a where clause of DATEDIFF(yy, DateOfBirth, SYSDATETIME()) >= 18 and you bring back everyone who’s date of birth you don’t know, and potentially you send some unsuitable mail to a minor and get sued. Leaving DateOfBirth as NULL keeps those rows out of any queries that are filtering on DateOfBirth, and means you have to make a conscious decision to include them.

Conclusion

NULLs are part of the T-SQL language for a reason, and they help capture cases where you might genuinely not know the value of something. Just setting every column to allow NULLs causes you some issues down the line, as every bit of code then has to handle that possibility, but not allowing NULLs when they could legitimately occur can create other issues. You should consider whether to allow NULLs on a column by column basis, both because future developers will have to account for your decisions in their code, and because the decision you make sends a message to those future developers about the shape of the data you are expecting to see inputted into the system.

From → Uncategorized

One Comment

Trackbacks & Pingbacks

  1. When NULL Makes Sense – Curated SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: