
T-SQL Tuesday #123: Life hacks to make your day easier – Custom shortcuts in SSMS/ADS

T-SQL Tuesday logo

This is my first time taking part in T-SQL Tuesday. It’s something I’ve known about for a while but not taken part in before. But one of my goals this year is to post at least once a week on this blog for 3 months, so I figure I should take every prompt I can. Plus this week’s topic is something I feel I can contribute to.

This week’s topic is about any hacks you have to make your day-to-day life easier, and I wanted to share a little trick someone showed me about 10 years ago that I’ve found incredibly useful ever since.

The basic idea is that you can add your own query shortcuts to SSMS, and when you run one it appends any highlighted text to the shortcut code. To explain, I’ll use the example I’ve found most useful:

One of the first things I do whenever I start a new job or install a new version of SSMS is set up Ctrl+5 as the shortcut for “SELECT TOP 100 * FROM ”.

Once I’ve done that, I can highlight any table or view in my code and use Ctrl+5 to see 100 rows from that object, because it appends the highlighted text to the end of my shortcut text. But you can do more with it. Take this bit of code:

SELECT
      A.Col1
    , B.Col2
    , C.Col3
FROM TableA AS A
INNER JOIN TableB AS B
    ON A.ID = B.ID
LEFT JOIN
    (
        SELECT
              X.Col5
            , Y.Col3
            , X.ID
        FROM TableX AS X
        LEFT JOIN TableY AS Y
            ON X.ID = Y.ID
            AND X.Col3 = 'Some value'
    ) AS C
    ON A.ID = C.ID

I can highlight TableA, TableB, TableX, or TableY, and in each case Ctrl+5 will show me all of the columns in the table and a sample of the data. Or I can highlight TableA AS A INNER JOIN TableB AS B ON A.ID = B.ID and get a sample of all the columns available from that join. Or I can do something similar inside the subquery and see all the columns available to me there.
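To illustrate (using the placeholder tables from the snippet above), highlighting TableA AS A INNER JOIN TableB AS B ON A.ID = B.ID and pressing Ctrl+5 effectively executes:

SELECT TOP 100 * FROM TableA AS A
INNER JOIN TableB AS B
    ON A.ID = B.ID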

The other main shortcut I like to set is Ctrl+6 as “SELECT SUM(1) FROM ”, to tell me how many rows are in a table.
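So with TableA highlighted, Ctrl+6 effectively executes the following (for a non-empty table this gives the same answer as COUNT(*); on an empty table SUM(1) returns NULL rather than 0):

SELECT SUM(1) FROM TableA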

If you want to set these up it’s very easy:

In SSMS go to Tools>Options>Keyboard>Query Shortcuts.

In Azure Data Studio it’s a little more hidden. You might expect to do this in File>Preferences>Keyboard Shortcuts, but that only gives you a list of pre-canned commands you can assign to different key combinations. To write your own query text you need to go to the cog icon (bottom left)>Settings>Data>Query. You also need to be a bit careful: your chosen key combinations will probably already be assigned to other commands, so reassign those in File>Preferences>Keyboard Shortcuts first, otherwise Data Studio will try to do both things when you use your shortcut.

Anyway, that’s it, hopefully someone will find this useful.

Useful link for .NET training

I’m interested in learning some .NET coding. I’ve always been a pure SQL developer but more and more I feel my lack of .NET knowledge is holding me back. I’ve found a surprising lack of information online about how to get started with this, but I found one set of tutorials from Microsoft that talk through the basics of making various different applications.

I’m not sure how much use they would be to a complete beginner, but as someone who’s reasonably proficient with C# coding, and aware of what a CSS file contributes to a website, but not really sure how the different pieces fit together, I found them really useful.

https://docs.microsoft.com/en-us/visualstudio/get-started/csharp/tutorial-console?view=vs-2019

Interesting NULL issue

I had a question today from someone in my team wondering why their code wasn’t returning the correct results. They’d written:

SELECT
      *
FROM TableA;

SELECT
      *
FROM TableA
WHERE 1 = 1
    AND ID1 IN (SELECT ID1 FROM TableB);

SELECT
      *
FROM TableA
WHERE 1 = 1
    AND ID1 NOT IN (SELECT ID1 FROM TableB);

The first query returned about 600 rows, the second returned 300ish, the third returned nothing.

I spotted some NULLs in TableA.ID1 and added an OR ID1 IS NULL to the WHERE clause of the third query. This produced some results, but not enough.

We looked at it for a little while longer and eventually found some NULLs in TableB.ID1. These were causing the problem, because a NULL represents an unknown value. When the query tries to evaluate whether any particular value of TableA.ID1 is not in the list of TableB.ID1s from the subquery, it can’t be sure, because one of the items in the list has an unknown value.

In summary, if you are doing a NOT IN, make sure you don’t have a NULL in your list anywhere, or the predicate can never evaluate to true and (if the predicate is in the WHERE clause) your query won’t return any rows.
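Two ways around this, sketched against the same (hypothetical) tables: filter the NULLs out of the subquery, or rewrite with NOT EXISTS, which isn’t affected by NULLs in TableB:

SELECT
      *
FROM TableA
WHERE 1 = 1
    AND ID1 NOT IN (SELECT ID1 FROM TableB WHERE ID1 IS NOT NULL);

SELECT
      *
FROM TableA AS A
WHERE 1 = 1
    AND NOT EXISTS (SELECT 1 FROM TableB AS B WHERE B.ID1 = A.ID1);

Note the two aren’t identical when TableA.ID1 itself is NULL: the NOT IN version still excludes those rows, while the NOT EXISTS version returns them.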

What’s the point of that object

I can be a bit of a purist when it comes to design principles, but I think this is an important one.

Every object, schema, database etc. that you create should have a purpose, and that purpose should contain a What and a Why.

So, in my data warehouse designs I have a Staging database. It’s not enough to say “the staging database exists to hold staged data”, we have to say “the staging database exists to stage (extract, validate, apply transformation logic to) data from source systems so that we can present that staged data for the data warehouse to consume”. By keeping this purpose in mind whenever we make any changes to the database we make sure it is not filled with objects that are irrelevant to the purpose of the database, and by keeping the purpose simple we avoid having the database attempting to do too much.

We can apply this to schemas, and to tables, and stored procs and functions and pretty much everything else.

This provides additional benefits around documenting, unit testing, and other best practices you should be following with databases.

LIKE and REPLICATE

The LIKE operator can be used for a lot of good things. Recently, however, I’ve discovered the REPLICATE function which adds another dimension to what you can do. Simply put, the REPLICATE function takes a string and an integer input, and outputs the string repeated a number of times determined by the integer. The key thing here is that the string can be a wildcard from the pattern searches you can do with a LIKE or PATINDEX in SQL Server.

For instance REPLICATE('[0-9]', 3) will output '[0-9][0-9][0-9]', and this can be used as one side of a LIKE operator, or concatenated with other text.

As an example, the following code will check whether Col1 holds a value in decimal format with two decimal places:

SELECT
      *
FROM Table1
WHERE 1 = 1
    AND Col1 LIKE '[-0-9]' + REPLICATE('[0-9]', LEN(Col1) - 4) + '.[0-9][0-9]'

The '[-0-9]' at the start checks that the first character is either numeric or a minus sign. The REPLICATE part checks that the next X characters are numeric (where X is the length of the string minus 4). The final '.[0-9][0-9]' checks that the string ends with a decimal point followed by two digits. (Note that if the string is shorter than 4 characters, LEN(Col1) - 4 is negative, REPLICATE returns NULL, and the row won’t match.)
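As a quick sanity check on some hypothetical values:

SELECT
      v
FROM (VALUES ('123.45'), ('-12.34'), ('12.3'), ('abc.de')) AS t(v)
WHERE v LIKE '[-0-9]' + REPLICATE('[0-9]', LEN(v) - 4) + '.[0-9][0-9]';

This returns '123.45' and '-12.34'; '12.3' fails (no second decimal digit) and 'abc.de' fails (non-numeric characters).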

NULLs

I just saw a question on SQLServerCentral about NULL behavior and thought I could write something about the common gotchas when using NULLs in T-SQL, and why they behave the way they do.

To go into this, we need to understand the logic behind what a NULL represents. A NULL is used, or at least should be used, to denote the absence of information. It is not a 0 or a blank string or any other value. For instance, if you are looking at pizza orders and someone’s order comes to £0.00 for whatever reason (saved a lot of coupons, sleeping with the manager, etc.) then the OrderValue column should be set to 0.00. If, however, the order value is not known (maybe the system failed that day and orders are being input by hand) the OrderValue column should be set to NULL.
When you come to query this table you would treat the 0 and the NULL differently: if you want a total of sales values for the month you would include the 0 in your total, but the NULL, strictly speaking, makes that total unknowable. In practice SQL Server (following the ANSI standard) eliminates NULLs from aggregates, so a SUM of the sales values returns the total of the known values and raises the warning “Null value is eliminated by an aggregate or other SET operation” to remind you the result may not be the true total.

That’s important behavior, but it becomes even more important when looking at WHERE conditions (and HAVING conditions, and CASE expressions, and anything else that evaluates an expression). Usually an expression evaluates to TRUE or FALSE, so 1 = 1 evaluates to TRUE and 1 = 2 evaluates to FALSE. If one side of the expression is NULL, then the expression evaluates to UNKNOWN, because in the case of 1 = NULL you are asking whether an unknown value equals 1, and obviously you don’t know the answer. This holds even if both sides of the expression are NULL, because in that case you are asking whether two unknown values equal each other.

This gets really important when you consider that WHERE clauses are only looking for rows where the expressions evaluate to TRUE.

Going back to the pizza orders: if you filter for orders where OrderValue = 9.99, any orders with an OrderValue of NULL will be excluded (the expression evaluates to UNKNOWN). Where people often fall down is that if you filter for orders where OrderValue <> 9.99, orders with an OrderValue of NULL will still be excluded. To include them you would need to filter for orders where OrderValue <> 9.99 OR OrderValue IS NULL.
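A quick sketch with hypothetical values:

SELECT
      *
FROM (VALUES (CAST(9.99 AS DECIMAL(9,2))), (15.00), (NULL)) AS Orders(OrderValue)
WHERE OrderValue <> 9.99;

This returns only the 15.00 row. Add OR OrderValue IS NULL to the WHERE clause and the NULL row comes back too.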

 

This may seem like a long-winded explanation for something that’s pretty simple, but what I’ve tried to do in this post is lay out the fundamentals of why NULLs work as they do. Hopefully you can then work out for yourself how NULLs will work in a particular situation.

Stuff I’ve learned from my degree part 1

I want to write about some of the things I’ve learned as part of my Open University degree so far.
The degree is in Computing & IT and Statistics, and I started it for a couple of reasons.

Firstly, I had reached a point in my career where all of my peers seemed to have a much better understanding of computing in general than I did and I needed a way to catch up, and secondly I noticed that very few people in the BI industry have any statistical knowledge and I thought that would be a good way to stand out. In both cases, I think it’s been beneficial but I wanted to write about some of the computing fundamentals and how they helped me.

So, before the course, I was aware that a bit was a unit of memory, and that byte, megabyte, gigabyte etc. were also units of memory, and that computers thought in 1s and 0s but I never understood what that meant or the implications for data storage and data types until I did this course.
What I learned was that a bit is a single 1 or 0, and a byte is a collection of 8 of these bits. Why 8? Because that gives you 256 distinct combinations, which is enough to represent the most commonly used letters, numbers and symbols in English (in ASCII, for example, 01000001 = A, 01000010 = B, 01000011 = C and so on).
Suddenly at this point a lightbulb went off in my head. A single character in a Char or Varchar field is represented by one of these bytes. From there, the differences between Char and Varchar became clear in terms of data storage.
A Char field is padded to its required length with a specific byte pattern (the space character), which is why it always takes up the same amount of space on each row. The computer reads the long string of 1s and 0s that represents the whole field, slicing it at every 8th bit and translating those slices into characters, and when it reaches the allocated number of bytes for the column it knows that the bits after that represent the next column.
For a Varchar field there is no fixed length. Instead, the length of the value is stored alongside it (SQL Server uses 2 bytes for this), so the computer knows exactly how many of the following bytes belong to the column.
From this it follows that a Char field that is too long compared to the data it holds will take up excess space in the database because of the padding bytes. It also follows that any Varchar field takes up slightly more space than its data alone, 2 bytes more in SQL Server, to store that length.
Finally, the NChar and NVarchar types also made sense. These are used to represent characters from many languages, so they need far more combinations of 1s and 0s than a single byte can provide. Each character is therefore represented with 2 bytes instead, giving 256 × 256 = 65,536 different combinations, which can be mapped to 65,536 characters. Apart from that they work exactly the same as Char and Varchar, they just take up double the space.
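You can see this in SQL Server with DATALENGTH, which reports the bytes actually stored:

DECLARE @c CHAR(10) = 'abc';
DECLARE @v VARCHAR(10) = 'abc';
DECLARE @n NVARCHAR(10) = 'abc';

SELECT
      DATALENGTH(@c) AS CharBytes     -- 10: padded with spaces to the full length
    , DATALENGTH(@v) AS VarcharBytes  -- 3: just the data (the length is stored separately)
    , DATALENGTH(@n) AS NVarcharBytes -- 6: 2 bytes per character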

None of this was exactly new knowledge. I knew that NChar took up double the space of Char, and that you shouldn’t use Char when the values are likely to be variable in length because it wastes storage, but understanding why made a huge difference to how I thought about assigning data types.