SELECT-ing what you need and what you don’t.

By May 13, 2018SQL Server

Before we get into the code, we need to establish some useful terms to help understand what we’re doing in this article. I want us to be using the same language. You’ll see or hear these words being used by people that work with databases and I don’t want you to get hung up on terminology or semantics.

  • Database – The place where the data is kept. This one is kind of sticky because people confuse to the database server with the database. A database server is just a computer running database software that you can connect to. A database is the thing on the database server where the data is kept.
  • Table or relation – Data in a database is organized into tables. These structures are grids organized into named columns and rows. This is similar to a spreadsheet if you have experience with excel.
  • Relational – A relation is the math term for a table that we described above, so relational databases are databases that store their data in a tabular way.
  • Column – A table has a pre-defined list of attributes that can be set when putting a new row of data into it. Most database will also give you additional controls such as only allowing certain kinds of data to be stored in a column, if the column is allowed to be empty, or how large the item stored can be. All of the values must be the same type of data.
  • Row – the name of entries in the table. Each row represents a distinct entry with its own unique set of attributes stored in the columns. The Table specifies what Columns are allowed, the Row fills in the details describes something known using the Columns that are provided.
  • Query – In its simplest form it’s a question. Later we’ll see how it can also be an imperative statement to inform the database to do something.
  • SQL – stands for Structured Query Language. It’s a programming language designed to talk to relational databases to tell them what we want and how we expect to receive it.

If you haven’t read the first post and downloaded the setup scripts to create the sample tables, go ahead and do that now because we’re going to
keep referring to them throughout the series. 
The scripts will create 3 tables for us to use for our examples. TblIndividuals is a table full of people, TblAddresses is a table full of addresses that belong to our fictional people, and TblAddressTypes is what’s called a lookup table. We’ll go over what the other two are for later, but for now, we’re going to get very familiar with TblIndividuals. To get the example scripts to work make sure you set the connection in your tool to be scoped to the database where you put the example tables, it defaults to tempDB.

Quick note: I’m a SQL Server DBA so all of my scripts are based on MS SQL Server. It shouldn’t be too hard to get them to work on different platforms. If you are trying to follow along on another RDBMS and can’t get it working, I’d be happy to help you get it working, just leave a comment below or use the contact page to send me an email. 

We’re going to be working through a fictional scenario to help illustrate some examples of common interactions with a database. All of the data is 100% fabricated and was generated using a tool made by Redgate Software called SQL Data Generator, great tool for this kind of work. You’re going to fill the fictional role of someone in IT who is responsible for assisting with ad-hoc reporting. This is a huge oversimplification and hopefully you’re not going to just be writing queries and exporting them to excel, but it happens, trust me. You have a boss named Jack, and Jack needs us to pull some data for him to use in important meetings and to make decisions.

Let’s SELECT some data!

SELECT is a keyword that is used in SQL to tell the database that you want it to return data to you. It’s almost like asking a question, but doing it in a way that the database can understand. It’s important in programming that we ask the right questions, and ask them the right way. Let’s look at the basic syntax that makes up a SELECT statement.

 

Example Syntax

SELECT  {column(s) | *}
FROM {table} {alias}

As we said before the SELECT statement is a keyword that tells the database to return data to you. For it to work correctly, you need say “SELECT something FROM somewhere“. That something is a list of columns, or * to indicate all columns possible, and somewhere is the table that is storing your data. You should avoid using a wildcard “*” in finished queries. They perform poorly and are difficult if not impossible to optimize for correctly. They’re generally okay for figuring out what you need, but your mileage may vary. When supplying a column or list of columns, they need to be a comma separated list. We’ll see this in our first example.

 

Example 1 – Basic Select

Above, we saw that you could use a SELECT and the symbol “*” to return all the possible columns on the individuals table. It’s your first week on the job and Jack is trying to help learn the ropes, he wants you to pull the full list of people in our database. What he doesn’t want is everything possible, so we’re going to trim it down to just a few columns. “Just give me the basic profile data.” He says. So to accomplish that we use the code below.

SELECT i.IndividualID
, i.Prefix
, i.FirstName
, i.MiddleName
, i.LastName
, i.MaidenName
, i.Suffix
, i.Gender
, i.SSN
, i.GraduationYear
, i.GPA
, i.Major
, i.Class
FROM dbo.TblIndividuals i

If you refer back to the first post of the series, we talked briefly about the “Order of Execution”. This is our first chance to see it in action. Reading the code as written, The SELECT keyword is the first instruction found, but it actually is the last thing processed.

The FROM is processed first, even though it comes after the SELECT. This table actually has many more columns, but the result should only have the columns that we asked for. You can think of it as if it takes data FROM the table and then applies a SELECT almost like a filter. This filter only allows certain columns, the ones that were requested, through back to the calling query. SELECT can do more than act as a filter, but we’ll see that in a later example. Below is the result of our query.

IndividualID Prefix FirstName MiddleName LastName MaidenName Suffix Gender SSN GraduationYear GPA Major Class
1 Mrs Warren Blanca Colon M 200089265 1993 4.84 Technical 2008
2 Dr Liza Micah Griffin Morgan F 558709643 2005 3.97 Technical 2002
3 Mr Lakesha Abel Bates M 373646793 2010 4.82 Technical 2013
4 Mr Stacey Maggie Andrews Burgess Ph.D. F 735142950 1997 2.00 Prepaid Customer 2013
5 Mrs Serena Christie Chen D.D. F 224608707 2003 4.69 Accounting 2012

 

Example 2 – Using TOP

Jack quickly realizes that he needed to be more specific. If  you were to really run the query I provided above, it would return a lot of data for what Jack is needing. He calls you up and says “I can’t get that email to clear the exchange team’s filter. The excel file you gave me is too large. Can you just give me 10 people? I really just need a sample set so it doesn’t matter which 10 people it is.” You’ll probably find yourself in this position often, and in the next post we’ll talk about how you can apply filters to your queries to reduce them based on some criteria. Since we’re limiting ourselves to what we can do with the SELECT statement, I’ll show you a different trick we can use.

SELECT TOP 10
        i.IndividualID
      , i.Prefix
      , i.FirstName
      , i.MiddleName
      , i.LastName
      , i.MaidenName
      , i.Suffix
      , i.Gender
      , i.SSN
      , i.GraduationYear
      , i.GPA
      , i.Major
      , i.Class
FROM    dbo.TblIndividuals i

We’ve added a modifier to the SELECT keyword called TOP. It allows you to only pull back a certain number of records. By itself you can’t really control what data you get back because it only takes a quantity. This works the way the previous command worked. FROM will create a set of data that gets shaped by SELECT to only allow certain columns to come back. The difference is that after it receives 10 rows from the FROM component, it stops asking for more rows.  There are other ways that TOP works, such as using percentages. Check the documentation provided if you have more sophisticated needs. One thing I really need to stress about this example is that data returned from the database are called sets. Sets are a another math term and one thing about them are by definition they are without order. Without specifying the way you want the data sorted, you could get any 10 rows back from the table in the query above, but generally the engine will return them the way it found them stored on disk. Don’t be fooled by thinking that they will always come back in the order you need but for Jack’s request, this will suffice. Below is the result of our query.

IndividualID Prefix FirstName MiddleName LastName MaidenName Suffix Gender SSN GraduationYear GPA Major Class
1 Mrs Warren Blanca Colon M 200089265 1993 4.84 Technical 2008
2 Dr Liza Micah Griffin Morgan F 558709643 2005 3.97 Technical 2002
3 Mr Lakesha Abel Bates M 373646793 2010 4.82 Technical 2013
4 Mr Stacey Maggie Andrews Burgess Ph.D. F 735142950 1997 2.00 Prepaid Customer 2013
5 Mrs Serena Christie Chen D.D. F 224608707 2003 4.69 Accounting 2012
6 Dr. Mandi Carla Greer M 453585211 2013 2.79 Prepaid Customer 2004
7 Ms. Paige Shelly Parker M 598517710 1997 2.62 Technical 2006
8 Mr Craig Calvin Mc Mahon M 069056634 1998 2.80 Marketing 2011
9 Ms. Milton Celeste Moody F 017940008 1992 2.65 Accounting 2002
10 Dr. Freddie Wade Velazquez F 272093597 2006 3.72 Accounting 2009

 

 

Example 3 – Derived Columns

SELECT TOP 15
    i.Prefix
  , i.FirstName
  , i.MiddleName
  , i.LastName
  , i.MaidenName
  , i.Suffix
  , IIF(i.Gender = 'M', 'Male','Female') AS Gender
  , CONCAT(REPLICATE('0',5), RIGHT(i.SSN,4)) AS SocialSecurityNumber
  , i.GraduationYear
  , i.GPA
  , i.Major
  , i.Class
  , CASE WHEN GPA<= 4.00
  THEN 'A'
    WHEN GPA <= 3.00
      THEN 'B'
    WHEN GPA <= 2.00
      THEN 'C'
    WHEN GPA <= 1.00
      THEN 'D'
    ELSE 'F' END AS LetterGrade
FROM dbo.TblIndividuals i

 

Prefix FirstName MiddleName LastName MaidenName Suffix Gender SocialSecurityNumber GraduationYear GPA Major Class LetterGrade
Mrs Warren Blanca Colon Male 000009265 1993 4.84 Technical 2008 A
Dr Liza Micah Griffin Morgan Female 000009643 2005 3.97 Technical 2002 B
Mr Lakesha Abel Bates Male 000006793 2010 4.82 Technical 2013 A
Mr Stacey Maggie Andrews Burgess Ph.D. Female 000002950 1997 2.00 Prepaid Customer 2013 C
Mrs Serena Christie Chen D.D. Female 000008707 2003 4.69 Accounting 2012 A
Dr. Mandi Carla Greer Male 000005211 2013 2.79 Prepaid Customer 2004 C
Ms. Paige Shelly Parker Male 000007710 1997 2.62 Technical 2006 C
Mr Craig Calvin Mc Mahon Male 000006634 1998 2.80 Marketing 2011 C
Ms. Milton Celeste Moody Female 000000008 1992 2.65 Accounting 2002 C
Dr. Freddie Wade Velazquez Female 000003597 2006 3.72 Accounting 2009 B
Mr William Janelle Beard Cuevas Female 000002521 1998 2.54 Corporate Care 2004 C
Dr. Manuel Lesley Stevenson D.D. Female 000009724 1997 4.64 Corporate Care 2006 A
Mr Perry Desiree Lang D.Phil. Male 000009591 1992 4.47 Technical 2007 A
Mr Guadalupe Tammi Perkins Male 000002265 1998 3.44 Technical 2004 B
Mr Charlie Ryan Juarez Female 000008818 1992 2.10 Accounting 2002 C

Leave a Reply