In today’s discussion, I’m focusing on one important component of T-SQL, that is sub queries. As the name implied, a sub query is simply a query that is made up of outside query and inside query. This means, one query is embedded inside another query. The are two types of sub queries, regular sub queries and correlated sub queries.

In a regular sub query, which is the most common type of sub queries, the inside query is independent of the outside query. The inside query can be run on its own and generate a result set. It doesn’t depend on the outer query or the query in which it resides. However, the outside query depends on the inside query in order to produce meaningful result set. In the regular sub query, the inside query produces a result set which is fed to the outside query for processing it’s statement. So we can say, the result set of the inside query is an input to the processing of the outer query. Here goes a question, In a regular sub query, which query completes execution first? The answer is inside query, because the outer query starts to execute first until it reaches the inside query and it stops for the inside query to be executed and completed to generate result sets for the consumption and completion of the outside query. But l’ll think the query optimizer during compilation will already know the outside query in a regular sub query needs the result set of the inside query so it won’t even start execution first with the outside query, it should rather go ahead and execute the inside query first before touching the outside query.

In correlated sub queries, the inside query is not independent from the outside query. To make a point clearer, lets understand a sub query is just a select statement that may return a single value or rows of data. In the correlated sub query, the inside query is executed for every execution of  outside query. What this means is that, the inside query will execute for each row of data that the outside query needs to out put. In a regular query, the inside query is executed only once and its result set can be used by the outside query to process any amount of rows to be return in its result set.

 

Sub queries and Joins are very similar, whatever you can do with sub queries can be done with  a join but you may/may not be able do whatever you can do with joins using a sub query. One important advantage of joins of subqueries will be the columns to return in the final result set. While you can return any column from all the tables that are joined together using joins, sub queries can only return columns from the outside query.

Let’s get technical and see how we can use sub queries to accomplish some common tasks,

Let’s create three tables, Employee, Customer and Contact tables.

CREATE TABLE Employee(EmpID int Primary Key, ContactID int Foreign Key References Contact(ContactID), Hiredate Date, Dob Date)

CREATE TABLE Contact(ContactID int Primary Key, Firstname VARCHAR(50), LastName VARCHAR(50))

CREATE TABLE Customer(CustID int Primary Key, ContactID int Foreign Key References Contact(ContactID), Phone, ZipCode VARCHAR(50))

Now how can we find the name of all Employees. Note, the Contact table contatins names of both customers and employees. so you have to be able to find only names that are employee names. Lets do this using a sub query.

SELECT Firstname, Lastname

FROM Contact

Where ContactID IN (SELECT Contact ID

FROM Employee)

Lets walk through. First we need the names of the employees so we put that in the outside query. We are selecting that from contact table but we are restricting the names  to output by finding which contactid we are outputing names for. There are contactids for customers and employees. When the inside query runs, it returns contactids of all employees, then the outside query will use this result set and output a row of information containing first and last names for each contactid in the result set of the inside query.

Now, let’s see the weakness of a subquery. What about if you want to return the employee id as well as first and last names. Not possible. This is where join becomes a super hero. Let’s use join to accomplish that task.

SELECT e.EmpID, c.Firstname, c.Lastname

FROM Employee e

INNER JOIN Contact c

ON c.ContactID = e.ContactID

I discussed joins in an earlier tutorial so if you don’t get this code, please refer to that tutorial here.

Let’s take a look at the technical implementation for a correlated sub query.

Lets say l want to find the name of all customers who have a ZipCode of 92101 using correlate subquery.

SELECT c.*

FROM Contact c

WHERE 92101 IN (SELECT cu.ZipCode

FROM Customers cu

WHERE cu.ContactID = c.ContactID)

We see how the inside query get locked to the outside query at the Where clause in the inside query. The WHERE clause says WHERE cu.ContactID = c.ContactID

That means during execution, the outside query will get a contactid from the contact table, give it to the inner query and the inner query will get a zip code from the customer table where with the same contactid. If the contactid doesn’t exist in the customer table, a NULL will be returned, if the contactid exists, the zipcode will then be returned and the WHERE CLAUSE in the outer query will now try to find if the zipcode is 92101. If they match, the contact information for this customer will be out putted. Then the outer query will go to the next contactid in the contact table and the whole process will be repeated for that contactid. That means, the inside query will be executed once for every row in the outside query.

We are not tied to use correlated queries, regular sub queries or joins can be used to accomplish the same task. Lets do it using regular subqueries.

SELECT Firstname, Lastname

FROM Contact

WHERE ContactID IN ( SELECT ContactID

FROM Customers

WHERE zipcode = ‘92101’)

Lets do this same thing using joins.

SELECT c.Firstname, c. Lastname

FROM Contact c

INNER JOIN Customers cu

ON c.ContactID = cu.ContactID

WHERE cu.Zipcode = ‘92101’

Hope this is intuitive enough, got questions or suggestions? Drop it in the comment box. I’ll be discussing Temporary tables, derived tables and table variables in the next tutorial. Keep in touch. Peace.

Leave a Reply

Your email address will not be published. Required fields are marked *

Name *