In this quiz, we are asked to give some differences between a common table expression and a subquery and what are some advantages of using one over the other in OLAP vs OLTP systems.

This question is loaded so let’s break it down and explain each component. A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT,INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. I have written detail tutorial on CTE that can be found here  .

A subquery is a query that is written within another query such that there is an outer query and an inner query. There can be multiple subqueries and nested subqueries and there can be multiple CTEs defined in same query scope as well.

The other part of the question mentioned OLAP vs OLTP systems. What is an OLAP and What is an OLTP system?

OLAP refers to Online Analytic Processing system. Think of a data warehouse that sources data from different sources like websites, metrics systems, logs etc. OLAP systems are built for faster read purposes. They are highly denormalized and have dimensions and fact tables in a typical data warehousing system.

An OLTP on the other hand is an Online Transaction Processing system. These are systems responsible for point of sales and other types of transactions businesses do daily. They are built for faster inserts, updates and deletions. Most OLTP has normalized data with almost no redundancy in the tables.

Regarding the differences for CTE and subquery for  OLAP and OLTP, we can disregard the type of system and conclude the difference doesn’t matter on the type of system and rather generalize the differences between CTE and a subquery.

To make it more practical, let us model a simple school management system. Let’s assume we have a system for students, courses and students and their grades. Lets quickly model that system to use.

CREATE TABLE dbo.Student(StudentID INT IDENTITY(1,1) PRIMARY KEY, FirstName VARCHAR(255),LastName VARCHAR(255))

CREATE TABLE dbo.Student_Grade(StudentID INT, CourseID INT, Mark REAL)

CREATE TABLE dbo.Course(CourseID INT IDENTITY(1,1) PRIMARY KEY, CourseName VARCHAR(255))

INSERT INTO dbo.Student(FirstName,LastName)
VALUES(‘Vincent’, ‘Amedekah’),
(‘Kofi’, ‘Amedekah’),
(‘Cosby’, ‘Amedekah’)

INSERT INTO dbo.Course(CourseName)
VALUES(‘Data Enginering’),
(‘Machine Learning’),
(‘Deep Learning’)

INSERT INTO dbo.Student_Grade(StudentID,CourseID,Mark)
VALUES(1,1,100),
(2,1,95),
(3,1,95),
(1,2,100),
(2,2,90),
(3,2,95),
(1,3,100),
(2,3,90),
(3,3,95)

SELECT *
FROM dbo.Student

SELECT *
FROM dbo.Course

SELECT *
FROM dbo.Student_Grade

 

we will refer to the above system to answer the differences as much as possible.

So here are some differences and the explanation.

  1. CTE is more readable than a subquery : This is one of the biggest selling point for a CTE over a subquery. let’s practice with a CTE that get all sudents, courses and the mark they receive.

;WITH cte_students(StudentID,[FirstName],LastName,CourseName,Mark)
AS
(SELECT s.StudentID,
s.[FirstName],
s.LastName,
c.CourseName,
g.Mark
FROM [dbo].[Student] s
JOIN [Student_Grade] g ON g.StudentID = s.StudentID
INNER JOIN dbo.Course c ON c.CourseID = g.CourseID)
SELECT *
FROM cte_students

 

2. CTE is reusable but subquery is not : When you define a CTE, you can refer to it multiple times within the scope of the CTE creation but you can’t reuse a subquery, you have to recreate it all over again when needed in query scope. In the below query you will see the cte get used twice to find data engineers with mark greater than 95. Note this is just showing example of writing query using CTE, this query can be written to accomplish the same in different ways.

;WITH cte_students(StudentID,[FirstName],LastName,CourseName,Mark)
AS
(SELECT s.StudentID,
s.[FirstName],
s.LastName,
c.CourseName,
g.Mark
FROM [dbo].[Student] s
JOIN [Student_Grade] g ON g.StudentID = s.StudentID
INNER JOIN dbo.Course c ON c.CourseID = g.CourseID)
SELECT all_students.StudentID,all_students.FirstName,all_students.LastName,de.CourseName,de.Mark
FROM cte_students AS all_students
INNER JOIN cte_students de ON DE.StudentID = all_students.StudentID AND de.Mark = all_students.Mark
WHERE de.CourseName = ‘Data Enginering’ and de.Mark > 95

3. Subquery can be used in a “where” statement but a CTE can not. Take for example subquery to return the data engineering students with mark > 95

SELECT *
FROM Student
WHERE StudentID IN( SELECT [StudentID]
FROM [dbo].[Student_Grade] g
INNER JOIN [dbo].[Course] c ON c.CourseID = g.CourseID
WHERE g.Mark > 95 and c.CourseName = ‘Data Enginering’)

3. Subquery supports correlation but CTE does not. lets extract students from all courses whose mark is greater than the average for the entire class.

SELECT *
FROM Student s
INNER JOIN Student_Grade g ON g.StudentID = s.StudentID
INNER JOIN Course c ON c.CourseID = g.CourseID
WHERE g.Mark > ( SELECT AVG(g.Mark)
FROM [dbo].[Student_Grade] g
WHERE g.CourseID = c.CourseID)

The output above lists only students with mark greater than the average mark for the course. The correlation part is where the outer courseID is passed to inner query to calculate

the average for that course and check that rows mark against what comes of the inner query. The row is returned if it passed the check. In correlated query, each row is processed individually.

 

4. CTE supports recursion but a subquery does not. Check the tutorial l referenced above to learn more about recursion, but lets check how to use CTE recursion for the Fibonacci sequence.

The Fibonacci Sequence is the series of numbers: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ……..

The next number is found by adding up the two numbers before it. so to get 2 you add the two numbers before it (1+1), 3 is found by adding the two numbers before it (1+2) and so on.

Below is the recursive CTE to get the series

;WITH RECURSIVE_fibonacci(prev_1, prev_2)
AS
(SELECT 0, 1
UNION ALL
SELECT prev_2, (prev_1+prev_2)
FROM RECURSIVE_fibonacci
)
SELECT TOP 10 prev_1 AS Series
FROM RECURSIVE_fibonacci ;

To conclude, l do not see performance difference in using a CTE vs Subquery. It is also worthwhile to mention some versions of mysql does not support CTE so if code portability is an issue for the application you are building then you have to consider the limitation, but in general CTE use is highly recommended over subquery mostly for readability.

Leave a Reply

Your email address will not be published. Required fields are marked *

Name *