Hello world, welcome to another important tutorial on database programming using TSQL. Today, we are going to talk about Partition. This is a very important concept and the main purpose of partitioning is to either speed up the performance of your operations or the ability to direct data to be stored on different storage locations or filegroups.
Before we go into the concept, lets take a look at how the file system is arranged on a regular windows system. From a top level, you have the hard drive which contains all files and folders for the operating system. The next in line after the hard drive is a folder. Folders can contain subfolders and subfolders can also contain subfolders. After the folder, we have the file itself. File is the container that actually holds the data you are storing. It can be text file, pdf, excel or whatever it is. That’s how the windows operating system manages it’s file system. Do you ever wonder why there are folders and not just files all over? Well, it’s quite simple to understand, arranging the files into folders help the operating system to find the files quicker, and there is a whole concept dedicated to how the operating system actually looks and find data when the user requests for it but this tutorial is not going into all that crap right now. Database engine employ the same concept, the highest level of storage is the filegroup, then the files. The objects you create in the database are stored in the files. Then where does a partition comes in? Well, it’s simple, a partition will divide the data in the file based on some criteria and store it into separate partitions. So for example, l have a table which keeps record of all humans on earth. That’s a big ass table, right? lol. If l’m looking for someone from this table, you know it’s going to take some time to get to that person, but with partition you can cut the job down a lot. What the partition does is that even thought it’s the same table, it’s going to store parts of the table on separate partitions. For example, if l have created 3 partitions, l can store the same data that belongs to the same table across this 3 partitions. So that if l want to retrieve a record, the SQL engine is smart enough to go straight to a specific partition that contains the record l wanted and get it for me. That’s pretty awesome and it makes life for the server and yourself easy lol. Okay, so how are we going to get this partition done?
Easy stuff, first thing to get done is to create a partition function. What the heck am I talking about? A partition function is just a simple function that determines the way you want to group the data on the partition. Let’s say, you have an ID column for all the humans on earth and you want to group the records on different partitions based on this ID column. For example, humans with ID 1 to 100 will be on partition 1, humans with ID 101 t0 200 will be on partition 2, and humans with ID greater than 200 will be on partition 3.
This idea of grouping of the humans based on the ID column is going to be done by the partition function. The partition function is very simple so don’t get freaked out. It’s easier done you can ever think off. This is how you create it
CREATE PARTITION FUNCTION <functionname>(datatype)
AS <range left/range right>
FOR VALUES(v1,v2,v3,-….)
Looking at the syntax above, you give the function a name and specify the data type to be supplied to the function. Range left or range right just tells how you want the division to be done. So lets say we have the values for the records like this,
1———–100———–200
Range left means, you include the end point to the left part of the division. so in the above, the first division will consist of values 1 to 100 inclusive, the next division will include values from 101 to 200 inclusive and the third division will include 201 and greater. When you specify range right, that means the end point belongs to the partition on the right. For example, in our example, the first division will be 1 to 99, 100 to 199, 200 to infinity.
The “FOR VALUES” specify the end points. so for example, our above division will have the for part like this
FOR VALUES(100,200). Now lets see the whole partition function created for this division of human record into different partitions
CREATE PARTITION FUNCTION p_Function_humanDivisor (INT)
AS RANGE LEFT
FOR VALUES(100,200)
The next thing after creating the partition function is the partition scheme. The partition scheme uses the partition function and map the divisions from the function to physical locations on the storage media. So we see the function is only telling us how to divide, something has to actually map each division to a physical location on the drive. That’s the work of the partition scheme. Lets see the syntax for the partition scheme.
CREATE PARTITION SCHEME <partition scheme name>
AS PARTITION <functionname> TO (location1, location2,……)
The scheme has to refer to the partition function and then map it to the physical locations. So how do we supply the locations? Easy money here too. Remember l talked about file groups in the beginning?, that’s the trick. You will create file groups and then supply it as the location.
For example, l’ll create two filegroups and call it fg1 and fg2. then l’ll give it as the location in the partition scheme. You can create filgegroup using a script or right click on the database in the management studio, go to properties and click filegroup in the left pane. Then click add button below the window showing the filegroups in the system. give it a name and click ok. To use a script, you do it like this.
ALTER DATABASE <databasename> ADD FILEGROUP <filegroupname>
So let me create 3 filegroups to serve as the partitions for my human table. Lets say, the human table is in a database called WorldRegister.
ALTER DATABASE [WorldRegister] ADD FILEGROUP [fg1]
ALTER DATABASE [WorldRegister] ADD FILEGROUP [fg2]
ALTER DATABASE [WorldRegister] ADD FILEGROUP [fg3]
If there are filegroups with the same name, it will throw an error.
Now l have my file groups, l’m now going to map it using the partition scheme and the function.
CREATE PARTITION SCHEME P_Scheme_humanDivisor
AS PARTITION p_Function_humanDivisor TO (fg1, fg2,fg3)
That is it, l have my partition done, all l have to do now is to create the table on this partition or modify the table if it already exists to use this partition. In a database you can partition a table, index and indexed view. Again, partition lets you split a table/index/indexed view across multiple storage units called file groups, based on a user specification. So, allows better manageability. For example, l have partition the human table based on the ID column. l can also partition based on the age or anything l want to use. Partition improves the scalability and manageability of large tables.The partition logic depends on how you want to group records in the table. One realistic partition for our human table will be to partition our table based on the country code so it will help us store information about people in separate partitions dedicated to their country, even though its the same table, the data is stored across multiple units. The filegoups can be on different hard drives so it makes it very good and fast, think of having multiple processors working paralleling on different partitions to retrieve your data. That’s awesome. Are you happy now? lol. Remember, the partition function and the partition scheme must reside in the database that contains the table you are partitioning.
Assume we don’t have our human table already, lets create it now to take advantage of the partition we created using the file group.
USE [WorldRegister]
GO
CREATE TABLE Human (ID INT,FirstName, LastName, CountryCode)
ON P_Scheme_humanDivisor (ID)
How does it work all together? This it how it works. The table is created on the partition scheme and we gave the partition scheme the column ID from our table, the partition scheme then knows what column to supply the partition function to determine where a record will be placed. Lets try to enter some people into the table and see how it will work.
USE [WorldRegister]
GO
INSERT INTO Human
VALUES(1,’Barack’,’Obama’,1),(2,’Bill’,’Clinton’,1),(150,’JJ’,’Rawlings’,233),(101,’Vincent’,’Cosby’,233),(205,’Kofi’,’Amedekah’,222)
So how is this data going to be stored in the database? The first and second records have IDs in the range 1 to 100 so they will be stored in fg1 and third record will be stored in fg2 because the ID is in the range 101 to 200, finally the last record will be stored in fg3 because it has an ID greater than 200.
Now what? Is that the only way to create a partition for a table? The answer is definitely no. You can use a graphical wizard in management studio to create a partition for a table just in a minute. All you have to do is create the file groups, then go to the particular table you want to partition, right click the table and go to storage. Select create partition, a wizard will open up telling you how the partition will be created, click next, select the column in the table based on which you want to create the partition and click next and give a name to the partition function the wizard will create and click next, Give a name to the partition scheme the wizard will create and click next. Then select the type of boundary to use(left or right). map the file groups to each partition by selecting the drop down arrow under the file group column, give the number for the endpoint. select all file groups and give the values. Then click next to create the partition immediate or just save it as a script so you can run it later.
If you create partition on a clustered index, each partition will have it’s Balance tree with data stored at the leaf nodes. So first, the is smaller space for the server to search and in addition, the index help the server to do an index seek or index scan to get the data. Makes life so cool for the server.
You can also retrieve all records just from a particular partition like this,
SELECT * FROM Human
WHERE $PARTION.P_function_humanDivisor(ID) = 3
When you create the partition, the system gives id to those partitions(filegroup) so you can use it to retrieve all record from that partition pertaining to a particular table. Don’t get it wrong, you can store multiple files in the same partition. It doesn’t mean each table will have just one partition, you can store part of different tables in the same partition. As a summary,Steps to create partition:
1. create a partition function
2. create a partition scheme mapped to a partition function
3. create the table or index on the partition scheme.
What is a Partition function?
A function in the current database that maps the rows of a table or index into partitions based on values of a specified column. It defines boundaries of the partitions.
What is a Partition scheme?
A scheme in the current database that defines the filegroups or storage structures used for partitions of a table or index. Partition scheme is mapped to a partition function., so the boundaries and number of filegroups to be used are specified in the partition function in the beginning. Before creating a partition scheme, make sure the filegroup you are using exists and is mapped to a data file (mdf/ndf). That’s all l have on partitions. Questions, Feedback and corrections welcome. The next topic l’ll be discussing is disaster recovery. That consists of backing up and recovering the database. Peace.