[EN] Column order - does it matter?
Do you think column order doesn't matter in SQL Server tables? Well, if you say it doesn't I would say the same thing as always: "it depends" :-)
On SQL Server 2005 instance let's create some table with 50 nullable varchar columns:
USE tempdb
GO
IF OBJECT_ID('tempdb.dbo.test_varchar') IS NOT NULL
DROP TABLE dbo.test_varchar
CREATE TABLE dbo.test_varchar
(
id INT PRIMARY KEY IDENTITY(1,1),
col0 VARCHAR(100) NULL,
col1 VARCHAR(100) NULL,
col2 VARCHAR(100) NULL,
...
-- put all other varchar columns here
...
col49 VARCHAR(100) NULL
)
GO
Now, let's insert some rows:
INSERT INTO dbo.test_varchar(col0)
SELECT 'a'
GO 100000
Notice that only the first varchar column is filled with not null value. Let's see how big the table is now:
EXEC sp_spaceused N'dbo.test_varchar'
The results:
name rows reserved data index_size unused
------------ ------ -------- -------- ---------- ------
test_varchar 100000 2440 KB 2376 KB 16 KB 48 KB
Now, let's delete all rows and fill the table again:
TRUNCATE TABLE dbo.test_varchar
GO
INSERT INTO dbo.test_varchar(col49)
SELECT 'a'
GO 100000
This time only the last varchar collumn is not null. Again, let's check the size of the table. The results:
name rows reserved data index_size unused
------------ ------ -------- -------- ---------- ------
test_varchar 100000 12232 KB 12128 KB 56 KB 48 KB
The size is almost five times larger than when the first column was not null! What's going on?
To show the thing I will use some very nice tool - SQL Internals Viewer (of course, I can use DBCC IND and DBCC PAGE commands instead). First I will show the structure of a single row with the first varchar column not null.

This is a common row structure in SQL Server. Nothing special - you can see some NULL bitmap there, some column offset array and so forth. What's the most important you can calculate the size of the row - 22 bytes.
Now I will do the same with the row having only the last varchar column not null.

This time the size of the row is 120 bytes. And if you take a look at the numbers above you should see that 100 bytes is spent on the offset array. So it seems SQL Server stores the offsets for all columns up to the last not null one. That means if you have a table with some number of variable-length columns that will remain null in most cases it can be better to put them at the very end of the column list in a table. This can seem to be not a real life example (some people could say that we should avoid nullable columns) but I think we should consider this behavior when designing our tables (less space means less pages, less pages means less reads, less reads means faster queries).
Now, let's see if there is some answer for this problem in SQL Server 2008 CTP6. I will use sparse columns instead of just nullable ones.
CREATE TABLE dbo.test_varchar
(
id INT PRIMARY KEY IDENTITY(1,1),
col0 VARCHAR(100) SPARSE NULL,
col1 VARCHAR(100) SPARSE NULL,
col2 VARCHAR(100) SPARSE NULL,
...
-- put all other varchar columns here
...
col49 VARCHAR(100) SPARSE NULL
)
GO
Again, I will fill the table like before (only the last column will have not null value).
INSERT INTO dbo.test_varchar(col49)
SELECT 'a'
GO 100000
Let's check the size of the table. The results:
name rows reserved data index_size unused
------------ ------ -------- -------- ---------- ------
test_varchar 100000 2632 KB 2576 KB 16 KB 40 KB
Wow, I would say. I'm not going to claim that sparse columns are already my favorite feature in Katmai but they seem to deserve a closer look at (what about the performance of inserts and so forth) :-)
Oh, and by the way - the size of the table in SQL Server 2008 is the same if you fill only the first column with not null value.
Credits go to my coleague from Asseco Business Solutions S.A., Jarek Kuśmierek, who has made most of the research shown above.