A-Z of In-Memory OLTP : Hash Indexes (Part 1)
Posted by blakhani on January 9, 2014
Let’s imagine you came to meet my friend Vinod (b|t) and asked him to find “Readable Secondary” word in our book on AlwaysOn of 420 pages. How would he do it? Do you think he would start reading each and every page to find that word? No! He knows that there is a Index page at the end of the book which will tell him the word and the page number where its located. Our search becomes easier with the help of Index, correct? In the same way, SQL Server also uses indexes to find a row which you are asking to select/update/delete.
In my previous blog about T-SQL constructs, we have discussed that index on in-memory need to be created during table creation itself. Alter of in-memory table is not allowed. Here are the few point we should be aware.
- Indexes on in-memory tables reside only in memory. Only index definition is stored on disk and index structure is created while starting of the database. Refer this blog.
- At least one index is needed. For SCHEMA_AND_DATA durability a primary key needs to be created.
Msg 41321, Level 16, State 7, Line 21
The memory optimized table ‘TableName’ with DURABILITY=SCHEMA_AND_DATA must have a primary key.
Msg 1750, Level 16, State 0, Line 21
Could not create constraint or index. See previous errors.
- If we use SCHEMA_ONLY as durability then here is the error message if there is no index declared.
Msg 41327, Level 16, State 7, Line 21
The memory optimized table ‘TableName’ must have at least one index or a primary key.
Msg 1750, Level 16, State 0, Line 21
Could not create constraint or index. See previous errors.
- There is no concept of clustered index on in-memory table. Only non-clustered indexes are supported.
- Total 8 Indexes are supported on in-memory tables. Exceeding this would cause below error message
Msg 1910, Level 16, State 1, Line 14
Could not create in-memory index ‘IndexName’ because it exceeds the maximum of 8 allowed per table or view.
Msg 1750, Level 16, State 0, Line 14
Could not create constraint or index. See previous errors.
- String column on which index is created must use *_BIN2 collation else we would encounter below error.
Msg 12328, Level 16, State 106, Line 14
Indexes on character columns that do not use a *_BIN2 collation are not supported with indexes on memory optimized tables.
Msg 1750, Level 16, State 0, Line 14
Could not create constraint or index. See previous errors.
Here is the sample failure and the solution for error 12328:
CREATE TABLE Failure ( [iID] [int] NOT NULL, [vName] [nvarchar](20) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 1024), )WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA ) GO /* Msg 12328, Level 16, State 106, Line 14 Indexes on character columns that do not use a *_BIN2 collation are not supported with indexes on memory optimized tables. Msg 1750, Level 16, State 0, Line 14 Could not create constraint or index. See previous errors. */ CREATE TABLE Success ( [iID] [int] NOT NULL, [vName] [nvarchar](20) COLLATE Latin1_General_100_BIN2 NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 1024), )WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA ) GO /* Command(s) completed successfully. */
Another option would be to specify the collation during database creation itself. Using this, we need not specify different collation during table creation as column’s default collation would be database collation (highlighted in below script)
CREATE DATABASE SQLSeverHelp_IMO ON PRIMARY ( NAME = [SQLSeverHelp_IMO_data] ,FILENAME = 'C:\IMO_Database\SQLSeverHelp_IMO.mdf' ) ,FILEGROUP [SQLSeverHelp_IMO_FG] CONTAINS MEMORY_OPTIMIZED_DATA ( NAME = [SQLSeverHelp_IMO_dir] ,FILENAME = 'C:\IMO_Database\SQLSeverHelp_IMO_dir' ) LOG ON ( NAME = [SQLSeverHelp_IMO_log] ,Filename = 'C:\IMO_Database\SQLSeverHelp_IMO_log.ldf' ) COLLATE Latin1_General_100_BIN2 GO use SQLSeverHelp_IMO go CREATE TABLE Failure( [iID] [int] NOT NULL, [vName] [nvarchar](20) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 1000), )WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA ) GO
Now, let’s understand the type of indexes which can be created on in-memory tables. There are two kind of Indexes, Hash Indexes and Range Indexes. In this blog, I would explain hash indexes. Since there is a lot of information, I will continue about hash indexes in next post as well.
At this point, it is important that you know the row structure of in-memory tables. I have already blogged about it in my previous post here.
HASH INDEX
Hash index is a list of pointers and each of them is called as bucket. (Wikipedia) The number of buckets needs to be specified while creating the index. It is important to note that the number of hash buckets can’t be changed once table is created. There is no alter operation possible so we would end up in dropping and recreating the table for re-declaring the bucket count. Let’s assume for simplicity that I have created index with 4 buckets. I will start with very simple example with 4 has buckets with three rows and only one hash index.
For simplicity purpose, I have assumed the hash function as length of the column and it’s created on FName column (Note that the real hash function used in SQL is much more random and unpredictable, but to illustrate I am using Length function) Len(Vinod) = 5 so it would fall into first bucket and there would be a pointer which is pointing to actual row (middle row in image). In the same way (length), PinalD would be in hash bucket 2 (hashed values = 6) and Balmukund would be in hash bucket 3 (hashed values = 9)
Now, let’s assume that I insert another value SQLServer into Name column. Which bucket it would be? Since len(SQLServer) = 9 it would fall into bucket # 3 but we already have a row sitting there. This is the situation which is called as “Hash Collision” In this case, SQL Server would add a pointer in the existing row to point to next row as shown below.
Now imagine a situation that we have many rows (may be 10000) and have less buckets (only 4) then there would be a lots of hash collisions and length of chain would keep on increasing for each bucket. This would cause performance degradation while search for data in a bucket. Let’s understand a small concept about BUCKET_COUNT parameter in SQL Server.
BUCKET_COUNT
Number of bucket for a hash index is defined using BUCKET_COUNT parameter during index creation. SQL Server creates number of hash buckets as always power of 2. If we declare number of bucket which is not a power of 2 then it would be automatically rounded-off to next number which is exact power of 2. In my example above, I have given BUCKET_COUNT as 1000. Doing a little math, we can easily find that next valid value would be 1024 (2^10). Let’s have a look at demo to understand this better.
In below script, I would create 65 tables with there name as MemporyOptimizedTable<n> where n is a integer between 1 to 65 and also denotes number of buckets for each table. (MemporyOptimizedTable1 has 1 bucket, MemporyOptimizedTable2 has 2 buckets and so on…)
-- use the database which is already created use SQLSeverHelp_IMO go -- declare the looping variables declare @loopCounter int , @CreateString nvarchar(4000) -- initialize Select @loopCounter = 1, @CreateString = '' while @loopCounter <= 65 -- loop 65 times begin -- build dynamic string with table name and bucket count having loop counter set @CreateString = ' CREATE TABLE MemporyOptimizedTable'+ CONVERT(varchar(10), @loopCounter)+' ( iID INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = '+CONVERT(varchar(10), @loopCounter) +'), vFName VARCHAR(20) NOT NULL, vLName VARCHAR(20) NOT NULL ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)' -- execute dynamic query Exec (@CreateString) -- increment the counter set @loopCounter = @loopCounter+1 end
Now, let’s look at the bucket_count which SQL has created using catalog view sys.hash_indexes
Select object_name( object_id) 'Table Name', bucket_count 'Number of Buckets' from sys.hash_indexes order by 2, 1 asc
As we can see above that even if we have declared Bucket_count as 9, SQL Server has rounded off to 16. Here is the simple script which can be used to find the number of buckets for a given input value.
DECLARE @inputNumber int = 65 ; -- BUCKET_COUNT during index creation DECLARE @BucketsCreatedBySQL int = 0; -- Check of @inputNumber is power of 2. if yes, that's the bucket number IF (LOG(@inputNumber,2) = ROUND(LOG(@inputNumber,2),0)) SELECT @BucketsCreatedBySQL = @inputNumber; ELSE -- If not power of 2, round off to next power of 2 SELECT @BucketsCreatedBySQL = POWER( 2, floor(LOG(@inputNumber,2))+1); SELECT @BucketsCreatedBySQL [BucketsCreatedBySQL]
This blog has become longer than I expected.. There is a lot more to cover about hash index I would continue discussion about hash indexes in next blog.
noel said
nice read , thanks Balu
Noel
blakhani said
Thanks @Noel
A-Z of In-Memory OLTP : Hash Indexes (Part 2) « Help: SQL Server said
[…] « A-Z of In-Memory OLTP : Hash Indexes (Part 1) […]
A-Z of In-Memory OLTP : Memory Optimization Advisor « Help: SQL Server said
[…] about index type and bucket count, in case of hash index. You can read more about hash indexes (part 1 and part […]
A-Z of In-Memory OLTP : Garbage Collection (Part 1) « Help: SQL Server said
[…] understand below picture, I would like you to go to hash indexes (part 1 and part 2). I have done color coding to make it easy to […]