A-Z of In-Memory OLTP : Hash Indexes (Part 2)

January 2014
M	T	W	T	F	S	S
	1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Posted by blakhani on January 14, 2014

Imagine that you are standing in front of a multistoried apartment and don’t know the apartment where you friend resides (find 1 out of 100). It would be tough to knock each and every door to check if the house belongs to your friend (perform a Scan). What if there is a directory (Index) with the security guard having name and apartment number? Easy? Each of the apartment is a row and you know the predicate (where clause). Indexes in any technology (and real life as well) are used for faster access to the data.

In the previous post of the A-Z Series, we have seen how a single hash index looks like and learned about hash collision. In this post let’s go one step further understand how two hash indexes on the same table would look like. To make picture more intuitive, I have added color to the column matching with the bucket they belong to. Same as earlier, I am using LEN function as hash function.

Along with earlier index on fName column, we now have another on Company column. If we try to compare above image with single hash index image in previous blog, it’s clear that now we have “Pointer 2 added” in the row header area of the row (read here). As we can see above, we have three rows falling into same bucket for company column. The bucket of hash index would point to first row of hash bucket <Balmukund, Microsoft>. Due to collision on company Microsoft, there would be chain pointers in the existing row to point to next row of the same bucket (red color arrows in above image)

In above picture, we have assumed that there is no deletion happened and that’s why we see ∞ (infinity) in End Timestamp for all four rows. This means that all rows are valid for timestamp 300 onwards (begin timestamp is max 300). If delete or update (=delete + insert) happens for a row then as described in earlier blog, the timestamp would be closed for deleted data and new row would be created with new begin timestamp. Lets assume that we fired a update statement to modify Balmukund’s company to Ramco at time stamp 350. We would put end timestamp as 350 for <Balmukund, Microsoft> row and insert new row with <Balmukund , Ramco>. All pointers need modification. Since LEN(Ramco) = 5 and there is no hash collision, new pointer is added.

Later when garbage collection happens, first row <Balmukund, Microsoft> would be removed and pointer would be modified.

You may ask – Is there any way to find how many hash buckets we have and do we have collision? Yes, of course! SQL Server has DMV dm_db_xtp_hash_index_stats available to help us investigate. Let’s use the script to understand this concept.

-- Create database with IMO Filegroup, If exists drop it.
Use Master
go
If db_id('SQLSeverHelp_IMO') is NOT NULL
drop database SQLSeverHelp_IMO
GO
CREATE DATABASE SQLSeverHelp_IMO 
    ON PRIMARY (
    NAME = [SQLSeverHelp_IMO_data]
    ,FILENAME = 'C:\IMO_Database\SQLSeverHelp_IMO.mdf'
    )
    
    ,FILEGROUP [SQLSeverHelp_IMO_FG] CONTAINS MEMORY_OPTIMIZED_DATA ( 
    NAME = [SQLSeverHelp_IMO_dir]
    ,FILENAME = 'C:\IMO_Database\SQLSeverHelp_IMO_dir'
    ) 
    
    LOG ON (
    NAME = [SQLSeverHelp_IMO_log]
    ,Filename = 'C:\IMO_Database\SQLSeverHelp_IMO_log.ldf'
    ) 
    COLLATE Latin1_General_100_BIN2 
GO
-- Create table in database
-- use the database which is already created
Use SQLSeverHelp_IMO
GO
CREATE TABLE MyFirstMemporyOptimizedTable
(
    iID INT NOT NULL,
    vFName VARCHAR(20) NOT NULL,
    vLName VARCHAR(20) NOT NULL,
    CONSTRAINT imo_pk_iID primary key NONCLUSTERED HASH (iID) WITH (BUCKET_COUNT = 200),
    index imo_idx_vFname  NONCLUSTERED HASH (vFName) WITH (BUCKET_COUNT = 200)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

I have given bucket count as 200. can you guess how much bucket SQL Sever is going to create? If you can’t answer then you have not read part 1 of hash indexes, which I mentioned in beginning. SQL is going to put 256 buckets for both indexes. Let’s verify!

Select  name 'Index Name', 
        object_name( object_id) 'Table Name',
        bucket_count 'Number of Buckets'
from    sys.hash_indexes
order by 2, 1 asc
Index Name        Table Name                     Number of Buckets

—————– —————————— —————–

imo_idx_vFname    MyFirstMemporyOptimizedTable   256

imo_pk_iID        MyFirstMemporyOptimizedTable   256

(2 row(s) affected)

Let’s insert 90000 rows in the table using natively complied procedure as below.

Use SQLSeverHelp_IMO 
GO
CREATE PROCEDURE [dbo].[InsertName] 
  WITH 
    NATIVE_COMPILATION, 
    SCHEMABINDING, 
    EXECUTE AS OWNER
AS 
BEGIN ATOMIC 
  WITH 
(TRANSACTION ISOLATION LEVEL = SNAPSHOT,
 LANGUAGE = 'us_english')
 
DECLARE @counter INT 
SET @counter = 1

WHILE @counter <= 90000 
  BEGIN 
      INSERT INTO dbo.MyFirstMemporyOptimizedTable 
      VALUES      (@counter, 
                   'SQLServer-Help.com', 
                   'Balmukund Lakhani');
      INSERT INTO dbo.MyFirstMemporyOptimizedTable 
      VALUES      (@counter + 1, 
                   'ExtremeExperts.com', 
                   'Vinod Kumar M');
      INSERT INTO dbo.MyFirstMemporyOptimizedTable 
      VALUES      (@counter + 2, 
                   'Blogs.msdn.com', 
                   'Balmukund Lakhani') 

      SET @counter = @counter + 3 
  END
END
GO

exec InsertName
go

Now, let’s examine hash index statistics.

SELECT Object_name(his.object_id)        'Table Name', 
       idx.name                            'Index Name', 
       total_bucket_count                'total buckets', 
       empty_bucket_count                'empty buckets',            
       total_bucket_count - empty_bucket_count as 'used buckets', 
       avg_chain_length                    'avg chain length', 
       max_chain_length                    'max chain length', 
       90000                                   as 'Rows - hardcoded value' 
FROM   sys.dm_db_xtp_hash_index_stats as his 
       JOIN sys.indexes as idx 
         ON his.object_id = idx.object_id 
            AND his.index_id = idx.index_id;   
Table Name                      Index Name       total buckets empty buckets used buckets avg chain length max chain length Rows – hardcoded value

——————————- —————- ————– ————– ————- —————– ————— ———————-

MyFirstMemporyOptimizedTable    imo_idx_vFname   256            253            3             30000             30000            90000

MyFirstMemporyOptimizedTable    imo_pk_iID       256            0              256           351               355              90000

(2 row(s) affected)

Chain length in conjunction with buckets in the output tells that for index imo_idx_vFname, we have only three bucket and each bucket has 30000 entries. If we go back and examine the stored procedure, we are inserting only three values in loop of 30000 for vFName column. Whereas for the other index imo_pk_iID, we don’t have any free bucket and chain length is more for each bucket. This is the right candidate for more number of buckets. Typical value of bucket count is between 1 to 2 times of distinct values on that column. Remember that bucket count can’t be change on the fly – whole table needs to be dropped and recreated.

Books online references:sys.dm_db_xtp_hash_index_stats and Determining the Correct Bucket Count for Hash Indexes

Hope you have already downloaded SQL 2014 CTP2 and following this series with me!

Cheers,

Balmukund Lakhani

Twitter @blakhani

Author: SQL Server 2012 AlwaysOn – Paperback, Kindle

This entry was posted on January 14, 2014 at 3:30 AM and is filed under A - Z Series, Hekaton Series, In Memory OLTP, In-Memory OLTP, SQL Server 2014. Tagged: A-Z-Series, bucket_count, database, hash index, Hekaton, Hekaton Series, HekatonSeries, In-Memory OLTP, inmemory, inmemory oltp, learning, row, SQL, SQL Server 2014, storage, structure. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

7 Responses to “A-Z of In-Memory OLTP : Hash Indexes (Part 2)”

A-Z of In-Memory OLTP : Memory Optimization Advisor « Help: SQL Server said

January 30, 2014 at 3:32 AM
[…] Once we hit next, we need to provide details about Primary Key (which is mandatory). We also need to provide details about index type and bucket count, in case of hash index. You can read more about hash indexes (part 1 and part 2) […]

Reply
A-Z of In-Memory OLTP : Garbage Collection (Part 1) « Help: SQL Server said

February 7, 2014 at 3:31 AM
[…] understand below picture, I would like you to go to hash indexes (part 1 and part 2). I have done color coding to make it easy to […]

Reply
Abhay Chaudhary said

January 13, 2015 at 10:46 PM
Balu Sir,
In this case what would you suggest the optimal setting for the bucket could for a primary key (NCL) because every row would be unique.

Regards
Abhay

Reply
Anthony Pedone said

February 27, 2015 at 6:14 PM
Hi .. just wanted to say, “Great Job with the Series”

Reply
- blakhani said
  
  February 28, 2015 at 1:44 PM
  Thanks Anthony!
  
  Reply
Basava said

February 29, 2016 at 2:30 PM
Hi Balmukund, If my inmemory table is more than the physical RAM, then how does it work? like pagination etc.

Reply
- blakhani said
  
  March 19, 2016 at 12:34 AM
  I don’t think so.
  
  Reply

Help: SQL Server

Sharing my knowlege about SQL Server Troubleshooting Skills

Blog Stats

Select GETDATE()

Subscribe

Follow Blog via Email

Date < getdate()

Select location from world

Meta