A-Z of In-Memory OLTP : Storage structure of rows
Posted by blakhani on January 7, 2014
In our earlier posts we have learned about various concepts. In this post, we would understand how the data is actually stored in the row. This would form a foundation to understand indexes created on in-memory table. There are major difference in storage format of disk based table vs. in-memory table. Here are few of them.
- There is no 8 KB page concept for in-memory table and so there is no concept of extent.
- Rows belonging to same table need not be in “near” other rows.
- The rows are linked to each other via index pointers and that’s why we need at least one index on in-memory table.
- No concept of clustered index because there is no sorting of data rows.
- Theoretically rows are free flowing and there is not limit on size of row BUT since we are allowed to move the data to disk-based table (using insert), there is a limit of row size to 8060 bytes.
- Every data type is NOT allowed for columns (like varchar(max), LOB, sqlvariant etc.)
Here is the pictorial representation of a row:
Row Data is the actual “data” of the row, i.e. the values which we insert into table columns. On the other hand, Row Header is information about row itself. Row headed consists of many fields. Begin TimeStamp is a field which contains the transaction timestamp when row was inserted. End TimeStamp is a field which contains the transaction timestamp when row was deleted. Each Timestamp field is 8 bytes (16 total). This means that these two values are used to identify “visibility” of a row for a transaction. You might ask what would be the value of End Timestamp when a row is inserted? Well, it’s a special value which is referred at infinity. In my earlier post, we have learned that UPDATE = DELETE + INSERT. This means that if any update happens, we “close” the existing row by putting end timestamp and “open” a new row by inserting new row with new begin timestamp and end timestamp as infinity. Here is the example.
After the update transaction, the visibility of first row is only for those transactions which have timestamp between 100 and 200.
Another field in Row Header is Statement ID which is unique for each statement. This is 4 bytes filed. This field stores the ID of the statement which has created the row.
Next field is Index Link Count which specifies the number on indexes the row has. And final field in the row header is the pointer to the next row for the index. Since each pointer is 8 bytes so the length of this field depends on number of Index on row. If we have 4 indexes, the length of this field would be 4*8 = 32 bytes.
Once you are clear with row storage, it would be easy to understand about special indexes which are used for in-memory tables. In next blog posts we would discuss about hash indexes and range indexes.