We all wants our surroundings to be clean. Sometimes we put efforts to keep it clean and some times we rely on someone else to clean keep it clean. In short, none of us like garbage around us. Garbage is something which has been useful in the past but no more useful. On the same lines, computer world also has term garbage (Wikipedia) and to clean it up, there is a concept called Garbage Collection (Wikipedia). Do you know that in earlier version of SQL there was something similar? Yes, it was “ghost cleanup” back ground process. In SQL Server 2014, garbage collection is “similar” work for data in in-memory tables.
There are two kind of Garbage Collection which exists in In-Memory OLTP:
- Garbage Collection of Rows in Memory
- Garbage Collection of Checkpoint Files
Now you might understand why I called this post as part 1. Yes, we would start from the clean-up of rows in memory.
In earlier post, we have discussed about row structure and timestamp. If a row is deleted or updated (which is delete + insert internally) we would have different row versions. It would not be a good idea to delete them physically at the same time. Due to this we would have a number of discarded (garbage) rows in memory. To handle such discarded row versions, garbage collection process is introduced. This process periodically cleans up the rows, which are “garbage” at this point.
In above we have end timestamp available for row 1 (Balmukund, Microsoft) and row 3 (PinalD, PluralSight). Notice First and forth row – End Timestamp for Row 1 is Begin Timestamp for Row 4. This would mean that we have fired “Update” and Modified Balmukund to SQLServer at timestamp 300. On the other hand, row 3 was deleted from the system via “Delete” statement at timestamp 170. As indicated in picture, the lowest active transaction time stamp is 350, which means that there is no “active” transaction in the SQL Server whose timestamp is less than 350. Recall that the life of a row is determined by begin and end timestamp. Since all the transactions are above timestamp 350, there is no transaction which would “need” those rows. This would mean that rows 1 and 3 are eligible for the garbage collection. These rows can be removed and their memory can be released back to the system.
The garbage collection system is designed to be non-blocking, cooperative, efficient, responsive and scalable. (taken from Hekaton whitepaper). Let’s me try to explain reason of those words:
- Non-Blocking: In some systems, when garbage collection activity starts, there is a huge work done and other “real” workload becomes to halt. This is NOT true with SQL Server In-Memory OLTP. It happens in the back ground all the time.
- Cooperative: In SQL Server In-Memory OLTP, garbage collection is not the responsibility of dedicated system thread alone. All other users threads who are doing reading/writing of data also do a little piece of over all garbage collection work.
- Efficient: A regular reader thread would do work if it find a row which is stale/garbage.
- Responsive: Garbage collection process responds to memory pressure. Recall, lazywriter thread in traditional SQL world.
- Scalable: Work of Garbage collection if shared. Any transaction which is completed would look at the queue and would de-queue the work.
All above points would be clear if you read further.
Let’s assume that there is a select statement fired which is using hash Index created on FName column (for example select * from table where FName = ‘SQLServer’). Since hash value of column is 9, it would scan all rows of the bucket. While doing that it would find a row (Balmukund, Microsoft) which is stale (Refer Image#1). The reader thread would delink that row from the chain and move forward. Note that the row is still NOT deleted from memory but a little work of garbage collection has been done. Here is the after image after select query.
You might say that what about red chain above (bucket 9)? That would be cleaned by regular garbage collection thread if user thread is not doing it. This means that if we traverse using Index on company and scanning bucket of hash value 9, it would be delinked by user thread else system thread would do it. Below would be the final state of the system once all garbage is cleaned and memory is released.
This means that garbage collection is not a discrete process in SQL Server In-Memory OLTP. It can be done by dedicated system thread and also cooperatively done by user threads. This make the garbage collection more efficient.
Hope you have learned something new today! In next blog of the series, we would see the DMVs which can be used to see garbage collection in action.