Many times you might have to update the statistics (index stats or normal stats) with fullscan, either manually or via scheduled job. If the table is large then time taken would be considerably huge because it has to scan each and every record of the table. Let’s assume that we have created partitions on the table and we only modified data for one partition. Earlier version of SQL didn’t offer any choice to update statistics only for one partition. Only choice we had was “FULLSCAN”.
To overcome this, SQL Server 2014 introduced “INCREMENTAL” keyword under Create Statistics. As per books online “When ON, the statistics created are per partition statistics. When OFF, stats are combined for all partitions. The default is OFF.”
To have some sample data, I have downloaded base AdventureWorks sample database from: http://msftdbprodsamples.codeplex.com/downloads/get/417885
I have restored it in SQL Server 2014 and named as AdventureWorks2014. I would be playing around with [AdventureWorks2014].[Production].[TransactionHistoryArchive] table which has transactions from 2005-05-17 to 2007-08-31. I have created quarterly partition.
Use AdventureWorks2014; GO CREATE partition FUNCTION [QuarterlyDate](datetime) AS range LEFT FOR VALUES ( N'2005-05-01T00:00:00', N'2005-08-01T00:00:00', N'2005-11-01T00:00:00', N'2006-02-01T00:00:00', N'2006-05-01T00:00:00', N'2006-08-01T00:00:00', N'2006-11-01T00:00:00', N'2007-02-01T00:00:00', N'2007-05-01T00:00:00', N'2007-08-01T00:00:00', N'2007-11-01T00:00:00'); GO CREATE partition scheme [TransactionDatePS] AS partition [QuarterlyDate] TO ( [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]); GO CREATE TABLE Production.TransactionHistoryArchive_IncrementalDemo( TransactionID int NOT NULL, ProductID int NOT NULL, ReferenceOrderID int NOT NULL, ReferenceOrderLineID int NOT NULL, TransactionDate datetime NOT NULL, TransactionType nchar(1) NOT NULL, Quantity int NOT NULL, ActualCost money NOT NULL, ModifiedDate datetime NOT NULL) ON TransactionDatePS(TransactionDate); GO
Now, I am going to populate partitioned table. To show before and after effect, we would populate only one partition at this point. Since our PF is defined as “LFET” which mean boundary value would go to left side. If I have to populate 8th partition, the range would be TransactionDate > N’2006-11-01T00:00:00′ and TransactionDate <= N’2007-02-01T00:00:00′. Here is the query to dump data into newly created table.
INSERT INTO Production.TransactionHistoryArchive_IncrementalDemo SELECT * FROM Production.TransactionHistoryArchive WHERE TransactionDate > N'2006-11-01T00:00:00' and TransactionDate <= N'2007-02-01T00:00:00'
(10324 row(s) affected). Now, let’s look at partitions
SELECT * FROM sys.partitions WHERE object_id = OBJECT_ID('Production.TransactionHistoryArchive_IncrementalDemo');
As expected, we have 10324 rows in 8th partition. Use below to create “incremental” statistics. Notice that I have added new clause which is available in SQL Server 2014.
CREATE STATISTICS IncrementalStatsDemo ON Production.TransactionHistoryArchive_IncrementalDemo (TransactionDate) WITH FULLSCAN, INCREMENTAL = ON;
Execute DBCC SHOW_STATISTICS command to look at histogram
DBCC SHOW_STATISTICS('Production.TransactionHistoryArchive_IncrementalDemo', IncrementalStatsDemo) with histogram
Added more rows to different partition
INSERT INTO Production.TransactionHistoryArchive_IncrementalDemo SELECT * FROM Production.TransactionHistoryArchive WHERE TransactionDate > N'2007-02-01T00:00:00' and TransactionDate <= N'2007-08-01T00:00:00'
(27318 row(s) affected)
UPDATE STATISTICS Production.TransactionHistoryArchive_IncrementalDemo (IncrementalStatsDemo) with resample ON PARTITIONS(9, 10)
Now, lets look at stats again.
If we compare with earlier histogram image, the highlighted range was at step 81 earlier which has moved to step 60 now and Total number of steps have increased to 192 (as compared to 81 earlier). This means that update statistics command has read those partitions which were specified (9 and 10) and merged with earlier created statistics. As per books online “ON PARTITIONS – Forces the leaf-level statistics covering the partitions specified in the ON PARTITIONS clause to be recomputed, and then merged to build the global statistics. WITH RESAMPLE is required because partition statistics built with different sample rates cannot be merged together.”
There is more to cover because there are various syntax added on different places. In next blog, I would cover those.
Stay tuned.