Posted by blakhani on April 17, 2015
If you have searched for checkpoint in SQL Server and landed on this page, then you might get disappointed. This blog is NOT about the checkpoint background process which executes in SQL Server. This blog is about the checkpoint which is windows cluster piece.
Checkpoint Manager is the component of Cluster Service which monitors the changes in the Resources on a Cluster. In simple words, checkpoint is a process in cluster which synchronize the registry key values between the nodes. SQL Server startup depends on the registry key values. By default there are three registry keys: –d for path of master.mdf, –l is for the location of mastlog.ldf fie and –e is for the location of SQL Server ERRORLOG file. More startup parameters can be added based on requirement. For example if you are troubleshooting deadlock then you may add –T1222 as startup parameter. In case of cluster, you make change to registry key by using SQL Server Configuration Manager (a.k.a. SSCM). Once value is saved in SSCM, it would be saved in registry. Once failover is performed, the same change would be applied to other node. This is done by cluster checkpoint.
What would happen if checkpoints are not defined? The changes made on active node would not propagate to other nodes during failover. Another interesting point which you must note is that if any change is made to registry key when resource is offline, the changed would be reverted back. When the resource goes offline, the values are saved into registry.
Since this is windows piece, there are windows commands and PowerShell cmdlets available to view and edit them. In earlier version of windows, cluster.exe was used to view and edit the checkpoints and in newer version of operating systems where admins loves PowerShell, there are cmdlets exposed. Here is the list of commands to add checkpoint
Old Version (if you have cluster.exe)
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\Cluster"
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\MSSQLServer"
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\Replication"
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\Providers"
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\SQLServerSCP"
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\CPE"
cluster res "SQL Network Name (VSName)" /addcheck: "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\SQLServerAgent"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\Cluster"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\MSSQLServer"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\Replication"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\Providers"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\SQLServerSCP"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\CPE"
Add-ClusterCheckpoint -ResourceName "SQL Network Name (VSName)" -RegistryCheckpoint "SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.VSName\SQLServerAgent"
Here are the command to get list of checkpoint (my Virtual Server Name is RajaSQLServer).
cluster res "SQL Network Name (RajaSQLServer)" /checkpoints
Here is the PowerShell command
You would notice the difference in registry key values as compared to the command. This is because of the fact that the command is for SQL 2014 that’s why we are seeing “MSSQL12”. The exact value can be fetched from registry.
Author: SQL Server 2012 AlwaysOn – Paperback, Kindle
Posted in Checkpoint, Cluster, SQL Server | Tagged: Add-ClusterCheckpoint, checkpoints, cluster res, Get-ClusterCheckpoint | 1 Comment »
Posted by blakhani on April 14, 2015
This is one of the common question asked. This blog has list of resources which can be useful in getting answer. First lets look at books online.
Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (SQL Server)
Maximum number of availability groups and availability databases per computer: The actual number of databases and availability groups you can put on a computer (VM or physical) depends on the hardware and workload, but there is no enforced limit. Microsoft has extensively tested with 10 AGs and 100 DBs per physical machine. Signs of overloaded systems can include, but are not limited to, worker thread exhaustion, slow response times for AlwaysOn system views and DMVs, and/or stalled dispatcher system dumps. Please make sure to thoroughly test your environment with a production-like workload to ensure it can handle peak workload capacity within your application SLAs. When considering SLAs be sure to consider load under failure conditions as well as expected response times.
In general, the more databases that are replicated and the more secondary replicas that exist – the more worker threads and more memory that will be consumed just to have the AlwaysOn infrastructure. As the text above indicates, there is no enforced limit, but the more you have the more worker threads and memory will be needed. If there are insufficient worker threads you will probably see error messages in the SQL Error log similar to:
“The thread pool for AlwaysOn Availability Groups was unable to start a new worker thread because there are not enough available worker threads. This may degrade AlwaysOn Availability Groups performance. Use the "max worker threads" configuration option to increase number of allowable threads.”
If starved for memory, you could see many different error messages – that may or may not look like they relate to AlwaysOn. One possible message could be:
“Could not start the AlwaysOn Availability Groups transport manager. This failure probably occurred because a low memory condition existed when the message dispatcher started up. If so, other internal tasks might also have experienced errors. Check the SQL Server error log and the Windows error log for additional error messages. If a low memory condition exists, investigate and correct its cause.”
Here are other blogs which explain the number of threads in worker pool to support availability group.
AlwaysOn – HADRON Learning Series: Worker Pool Usage for HADRON enabled databases
Monitoring SQL Server 2012 AlwaysOn Availability Groups Worker Thread Consumption
Hope this helps.
Posted in AlwaysOn, SQL Server 2012, SQL Server 2014 | Tagged: AG, AlwaysOn, Availability Group, sql server | Leave a Comment »
Posted by blakhani on April 7, 2015
Recently I have had a friend who reported below issues.
- When I expand database, it takes a lot of time.
- When I expand Jobs under SQL Server Agent node in SQL Server Management Studio, it freezes and finally it fails with error
“An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding”
The symptom on the server that we were working on was that from Management Studio, when trying to browse SQL agent job, the interface would hang. I have asked to capture Profiler trace to find out which query is taking time and what is the waits for those query which are stuck.
In profiler trace, and the query to capture currently running queries (refer this blog) I found that it was running sp_help_job stored procedure from MSDB database. This is the procedure to get high level details about all jobs in MSDB database using msdb.dbo.sysjobs_view. When we looked further, we found that SQL Server is running function dbo.SQLAGENT_SUSER_SNAME and got stuck at SELECT @ret = SUSER_SNAME(@user_sid) statement. The wait for the session is was PREEMPTIVE_OS_LOOKUPACCOUNTSID and wait time was increasing. This wait is related to the communication/validation from Active Directory.
When we debugged further, here is the chain of reaching to function.
>> Query having – owner = dbo.SQLAGENT_SUSER_SNAME(sjv.owner_sid)
>> SELECT @ret = SUSER_SNAME(@user_sid)
This is getting stuck at PREEMPTIVE_OS_LOOKUPACCOUNTSID. We was identified that the function is used to convert SIDs stored in SQL Server table to the name by making call to Active Directory. Now the challenge was to find why and also was to identify if its happening with particular logins or all login. The complexity here was that SQL Server stores SID in varbinary format not in the format which OS would understand.
Luckily, I have had a blog post having script to convert the varbinary to well known format. So I have used that to convert SIDs obtained from below query
where owner_sid <> 0x01
Once we have SID value in OS understandable format, I used PsGetSID tool from sysinternals to get Windows account name. While running that it was taking a long time and finally it failed with below error.
Error querying SID:
The trust relationship between the primary domain and the trusted domain failed
So, it was something to do with two domains trust which seems to be broken. I asked him to work with this Windows Domain Admin team and networking team to get the issue resolved.
Same issue might happen during database expand also as, database owner is a SID stored in sys.databases and that has to be converted to name.
Hope this helps.
Posted in SQL Server Management Studio, SSMS, Troubleshooting | Tagged: PREEMPTIVE_OS_LOOKUPACCOUNTSID, Waittype | 4 Comments »