Help: SQL Server

Sharing my knowledge about SQL Server Troubleshooting Skills

Archive for the ‘Cluster’ Category

Tips and Tricks: Useful Parameters of Get-ClusterLog

Posted by blakhani on May 20, 2016


Root cause analysis (RCA) is part of my work at Microsoft. For a cluster failover RCA, it is very important to get the cluster log. Sometimes, while looking at a live issue, we want the cluster log for only the last few minutes so that the analysis goes quicker. This blog explains some common parameters which I use in my day-to-day troubleshooting.

I have a four-node cluster in my lab, with nodes named SRV1, SRV2, SRV3 and SRV4.

  • Default command: generates a file named Cluster.log on ALL nodes, in the C:\Windows\Cluster\Reports folder.

Get-ClusterLog 

  • If we want the cluster log to be generated only for specific node(s), we can use the -Node parameter with comma-separated node names, as shown below.

Get-ClusterLog -Node SRV1, SRV3

  • You might know that the time shown in the cluster log is UTC by default. Sometimes it's difficult to translate UTC to local time, especially for time zones which observe daylight saving. Luckily, the cluster log can be generated in local time using the UseLocalTime parameter. Here is the sample command.

Get-ClusterLog -UseLocalTime

  • Another useful parameter copies the log files to a specific location. The command generates the logs and also copies them to the specified folder. In the example below, I am collecting the logs from all nodes in the C:\Temp folder.

Get-ClusterLog -Destination "C:\Temp"

  • TimeSpan is another parameter; it generates the cluster log for only the last specified number of minutes. By default, Cluster.log covers the complete time available. I find it useful when I have reproduced a problem and want to look at the cluster log for the last 2 to 3 minutes. Here is the command to generate the log for the last 3 minutes.

Get-ClusterLog -TimeSpan 3

So, this is my favorite command after reproducing a cluster issue on the local node.

Get-ClusterLog -Node SRV1 -TimeSpan 2 -UseLocalTime -Destination C:\
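
If a log was already collected without -UseLocalTime, the UTC timestamps can still be translated afterwards. Here is a minimal Python sketch of that conversion. It assumes timestamps of the form YYYY/MM/DD-HH:MM:SS.mmm; the function name and the fixed UTC-5 offset are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout of a cluster log line, e.g. 2016/01/22-11:06:32.221
STAMP_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def cluster_stamp_to_local(stamp, tz=None):
    """Interpret a cluster log timestamp as UTC and convert it.

    With tz=None the result is in the time zone of the machine running
    the script; pass an explicit tzinfo to target another zone.
    """
    utc_time = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(tz)


if __name__ == "__main__":
    # Hypothetical fixed offset (UTC-5); a fixed offset ignores daylight saving.
    example_zone = timezone(timedelta(hours=-5))
    print(cluster_stamp_to_local("2016/01/22-11:06:32.221", example_zone))
```

For live troubleshooting, generating the log with -UseLocalTime is still simpler; a script like this is only useful for logs that someone else has already collected in UTC.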

Hopefully you will find it useful.

Cheers,
Balmukund

Posted in Cluster, Tips and Tricks, Troubleshooting

SQL Server Cluster – Could not register Service Control Handler. Operating system error = 2310(This shared resource does not exist.).

Posted by blakhani on February 16, 2016


This is an interesting issue which I faced with one of my customers. It is worth sharing because it is NOT a very common issue, and I was not able to find the solution which we eventually discovered documented anywhere. We had a two-node Windows cluster running a single clustered instance of SQL Server. The node names were SRV1 and SRV2. The Application event log on SRV2 showed the error below.

Log Name:      Application
Source:        MSSQLSERVER
Date:          1/22/2016 6:10:33 AM
Event ID:      17141
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      SRV2.myDomain.com
Description:
Could not register Service Control Handler. Operating system error = 2310(This shared resource does not exist.).

  • We checked the log folder for the SQL Server error log, but no file had been generated there.
  • We tried starting the SQL Server service via the services.msc applet, but that also did not work.
  • net start mssqlserver also gave the same error.
  • Interestingly, when we started sqlservr.exe from the command prompt, it worked fine.

The cluster log showed the following:

00000f44.00000ad8::2016/01/22-11:06:32.221 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] StartResourceService: Failed to start MSSQLSERVER service.  CurrentState: 1
00000f44.00000ad8::2016/01/22-11:06:32.221 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] OnlineThread: ResUtilsStartResourceService failed (status 435)
00000f44.00000ad8::2016/01/22-11:06:32.221 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] OnlineThread: Error 435 bringing resource online.
00000f44.00000ad8::2016/01/22-11:06:32.221 ERR   [RHS] Online for resource SQL Server failed.
0000091c.00001230::2016/01/22-11:06:32.221 WARN  [RCM] HandleMonitorReply: ONLINERESOURCE for 'SQL Server', gen(1) result 5018.
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] TransitionToState(SQL Server) OnlinePending-->ProcessingFailure.
0000091c.00001230::2016/01/22-11:06:32.221 ERR   [RCM] rcm::RcmResource::HandleFailure: (SQL Server)
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] resource SQL Server: failure count: 2, restartAction: 0.
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] resource SQL Server will not be restarting; isLowPriority: false; numDependents: 1, failureCount: 2, restartAction: 0
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to Failed].
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] TransitionToState(SQL Server) [WaitingToTerminate to Failed]-->[Terminating to Failed].
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] Will retry online of SQL Server in 3600000 milliseconds.
0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] TransitionToState(SQL Server Agent) WaitingToComeOnline-->OfflineDueToProvider.
0000091c.00001450::2016/01/22-11:06:32.221 INFO  [RCM] HandleMonitorReply: TERMINATERESOURCE for 'SQL Server', gen(2) result 0.
0000091c.00001450::2016/01/22-11:06:32.221 INFO  [RCM] TransitionToState(SQL Server) [Terminating to Failed]-->Failed.
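
In a large cluster log, it helps to pull out only the ERR entries first. Below is a rough Python sketch of such a filter. The line layout in the regular expression is an assumption inferred from the sample above, and the function names are hypothetical; the sample lines are copied from the log shown above.

```python
import re

# Assumed line layout, inferred from the sample above:
# <process>.<thread>::<timestamp> <LEVEL> [<COMPONENT>] <message>
LINE_PATTERN = re.compile(
    r"^(?P<process>[0-9a-fA-F]+)\.(?P<thread>[0-9a-fA-F]+)::"
    r"(?P<stamp>\S+)\s+(?P<level>[A-Z]+)\s+\[(?P<component>\w+)\]\s+(?P<message>.*)$"
)


def parse_entries(lines):
    """Yield one dict per line that matches the assumed layout."""
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            yield match.groupdict()


def entries_at_level(lines, level="ERR"):
    """Return only the parsed entries logged at the given level."""
    return [entry for entry in parse_entries(lines) if entry["level"] == level]


SAMPLE_LINES = [
    "00000f44.00000ad8::2016/01/22-11:06:32.221 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] OnlineThread: Error 435 bringing resource online.",
    "00000f44.00000ad8::2016/01/22-11:06:32.221 ERR   [RHS] Online for resource SQL Server failed.",
    "0000091c.00001230::2016/01/22-11:06:32.221 INFO  [RCM] Will retry online of SQL Server in 3600000 milliseconds.",
]

if __name__ == "__main__":
    for entry in entries_at_level(SAMPLE_LINES):
        print(entry["stamp"], entry["component"], entry["message"])
```

Lines that do not match the assumed layout are silently skipped, so this is a quick triage aid and not a complete parser.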

We also had a file share resource in the cluster. Interestingly, it failed with this error:

File system check failed because scoped network name appears not to be registered with Server service, number of shares verified: 3.

Solution: Restart the “Server” service. We tried setting this service to delayed start mode, but that didn’t help. We found that every time we restarted the node, we had the same problem. So the solution was to restart the “Server” service after each reboot; after that, failover worked without a problem.

Posted in Cluster, SQL Server

SQL Cluster Setup Error – System.Runtime.InteropServices.COMException (0x80070490): Element not found

Posted by blakhani on May 19, 2015


My job revolves around troubleshooting and fixing broken things. Here is one situation which I ran into recently and for which I was unable to find a solution on the internet. I feel it is my responsibility to provide self-assist options to the SQL community so that they can find the problem and fix it by themselves.

I was trying to install SQL Server 2012 on a two-node Windows cluster, and setup failed with the error in the subject line. At first look it sounds like some COM+ error but, as always, the setup logs are my first place to look for errors. Here is the MSDN link which explains the various files created by setup:

https://msdn.microsoft.com/en-us/library/ms143702(v=sql.110).aspx (View and Read SQL Server Setup Log Files)

The information in the setup logs was pretty interesting. In particular, I looked into the Detail.txt file, which is the parent file of all MSI logs. (I have removed the date and time for better reading.)

Error: Action "Microsoft.SqlServer.Configuration.SetupExtension.ValidateFeatureSettingsAction" threw an exception during execution.
Microsoft.SqlServer.Setup.Chainer.Workflow.ActionExecutionException: Element not found. (Exception from HRESULT: 0x80070490) ---> System.Runtime.InteropServices.COMException (0x80070490): Element not found. (Exception from HRESULT: 0x80070490)
   at Microsoft.SqlServer.Interop.MSClusterLib.ISClusResource.get_Disk()
   at Microsoft.SqlServer.Configuration.Cluster.ClusterPhysicalDisk.get_Partitions()
   at Microsoft.SqlServer.Configuration.ClusterConfiguration.ClusterDiskPublicConfigObject.IsPathOnSharedDisk(String path)
   at Microsoft.SqlServer.Configuration.SetupExtension.SlpInputSettings.ValidateNotOnSharedDisk(ValidationState vs, String directoryName, String bindingKey, String errorMessage)
   at Microsoft.SqlServer.Configuration.SetupExtension.SlpInputSettings.Validate_InstallSharedDir(ValidationState vs)
   at Microsoft.SqlServer.Configuration.SetupExtension.SlpInputSettings.ValidateSettings()
   at Microsoft.SqlServer.Configuration.SetupExtension.ValidateFeatureSettingsAction.ExecuteAction(String actionId)
   at Microsoft.SqlServer.Chainer.Infrastructure.Action.Execute(String actionId, TextWriter errorStream)
   at Microsoft.SqlServer.Setup.Chainer.Workflow.ActionInvocation.ExecuteActionHelper(TextWriter statusStream, ISequencedAction actionToRun)
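
A .NET stack trace like the one above lists the innermost call first, so the calls actually happened in the reverse of the printed order. Here is a small Python sketch which reorders the frames into call order. The function name is hypothetical, and the sample frames are a subset copied from the log above.

```python
def frames_in_call_order(trace_text):
    """Return the 'at ...' frames of a stack trace, outermost call first."""
    frames = []
    for line in trace_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("at "):
            frames.append(stripped[len("at "):])
    # The trace prints the innermost frame first, so reverse it.
    frames.reverse()
    return frames


SAMPLE_TRACE = """
   at Microsoft.SqlServer.Interop.MSClusterLib.ISClusResource.get_Disk()
   at Microsoft.SqlServer.Configuration.Cluster.ClusterPhysicalDisk.get_Partitions()
   at Microsoft.SqlServer.Configuration.ClusterConfiguration.ClusterDiskPublicConfigObject.IsPathOnSharedDisk(String path)
   at Microsoft.SqlServer.Configuration.SetupExtension.SlpInputSettings.Validate_InstallSharedDir(ValidationState vs)
"""

if __name__ == "__main__":
    for depth, frame in enumerate(frames_in_call_order(SAMPLE_TRACE)):
        # Indent each frame by its call depth to make the nesting visible.
        print("  " * depth + frame)
```

Read this way, the last line printed is the call that actually threw the exception.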

The stack reads from bottom to top. Reading the function calls, anyone can conclude that something is happening with the cluster disks. That’s a good hint. So, I went back to Failover Cluster Manager and looked at Disks under “Storage”. There was a disk in a failed state. I was not able to bring it online, and that’s THE problem! SQL Setup enumerates the disks to find eligible ones which can be used, and it was not able to get details about that disk. Here was the error when I attempted to bring it online.

The resource ‘Cluster Disk 1’ did not come online.
The desired state change for ‘Cluster Disk 1’ did not occur before the timeout expired.

I realized that I had played with iSCSI and messed up the disk which was presented.

Solution: Delete the disks which are not able to come online, under “Storage > Disks” or “Available Storage” in the Failover Cluster Manager interface.

Hope this helps.

Cheers,
Balmukund Lakhani
Twitter @blakhani
Author: SQL Server 2012 AlwaysOn (Paperback, Kindle)

Posted in Cluster, Installation, Setup, SQL Server 2012, SQL Server 2014