Help: SQL Server

Sharing my knowlege about SQL Server Troubleshooting Skills

  • Blog Stats

    • 1,882,453 hits
  • Select GETDATE()

    May 2021
    M T W T F S S
     12
    3456789
    10111213141516
    17181920212223
    24252627282930
    31  

Archive for the ‘SQL Server 2014’ Category

Solution : Suspect Database – SQL Server detected a DTC/KTM in-doubt transaction with UOW .Please resolve it following the guideline for Troubleshooting DTC Transactions.

Posted by blakhani on January 27, 2015


While testing application which uses distributed transaction, I caused a failure of SQL Server Database. When I performed failover I found that database didn’t recover and failed to come online. It was shown as “Suspect” in SQL Server Management Studio. I checked ERRORLOG and found below messages.

2015-01-22 22:50:33.570    spid15s    Recovery of database ‘MyBadDatabase’ (5) is 0% complete (approximately 24 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-01-22 22:50:33.990    spid15s    Recovery of database ‘MyBadDatabase’ (5) is 2% complete (approximately 24 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-01-22 22:50:34.080    spid15s    Recovery of database ‘MyBadDatabase’ (5) is 0% complete (approximately 130 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-01-22 22:50:34.180    spid15s    Recovery of database ‘MyBadDatabase’ (5) is 11% complete (approximately 5 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-01-22 22:50:35.480    spid15s    Recovery of database ‘MyBadDatabase’ (5) is 44% complete (approximately 2 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-01-22 22:50:37.720    spid15s    10704 transactions rolled forward in database ‘MyBadDatabase’ (5). This is an informational message only. No user action is required.
2015-01-22 22:50:37.820    spid7s    Attempting to initialize Microsoft Distributed Transaction Coordinator (MS DTC). This is an informational message only. No user action is required.
2015-01-22 22:50:37.850    spid15s    CHECKDB for database ‘MyBadDatabase’ finished without errors on 2012-01-08 23:41:18.113 (local time). This is an informational message only; no user action is required.
2015-01-22 22:50:38.870    spid7s    Attempting to recover in-doubt distributed transactions involving Microsoft Distributed Transaction Coordinator (MS DTC). This is an informational message only. No user action is required.
2015-01-22 22:52:22.990    spid7s    SQL Server detected a DTC/KTM in-doubt transaction with UOW  {6BAC37B8-6515-4FC1-972B-C059D1D5133E}.Please resolve it following the guideline for Troubleshooting DTC Transactions.
2015-01-22 22:52:22.990    spid7s    Error: 3437, Severity: 21, State: 3.
2015-01-22 22:52:22.990    spid7s    An error occurred while recovering database ‘MyBadDatabase’. Unable to connect to Microsoft Distributed Transaction Coordinator (MS DTC) to check the completion status of transaction (0:743395420). Fix MS DTC, and run recovery again.
2015-01-22 22:52:22.990    spid7s    Error: 3414, Severity: 21, State: 2.
2015-01-22 22:52:22.990    spid7s    An error occurred during recovery, preventing the database ‘MyBadDatabase’ (database ID 5) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

 

As we can see that database was trying to recover. Phase 2 of roll forward is complete and Phase 3 (Rollback phase) encountered an error. The error is cause due to the fact that my bad application uses MSDTC and I caused things to break in MSDTC itself. Ideal way to fix the problem would be to follow KB http://support.microsoft.com/default.aspx?scid=kb;EN-US;306366 (PRB: Errors 3437, 3414, and "Could Not Connect to DTC" Occur When You Start SQL Server)

In my case, I have restarted DTC and I was not able to find UOW anywhere which I need to kill. When I checked sys.dm_tran_locks for column request_owner_guid for any GUID that is non ZERO. But I didn’t find any non Zero (6BAC37B8-6515-4FC1-972B-C059D1D5133E). Here are the steps I did to fix my database.

sp_configure 'show advanced options', 1 
GO 
RECONFIGURE 
GO 
sp_configure 'in-doubt xact resolution', 2 --- presume abort 
GO 
RECONFIGURE 
GO 

Error log showed the change.

2015-01-23 00:45:53.850 spid86       Configuration option ‘in-doubt xact resolution’ changed from 0 to 2. Run the RECONFIGURE statement to install.

In your case, you need to check with application vendor/team to find what needs to be done with In-doubt transactions. Please use appropriate value (I have used 2) and then perform recovery of database.

dbcc dbrecover(MyBadDatabase)

 

Once above steps are performed, I was able to get database recovered. Here is the snippet from errorlog.

2015-01-23 00:46:10.090 spid57       Bypassing recovery for database ID 5. This is an informational message only. No user action is required.

2015-01-23 00:46:10.100 spid57       Starting up database ‘MyBadDatabase’.

2015-01-23 00:47:28.570 spid57       Recovery of database ‘MyBadDatabase’ (5) is 0% complete (approximately 130 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.

2015-01-23 00:47:28.670 spid57       Recovery of database ‘MyBadDatabase’ (5) is 1% complete (approximately 7 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.

2015-01-23 00:47:29.630 spid57       Recovery of database ‘MyBadDatabase’ (5) is 29% complete (approximately 2 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.

2015-01-23 00:47:34.800 spid57       1 transactions rolled back in database ‘MyBadDatabase’ (5). This is an informational message only. No user action is required. 2015-01-23 00:47:34.800 spid57       Recovery is writing a checkpoint in database ‘MyBadDatabase’ (5). This is an informational message only. No user action is required. 2015-01-23 00:47:34.820 spid57       Recovery completed for database MyBadDatabase (database ID 5) in 6 second(s) (analysis 20 ms, redo 3142 ms, undo 3056 ms.) This is an informational message only. No user action is required.

2015-01-23 00:47:34.860 spid57       CHECKDB for database ‘MyBadDatabase’ finished without errors on 2012-01-08 23:41:18.113 (local time). This is an informational message only; no user action is required.

Hopefully this would bring database online if it was caused due to DTC problem. There might be various causes for suspect database and this blog just covers one of them.

  • Cheers,
  • Balmukund Lakhani
  • Twitter @blakhani
  • Author: SQL Server 2012 AlwaysOnPaperback, Kindle
  • Posted in SQL Server, SQL Server 2012, SQL Server 2014 | Leave a Comment »

    SQL 2014 Learning Series # 16 – New Feature – Single Partition Online Index Rebuild (SPOIR)

    Posted by blakhani on June 26, 2014


    Till SQL Server 2012, if we want to rebuild a partition table online, the only option available was to rebuild all the partitions of the table. If we want to rebuild only one partition, the option was to do it offline. Not being able to perform online partition rebuilds means application teams have to ultimately decide between index fragmentation or data being un-available while indexes are rebuilt, neither are consistent with an expectation of 24/7 enterprise data platform. Offline rebuild could result in much longer maintenance operations as partition grows. If we attempt to rebuild one partition online, we would be welcomed with below message

    Msg 155, Level 15, State 1, Line 4
    ‘ONLINE’ is not a recognized ALTER INDEX REBUILD PARTITION option.

    In SQL Server 2014, online single partition index operations are supported. The name of the feature “Single Partition Online Index Rebuild “ is mouthful, so some of you may prefer to use SPOIR in your day to day talk. The name of the feature itself is an explanation and you would have understood what it does. Let’s look at few highlights of this feature.

    • Better control on online rebuild by choosing one or more partitions.
    • This would keep the table accessible when rebuild is going on. As explained in earlier blog, short terms locks are taken at beginning and end of index rebuild.
    • We can combine SPOIR with Managed Lock Priority (refer Blog # 13, 14, 15 of this series)
    • Due to all of the above, the availability of the table would increase.
    • CPU, Memory consumption would also reduce due to single partition rebuild.

    Here is the syntax as per Books online: ALTER TABLE and ALTER INDEX

    image

    Let’s see it in action now.

    To have some sample data, we can have download AdventureWorks sample database from: http://msftdbprodsamples.codeplex.com/downloads/get/417885

    I have restored it in SQL Server 2014 and named as AdventureWorks2014. I would be playing around with [AdventureWorks2014].[Production].[TransactionHistoryArchive] table which has transactions from 2005-05-17 to 2007-08-31. I have created yearly partition.

    Use AdventureWorks2014;
    GO
    
    CREATE partition FUNCTION [YearlyDate](datetime) AS range RIGHT FOR VALUES ( 
    N'2005-05-01T00:00:00', N'2006-05-01T00:00:00',  N'2007-05-01T00:00:00'); 
    GO
    
    CREATE partition scheme [TransactionDatePS] AS partition [YearlyDate] TO ( 
    [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]);
    GO
    
    
    CREATE TABLE Production.TransactionHistoryArchive_SPOIR(
        TransactionID int NOT NULL,
        ProductID int NOT NULL,
        ReferenceOrderID int NOT NULL,
        ReferenceOrderLineID int NOT NULL, 
        TransactionDate datetime NOT NULL,
        TransactionType nchar(1) NOT NULL,
        Quantity int NOT NULL,
        ActualCost money NOT NULL,
        ModifiedDate datetime NOT NULL)
    ON TransactionDatePS(TransactionDate);
    GO
    
    Create Clustered Index idx_TransactionID on Production.TransactionHistoryArchive_SPOIR (TransactionID)
    go
    

    Let’s populate partitioned table TransactionHistoryArchive_SPOIR from TransactionHistoryArchive

    INSERT INTO Production.TransactionHistoryArchive_SPOIR
    SELECT * FROM Production.TransactionHistoryArchive
    -- (89253 row(s) affected)
    
    

    Let’s check how much space is used per page. Since we just created clustered index, all pages would be almost full.

    SELECT partition_number, 
           index_id, 
           avg_page_space_used_in_percent, 
           page_count 
    FROM   sys.Dm_db_index_physical_stats(Db_id(), 
    Object_id('AdventureWorks2014.Production.TransactionHistoryArchive_SPOIR' ), NULL, NULL, 'DETAILED') 
    
    

    image

    Let’s delete alternate rows. in 2nd partition.

    Delete from Production.TransactionHistoryArchive_SPOIR
    where TransactionDate >= '2005-05-01T00:00:00' and TransactionDate < '2006-05-01T00:00:00'
    and TransactionID % 2 =  1
    go
    -- (10755 row(s) affected)
    
    

    Once we delete, we should see TransactionID as 2, 4, 6.. (even numbers, due to delete)

    image

    Let’s rebuild only Partition # 2 online.

    ALTER INDEX idx_TransactionID on Production.TransactionHistoryArchive_SPOIR
    REBUILD PARTITION = 2 WITH (ONLINE = ON)

    and let’s check the output again.

    image

    This means online index rebuild of single partition is working as expected. In my example I don’t have fragmentation because all pages are next to each other. That’s why I have not shown avg_fragmentation_in_percent column.

    Go Do: If you want to learn, download EVAL version and read other new feature of SQL Server 2014 here.

  • Cheers,
  • Balmukund Lakhani
  • Twitter @blakhani
  • Author: SQL Server 2012 AlwaysOnPaperback, Kindle
  • Posted in SQL 2014 Learning Series, SQL Server 2014 | Tagged: , , , , , , , | 4 Comments »

    SQL 2014 Learning Series # 15 – New Feature – Managed Lock Priority (Part 3)

    Posted by blakhani on June 24, 2014


    In earlier two blogs (Part 1 and Part 2) we have seen various new options added to support MLP in online Index rebuild using ALTER INDEX. Same holds true for ALTER TABLE … SWITCH PARTITON as well. In this blog, we would see new enhancements done in DMVs and Extended Events support these options.

    Enhancement in DMVs. As per books online, request_status column in sys.dm_tran_locks has three new values added to support this feature. LOW_PRIORITY_CONVERT, LOW_PRIORITY_WAIT and ABORT_BLOCKERS. In the similar way, the wait_type column in sys.dm_os_wait_stats has new values, LOW_PRIORITY and ABORT_BLOCKERS.

    In previous blog, we have seen enhancements done in ERRORLOG logging. If you are familiar will the “KILL” command in SQL Server, you must be aware that this is logged into ERRORLOG as well. In the same fashion, if any session is aborted due to ALTER DDL command, it would also be logged in ERRORLOG. We would get information about hostname, ObjectName and session ID which I have put below for reference.

    2014-06-19 09:00:20.850 spid55       An ‘ALTER INDEX REBUILD’ statement was executed on object ‘OnlineIndexRebuild’ by hostname ‘SQLServerHelp’, host process ID 13560 using the WAIT_AT_LOW_PRIORITY options with MAX_DURATION = 1 and ABORT_AFTER_WAIT = BLOCKERS. Blocking user sessions will be killed
    2014-06-19 09:01:21.780 spid55       An ABORT_AFTER_WAIT = BLOCKERS lock request was issued on database_id = 7, object_id = 245575913. All blocking user sessions will be killed.
    2014-06-19 09:01:21.780 spid55       Process ID 53 was killed by an ABORT_AFTER_WAIT = BLOCKERS DDL statement on database_id = 7, object_id = 245575913.

    Enhancement in xEvents Let’s look at same example of online index rebuild which we have used, in earlier blog, to learn more about extended events added to support monitoring of managed lock priority feature. There are three new events which have been introduced which will help us in diagnostics.

    1. ddl_with_wait_at_low_priority – this event is fired when a ALTER INDEX / TABLE DDL statement is executed using the WAIT_AT_LOW_PRIORITY option.
    2. lock_request_priority_state – The priority state of a lock request.
    3. process_killed_by_abort_blockers – this event occurs if we chose ABORT = BLOCKERS and a process is killed due to that. This event would have information about who was killed by whom, what the was DDL fired etc.

    Let’s create an extended event session and capture some data by reproducing the scenario. If you are comfortable with T-SQL then use below.

    CREATE EVENT SESSION [MLP] ON SERVER 
    ADD EVENT sqlserver.ddl_with_wait_at_low_priority(
        ACTION(sqlserver.client_app_name,sqlserver.database_id,sqlserver.session_id,sqlserver.sql_text)),
    ADD EVENT sqlserver.lock_request_priority_state(
        ACTION(sqlserver.client_app_name,sqlserver.database_id,sqlserver.session_id,sqlserver.sql_text)),
    ADD EVENT sqlserver.process_killed_by_abort_blockers(
        ACTION(sqlserver.client_app_name,sqlserver.database_id,sqlserver.session_id,sqlserver.sql_text)) 
    ADD TARGET package0.event_file(SET filename=N'MLP')
    WITH (STARTUP_STATE=OFF)
    GO
    

    You can also use “New Session Wizard” as shown below and make sure that you choose new events which are listed above.

    image

    Let’s start the session first.

    ALTER EVENT SESSION [MLP] ON SERVER STATE=START

    Let’s create database (MLPDemo), table (OnlineIndexRebuild) and populate it. Refer earlier blog for script. I have executed below T-SQL and as per my server, SPID 60 is running select statement which is going to block ALTER INDEX DDL command.

    BEGIN TRAN 
    SELECT count(*) 'Count'
    FROM   OnlineIndexRebuild (HOLDLOCK) 
    WHERE  iID <= 1000  
    
    Select @@SPID '@@SPID'
    -- ROLLBACK TRAN
    

    image

    In another session, we would now execute ALTER INDEX DDL.

    Select @@SPID 'SPID'
    go
    
    ALTER INDEX PK_OnlineIndexRebuild ON OnlineIndexRebuild
    REBUILD WITH 
    ( 
    ONLINE = ON ( WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS ) )
    )
    
    

    image

    and as expected it went to executing state and got blocked by 60. After 1 minute, SPID 60 was killed to complete rebuild process which was started by SPID 64. Here is what is logged in MLP session which we created using T-SQL.

    SNAGHTML337c4ac9

    At row # 1, we started DDL command and “ddl_with_wait_at_low_priority” event is fired. This event captured abort_after_wait column as “BLOCKERS” and also shows object_name. Then we can see next event lock_request_priority_state where it shows the lock mode. The process was able to acquire schema shared (SCH_S) lock, intent shared lock (IS) and shared (S) lock. But when it came for schema-modification (SCM_M) lock (which is after build phase is complete) it waited (row #6). Exactly after MAX_DURATION (notice time gap between row # 6 and 7) the time got expired and “abort blockers” action was taken (row 8). Due to that we are seeing “process_killed_by_abort_blockers” and in “killed process id” column shows as 60 as it’s the session which was doing Select (refer image # 2)

    Below is the same profiler with ABORT_AFTER_WAIT = NONE (see row # 1 below). Notice below that after expiration on the time (row # 7 below) the state was “Normal Priority” as oppose to Abort Blockers. Nothing was killed here.

    SNAGHTML337bbd2f

    You can check yourself and find what would happen if we give ABORT_AFTER_WAIT = SELF.

    At this point, we have covered various aspects of MLP. It’s time to cover the limitations and good practices about this feature. Below are some limitations.

    • One of the limitation which we should keep in mind (and it’s obvious as well) that even if we specify kill blockers, SQL Server would not kill any system transactions to get locks. SQL Server would kill only user blocking sessions. There could be situations where system transactions (ghost cleanup, shrink and others) is causing blocking as they also acquire some locks and those would not be killed by SQL Server.
    • Another limitation is with AlwaysOn and Replication. The low priority queue are only available on primary replica (in availability group) or publisher (in replication) . If DDL has to move and execute (via data movement) on secondary replica in AlwaysOn Availability Group or Subscriber in replication – it would always run in normal queue.
    • SWITCH PARTITION can only be used for between two tables or indexed views.
    • SSMS doesn’t show the options for MLP. SMO and PowerShell can be used though.

    We also need to keep in mind that MAX_DURATION setting can affect the transaction log flush. If you decided to choose a big number, make sure that you have enough disk space to hold that much of transaction log because it can’t be flushed/truncated. The size of transaction log can effect database recovery, replication performance, availability group performance etc. So it’s always good if you can predict how much time OIR would take and manage the space accordingly.

    I truly hope that this blog has covered the pending things about this new feature, managed lock priority. We would continue the series and cover other features as well.

  • Cheers,
  • Balmukund Lakhani
  • Twitter @blakhani
  • Author: SQL Server 2012 AlwaysOnPaperback, Kindle
  • Posted in SQL 2014 Learning Series, SQL Server 2014 | Tagged: , , , , , | Leave a Comment »