Hekaton Part 6: MVCC Assumptions and Validation

One in a series of posts about Microsoft’s In-Memory OLTP Functionality

I originally intended to make part 6 in this series about error handling, but quickly realized that I needed to lay a bit of a foundation on multi-version concurrency control (MVCC), and particularly some of the assumptions made by MVCC and how those assumptions are subsequently confirmed.

Historically, SQL Server has used a pessimistic form of concurrency. My session sees every other connection as the enemy. Every other connection on the system is presumed to want access to my resources, and so SQL employs a locking mechanism to prevent anyone else from stomping over my work.

Now, locking and isolation levels is a complex topic in itself, but a simple example will suffice for the the purposes of this discussion. Suppose I update the Product table for ID 23, and before I commit my transaction, you come along and try to update the same row. Your connection will be blocked from proceeding until I commit or rollback, at which point you will be allowed to make the update. Under most isolation levels you won’t even be allowed to read the row I’ve updated so long as it is uncommitted.

Blocking is a good thing in that it preserved isolation, but a bad thing when everybody is always waiting on someone else.

In SQL Server 2005, Microsoft introduced a quasi-optimistic concurrency model called snapshot that addressed some of these problems. With snapshot enabled, you would be able to read the row that I’ve updated (you will see the values in the row prior to any of my changes). However, you still won’t be able to update it. You will be blocked just as always.

Snapshot isolation is frequently explained as reader don’t block writers, and writers don’t block readers.

Hekaton takes us a huge step forward toward optimistic concurrency. It makes a couple of assumptions that remove that “every other session is the enemy” paradigm. One assumption is that write conflicts are rare, so SQL always proceeds with data modifications under the assumption that no one else is going to change the same row.

A second assumption is that competing transactions are going be committed and not rolled back.

Now, sometimes these assumptions simply aren’t valid, so SQL Server will includes validation checks to make sure that things don’t go awry.

We’ll get into internals in a later part of this series to better explain how SQL does all this magic, but for now suffice it to say that SQL maintains multiple versions of the same row along with a pair of time stamps indicating when the row is valid. So when we update ProductId 23, SQL will actually create a new row in the table, and mark the original row as valid through, say, 1:17 pm, and the new row only becomes valid at 1:17 pm. When a user starts a new transaction, SQL notes the time that the transaction starts and during the course the transaction will always return the row that was valid and committed at that time.

Let’s demonstrate. We’ll open two tabs in Management Studio, each connected to our Hekaton database. In the first tab, we’ll make sure that the price on product 23 is a known value.

update dbo.Product
set UnitPrice = 23.00
where ProductID = 23;

Now in tab 1, we’ll start a transaction and modify the price.

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 30.00
where ProductID = 23;

In tab 2, let’s examine the value.

select ProductID, UnitPrice
from dbo.Product
where ProductID = 23;

This confirms that connection 2 sees a UnitPrice of 23.00, which is the committed value of the row at the time the transaction started. Continuing in tab 2, start a new transaction:

begin transaction;

Switch back to tab 1 and commit.

commit;

Re-execute the select in connection 2, and you will still get 23.00. Again, this was the committed value of the row at the time the transaction began. But if we close the transaction and do it again:

commit;

begin transaction;

select ProductID, UnitPrice
from dbo.Product with (snapshot)
where ProductID = 23;

commit;

This time we get a unit price of 30.00, reflecting the committed price when the new transaction begins.

So that is MVCC in a nutshell.

What about the validations? If one of the MVCC assumptions doesn’t hold up, SQL Server will detect this circumstance by one of several validation methods. In some cases, these validations happen as the transaction progresses, and in cases the validation happens when the transaction commits.

In connection 1, make sure that there are no open transactions, then run the following:

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 30.00
where ProductID = 23;

And in connection 2 we will try a simultaneous update.

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 35.00
where ProductID = 23;

This immediately fails with an error:

Msg 41302, Level 16, State 110, Line 18
The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.
Msg 3998, Level 16, State 1, Line 16
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
The statement has been terminated.

I find the wording on the 41302 error a bit odd because it’s not completely correct, but the upshot is that there are two simultaneous update operations in progress. The “write-write conflict” assumption has failed, so the error is generated. Note that SQL Server does not block the progress of the transaction, as with previous versions, but it does generate an error.

In addition to the 41302 error, we might also get 41303 and 41325 errors. These relate to violations of the repeatable read and serializable isolation level requirements. Make sure that both connections don’t have open transactions, then run this in connection 1:

begin transaction;

select ProductID, UnitPrice
from dbo.Product with (repeatableread)
where ProductID = 23;

Then back in connection 2 change the price:

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 38.00
where ProductID = 23;

commit;

Now try to commit transaction 1:

commit;

At commit time, SQL Server will validate the repeatable read requirement and the following error will be generated.


Msg 41305, Level 16, State 0, Line 25
The current transaction failed to commit due to a repeatable read validation failure.

Finally, we can create the conditions for a 41325 error. Commit any open transactions, then in connection 1 run:

delete dbo.Product where ProductID = 1000;

begin transaction;

select ProductID, UnitPrice
from dbo.Product with (serializable)
where ProductID = 1000;

And in connection 2 run:

insert dbo.Product (ProductID, Description, UnitPrice)
values (1000, 'Product 1000', 1000.00);

Back in connection 1:

commit;


Msg 41325, Level 16, State 0, Line 35
The current transaction failed to commit due to a serializable validation failure.

Whew, that was a lot just to talk about the validations that take place (and there is actually one more validation that happens at commit time, but we’ll address that separately). With that background, we can now talk about error handling in the next post.