Hekaton Part 5: Native Procedures

One in a series of posts about Microsoft’s In-Memory OLTP Functionality

Now we come to one of the Hekaton components that is really quite cool but that I haven’t found a practical use for. Not even once.

Natively compiled stored procedures.

The cool part is that they can be really, really fast. They can be orders of magnitude faster than traditional interpreted procedures. The not so cool part is that they support only a very, very limited subset of T-SQL.

Let’s demonstrate. First, we’ll create a new table and add a couple of rows.

if exists (select * from sys.tables where name = 'Product')
	drop table dbo.Product;
create table dbo.Product
(
	ProductID int not null,
	Description nvarchar(500) not null,
	UnitPrice money not null,

	primary key nonclustered hash (ProductID) with (bucket_count = 1048576),
	index idx_Product__UnitPrice nonclustered (UnitPrice)
) with (memory_optimized = on, durability = schema_and_data);

-- Populate a few rows into the table

insert dbo.Product (ProductID, Description, UnitPrice)
values (1, 'Gizmo', 34.00),
	(2, 'Whatzit', 16.00);

Now can create a natively compiled stored procedure to update the table. This isn’t going to demonstrate speed, but it’s just to show the general idea (and the general awkwardness of the syntax).

if exists (select * from sys.objects where type = 'P' and name = 'usp_UpdateProduct')
	drop procedure usp_UpdateProduct
go
create procedure dbo.usp_UpdateProduct (@ProductID int, @NewUnitPrice money)
with native_compilation, schemabinding, execute as owner
as
begin atomic with (transaction isolation level = snapshot, language = 'us_english')
	update dbo.Product
	set UnitPrice = @NewUnitPrice
	where ProductID = @ProductID;
end
go

We can demonstrate that the procedure really works by running something like.

select ProductID, Description, UnitPrice from dbo.Product where ProductID = 1;
exec dbo.usp_UpdateProduct @ProductID = 1, @NewUnitPrice = 42.00;
select ProductID, Description, UnitPrice from dbo.Product where ProductID = 1;

Now let’s write a procedure to add a bunch of new rows. First, we will create and populate a tally table.

create table Tally
(
    Number int not null,
    constraint PK_Tally primary key nonclustered
        (Number)
) with (memory_optimized = on, durability = schema_and_data);

insert Tally (Number)
select top 1000000 row_number() over (order by v1.number)
from master.dbo.spt_values v1 cross join master.dbo.spt_values v2;

And then try this for a stored procedure:

if exists (select * from sys.objects where type = 'P' and name = 'usp_CreateNewRecords')
	drop procedure usp_CreateNewRecords
go
create procedure dbo.usp_CreateNewRecords (@RecordCount int)
with native_compilation, schemabinding, execute as owner
as
begin atomic with (transaction isolation level = snapshot, language = 'us_english')
	declare @firstId int;
	select @firstId = isnull(max(ProductID), 0) from dbo.Product;

	insert	dbo.Product (ProductID, Description, UnitPrice)
	select	t.Number + @firstId ProductID,
			'Product ' + cast(t.Number + @firstId as nvarchar(500)) Description,
			cast(t.Number + @firstId as money) UnitPrice
	from	dbo.Tally t
	where	t.Number <= @RecordCount;
end
go

Then we’ll run the procedure to add 1,000,000 records:

exec dbo.usp_CreateNewRecords @RecordCount = 1000000;

This goes pretty fast on my machine, consistently adding the 1M records in 5 to 6 seconds.

I wrote the procedure the way I did because I’ve always been taught (any experience has validated) that set-based logic is almost always superior to row-by-row processing. Hekaton definitely changes that paradigm. Let’s rewrite that procedure to do one-off inserts.

if exists (select * from sys.objects where type = 'P' and name = 'usp_CreateNewRecords')
	drop procedure usp_CreateNewRecords
go
create procedure dbo.usp_CreateNewRecords (@RecordCount int)
with native_compilation, schemabinding, execute as owner
as
begin atomic with (transaction isolation level = snapshot, language = 'us_english')
	declare @firstId int;
	select @firstId = isnull(max(ProductID), 0) from dbo.Product;

	declare @counter int = 1;
	while @counter <= @RecordCount
	begin
		insert dbo.Product (ProductID, Description, UnitPrice)
		values (@firstId + @counter,
			'Product ' + cast(@firstId + @counter as nvarchar(500)),
			cast(@firstId + @counter as money));
		select @counter = @counter + 1;
	end
end
go

Now that insert runs in 3 to 4 seconds. Not a dramatic difference, but intriguing to me because it seems to be a game-changer.

Finally, the limitations. This is why I’ve not found natively compiled procedures to be terribly compelling, because I am constantly running up against some limitation. The Microsoft master In-Memory Limitations page lists them. Scroll down to the Natively Compiled Stored Procedures section, which occupies the majority of the page, and you’ll see the long list. I won’t repeat them all here, but the big ones for me are lack of support for CTEs and subqueries, CASE statements, temp tables, non-BIN2 limits, lots of unsupported join types, no IN or OR or LIKE, no DISTINCT and limitations on aggregations, no ROW_NUMBER(), and of course no references to disk-based tables.

Whew! Even my abbreviated list is pretty long!