Aug 20, 2007

Performance Differences with the new SQL 2008 Insert Statement

I had a question from Sanchet on the new insert statement syntax in SQL 2008 and the relative performance difference between the two methods.

For instance if we take a look at the following statements, they both have the same net result with the difference that the first statement is a single statement using the multiple row insert syntax, while the second is the way things are typically done.

insert into humanresources.employeepayhistory
values
(1, getdate(), 100, 1, getdate()),
(2, getdate(), 100, 1, getdate()),
(3, getdate(), 100, 1, getdate()),
(4, getdate(), 100, 1, getdate())

insert into humanresources.employeepayhistory values (1, getdate(), 100, 1, getdate())
insert into humanresources.employeepayhistory values (2, getdate(), 100, 1, getdate())
insert into humanresources.employeepayhistory values (3, getdate(), 100, 1, getdate())
insert into humanresources.employeepayhistory values (4, getdate(), 100, 1, getdate())

Having a look at the difference between the two is very interesting. I ran both queries on the sample AdventureWorks database (inside a begin/rollback transaction so I could work from a consistent base point) using SQL 2008 CTP4 (July 2007)and looked at the execution plans and the client statistics to see what the difference is on something simple like this.

Execution plans:


The two pictures below show the differences between the execution plans. As you can see the individual inserts are executed singly and has a TRIVIAL optimisation level applied. The cost of each statement is 0.0132898, so the 4 statements will have a total cost of 0.0531592

The multi-row insert version is quite different. It has an optimisation level of FULL and the whole statement has a cost of 0.0187608. A cost reduction of 0.0343984 (or about 65%!). That's seriously significant when we're talking about such a small amount of data.
You can also see that there is a lot more work being done earlier in the execution plan (apologies for the cramped visuals) which is where the real savings are occurring.

Here's the visual difference between the two plans (click for a larger view):

[Multi-row Insert]

[Multiple Inserts]


Client Statistics:


The client statistics are also very interesting. The single statement (multi-row inserts) has far fewer selects and inserts and this is obviously where the major savings occur, though the overall time saving of 40% was also a surprise.

RunSingle StatementSeparate Statements
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements28
Rows affected by INSERT, DELETE, or UPDATE statements44
Number of SELECT statements14
Rows returned by SELECT statements 14
Number of transactions11
Network Statistics
Number of server roundtrips33
TDS packets sent from client33
TDS packets received from server1124
Bytes sent from client626934
Bytes received from server 3333888924
Time Statistics
Client processing time3050
Total execution time3050
Wait time on server replies00

The moral: When inserting multiple rows into the database use the new multi-row insert syntax rather than the traditional method of having multiple insert statements.

[Note: CTP performance is always a work in progress and performance information here will likely be improved upon in the final RTM release]

1 comment:

  1. The 'getdate()' function is only evaluated once in the multi-case, resuklting in the same date values in the 4 rows, but 4 times in the single case, resulting in 4 different values.

    ReplyDelete