SQL Server Performance and Clustered Index Values

Question


I have a table myTable with a unique clustered index on myId, with a fill factor of 100%. It's an integer starting at zero (but not an identity column for the table). I need to add a new type of row to the table. It would be nice if I could distinguish these rows using negative myId values.

Would negative values cause extra page splits and slow down inserts?

Additional background: This table is part of the ETL for a data warehouse that collects data from disparate systems. Now I want to accommodate a new type of data. My way of doing this is to reserve negative IDs for the new data, which will thus be automatically grouped together. This will also avoid major key changes or additional columns in the schema.
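A minimal sketch of the numbering scheme described above, assuming new-type rows are numbered downward from -1. The function name `next_new_type_id` and the sample ids are hypothetical, not taken from the question. Because myId is the clustering key, the leaf level is ordered by it, so all negative rows end up as one contiguous group ahead of the existing rows.

```python
def next_new_type_id(existing_ids):
    """Hypothetical allocator: hand out -1, -2, -3, ... for the new row type."""
    lowest = min(existing_ids, default=0)
    # Existing rows start at 0, so the next free negative id is one below
    # whichever is lower: the current minimum or 0.
    return min(lowest, 0) - 1


ids = [0, 1, 2, 3]  # existing rows (zero-based ids)
for _ in range(3):
    ids.append(next_new_type_id(ids))

# Clustered order is key order: the new rows form one group at the front.
print(sorted(ids))  # [-3, -2, -1, 0, 1, 2, 3]
```

Numbering downward (-1, -2, -3, ...) means every new row sorts ahead of all existing rows, which is exactly the insert pattern the accepted answer below analyzes.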

Summary of the answers: A fill factor of 100% will normally slow down inserts, but not inserts that happen sequentially, and that includes sequential negative inserts.

0

performance sql-server clustered-index

cindi May 25 '09 at 14:23


5 answers

The difference will not be noticeable on any sane system.

Page splits occur when a page is full, whether at the beginning or at the end of the range. As long as you maintain the index regularly...

Edit, after the comments about fill factor:

After a page split, with FF 90 or FF 100, each of the two pages will be about 50% full. FF = 100 just means the split will happen sooner (possibly on the very first insert).
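A rough arithmetic sketch of this point, assuming a hypothetical leaf page that holds exactly 100 fixed-size rows and a simple 50/50 split. Real splits are sized in bytes, and SQL Server does not always divide a page evenly. At FF 100 there is no free space, so the first insert that lands inside a page splits it, while FF 90 leaves room for about ten such inserts per page.

```python
# ROWS_PER_PAGE is an assumed, illustrative capacity, not a SQL Server constant.
ROWS_PER_PAGE = 100


def rows_after_rebuild(fill_factor: int, capacity: int = ROWS_PER_PAGE) -> int:
    """Rows placed on each leaf page when the index is built or rebuilt."""
    # FILLFACTOR 0 behaves the same as 100 (pages are filled completely).
    ff = 100 if fill_factor == 0 else fill_factor
    return capacity * ff // 100


def inserts_before_split(fill_factor: int, capacity: int = ROWS_PER_PAGE) -> int:
    """Mid-range inserts a single page can absorb before it must split."""
    return capacity - rows_after_rebuild(fill_factor, capacity)


def split_halves(capacity: int = ROWS_PER_PAGE) -> tuple[int, int]:
    """Rows on the old and new page after a full page receives one more row."""
    total = capacity + 1
    left = total // 2
    return left, total - left


for ff in (100, 90, 80):
    print(f"FF={ff}: {rows_after_rebuild(ff)} rows/page, "
          f"{inserts_before_split(ff)} inserts before the first split")
print("rows per page after a split:", split_halves())
```

With these assumptions a split leaves the two pages at 50 and 51 rows, i.e. roughly half full, regardless of whether the page was built at FF 90 or FF 100.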

With a strictly monotonic increasing (or decreasing) key (+ve or -ve), page splits happen at either end of the range.

However, from the BOL article on fill factor, section "Adding Data to the End of the Table":

A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.

So does fill factor matter for strictly monotonic keys...? Especially if it's a low volume of writes.
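The BOL passage can be illustrated with a toy model, assuming a hypothetical capacity of 10 rows per page. The helpers `build` and `append` are illustrative names, not SQL Server APIs. Pages built at FF 90 keep their free space indefinitely when every new key is larger than all existing ones, because append-only inserts only ever touch the last page.

```python
CAPACITY = 10  # assumed rows per leaf page, for illustration only


def build(keys, fill_factor, capacity=CAPACITY):
    """Split sorted keys into pages filled to the given fill factor."""
    per_page = max(1, capacity * fill_factor // 100)
    return [list(keys[i:i + per_page]) for i in range(0, len(keys), per_page)]


def append(pages, key, capacity=CAPACITY):
    """Insert a key larger than every existing key (monotonic, append-only)."""
    # The new key always belongs on the last page; once that page is full,
    # a fresh page is linked after it and no existing page is split.
    if len(pages[-1]) < capacity:
        pages[-1].append(key)
    else:
        pages.append([key])


pages = build(list(range(45)), fill_factor=90)  # 5 pages of 9 rows each
for k in range(45, 70):
    append(pages, k)

print([len(p) for p in pages])
```

The first four pages stay at 9 of 10 rows forever: their reserved space is never used, while the newly added pages fill to 100%.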

+2

gbn May 25 '09 at 14:26


No, absolutely not. Negative values are just as valid INTEGER values as positive ones. No problem at all. Internally, they are all just 4 bytes of zeros and ones :-)
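To make the "just 4 bytes" point concrete, the sketch below packs a few keys with Python's `struct` module as 4-byte signed little-endian integers. SQL Server's int is likewise a 4-byte signed integer (-2^31 to 2^31 - 1); this shows only the size and two's-complement bit pattern, not SQL Server's exact on-page row format. The helper name `int_key_bytes` is hypothetical.

```python
import struct


def int_key_bytes(value: int) -> bytes:
    """Pack a value as a 4-byte signed little-endian integer."""
    return struct.pack("<i", value)


for value in (-3, -1, 0, 1, 3):
    print(f"{value:>3} -> {int_key_bytes(value).hex()}")

# Comparison is numeric, so negative keys simply sort ahead of zero.
print(sorted([5, 0, -2, 3, -1]))  # [-2, -1, 0, 3, 5]
```

Negative and positive values take exactly the same space; the only difference is where they fall in key order.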

Mark

+1

marc_s May 25 '09 at 14:31


You are asking the wrong question!

If you create a clustered index that has a fillfactor of 100%, every time a record is inserted, deleted, or even modified, page splits can occur because there is no room on the current index data page to write the change.

Even with regular index maintenance, a fill factor of 100% is counterproductive on a table where you know inserts will take place. A more common value would be 90%.

+1

Mitch Wheat May 25 '09 at 15:15


I am concerned that this question may be misdirected, as there appears to be an underlying design issue here, regardless of the resulting page splits.

Why do you need to insert negative IDs?

An integer primary key, for example, should uniquely identify a row; its sign should be irrelevant. I suspect there might be a problem with the definition of the primary key for your table if that is not the case.

If you need to flag / identify newly inserted records, then create a column specifically for this purpose.

This solution would be ideal because you could then ensure that your primary key is sequential (possibly using the IDENTITY property, although that is not essential), thereby avoiding page split issues (on insert) altogether.

Also, to confirm, if I may: a 100% fill factor on a clustered primary key (such as integer IDs) will not cause page splits for sequential inserts!

+1

John Sansom May 25 '09 at 17:54


Remus Rusanu · Accepted Answer · 2009-05-25T23:32:35+0000

Apart from the practical administration points you already got, and the rather dubious use of negative IDs to represent attributes of the data model, there is also a valid question here: given a table with int keys from 0 to N, where would newly inserted negative values go, and would they cause additional splits?

The initial rows will be placed on the leaf pages of the clustered index, the row with id 0 on the first page and the row with id N on the last page, filling the pages in between. When the first row with a value of -1 is inserted, it will sort ahead of the row with id 0 and, as such, will add a new page to the tree (actually it will allocate a whole extent of 8 pages, but that's a different point) and link that page in front of the leaf-level linked list of pages. This will NOT cause a split of the former first page. Subsequent inserts of the values -2, -3, etc. will go to the same new page and will be placed in the correct position (-2 ahead of -1, -3 ahead of -2, etc.) until the page is full. Further inserts will then add another new page in front, which will receive the newer values. Inserts of positive values N+1, N+2, ... will go on the last page and fill it until it is full, then they will add a new page and start filling that page.
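The walkthrough above can be replayed with a toy model of the leaf level, assuming a hypothetical capacity of 4 rows per page and pages filled to 100%. The class `LeafLevel` is illustrative only: an ordered Python list stands in for the linked list of leaf pages, and it ignores byte sizes, extent allocation, and the non-leaf levels, so it is not SQL Server's actual algorithm.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical rows per leaf page, kept tiny for illustration


class LeafLevel:
    """Toy model of a clustered index leaf level with 100%-full pages."""

    def __init__(self, capacity: int = PAGE_CAPACITY):
        self.capacity = capacity
        self.pages: list[list[int]] = []
        self.splits = 0     # a full page divided in two
        self.new_pages = 0  # an empty page linked at either end

    def insert(self, key: int) -> None:
        if not self.pages:
            self.pages.append([key])
            return
        # Last page whose lowest key is <= key (or the first page if key sorts first).
        firsts = [page[0] for page in self.pages]
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = self.pages[idx]
        if len(page) < self.capacity:
            bisect.insort(page, key)     # room left: place it in key order
        elif idx == 0 and key < page[0]:
            self.pages.insert(0, [key])  # new lowest key: link a page in front
            self.new_pages += 1
        elif idx == len(self.pages) - 1 and key > page[-1]:
            self.pages.append([key])     # new highest key: link a page at the end
            self.new_pages += 1
        else:
            bisect.insort(page, key)     # between existing keys on a full page
            mid = len(page) // 2
            self.pages[idx:idx + 1] = [page[:mid], page[mid:]]
            self.splits += 1


# Existing rows 0..N (N = 11), packed 100% full into three pages.
leaf = LeafLevel()
for k in range(12):
    leaf.insert(k)
leaf.splits = leaf.new_pages = 0  # count only what the new inserts do

# New negative ids at the low end, plus N+1 and N+2 at the high end.
for k in [-1, -2, -3, -4, -5, 12, 13]:
    leaf.insert(k)

print(leaf.pages)
print("splits:", leaf.splits, "new pages:", leaf.new_pages)
```

In this model -1 gets a page of its own in front, -2 to -4 fill that page in key order, -5 starts another page in front, and 12 and 13 start a page at the end: three new pages and no splits.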

So basically the answer is: inserts at either end of the clustered index should not cause page splits. Page splits can only be caused by inserts between two existing keys. This actually extends to the non-leaf pages as well: inserts at either end of the clustered index cannot split a non-leaf page either. I am not discussing the impact of updates here, of course (they can cause splits if you increase the length of a variable-length column).
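The claim that only inserts between two existing keys split a page can be checked with a compact toy model of the leaf level, assuming a hypothetical capacity of 4 rows per page, pages filled to 100%, and simple 50/50 splits. The function `count_splits` is illustrative and models only leaf pages. Ascending and descending key sequences, the two ends discussed above, produce no splits; inserting odd keys between already-present even keys does.

```python
import bisect


def count_splits(keys, capacity=4):
    """Insert keys one by one into a toy leaf level and count page splits."""
    pages, splits = [], 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        firsts = [p[0] for p in pages]
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif (idx == 0 and key < page[0]) or (idx == len(pages) - 1 and key > page[-1]):
            # New lowest or highest key: link a fresh page at that end, no split.
            if key < page[0]:
                pages.insert(0, [key])
            else:
                pages.append([key])
        else:
            # Key falls between existing keys on a full page: split it.
            bisect.insort(page, key)
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            splits += 1
    return splits


keys = list(range(40))
print("ascending:        ", count_splits(keys))
print("descending:       ", count_splits(keys[::-1]))
print("evens, then odds: ", count_splits(keys[::2] + keys[1::2]))
```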

There has been a lot of talk in the SQL Server blogosphere lately about potential performance problems caused by page splits, but I must warn you against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the performance hit of page splits during inserts is noticeable, you are likely to suffer worse from the mitigation measures, because you will create artificial latch hotspots that are far worse than the cost a split adds to an occasional insert. What does matter is that long-term operation with frequent splits will lead to high fragmentation, which affects data access time. I'd say that is best mitigated with a periodic index maintenance operation (reorganize). Avoid premature optimization; always measure first.
