Indexing an "unguessable" key for quick lookups?

I am not fully getting everything I want from Google Analytics, so I am creating my own tracking system to fill in some of the gaps.

I have a session key that I send to the client as a cookie. It's a GUID. I also have an IDENTITY int surrogate key.

I often need to access the session row to update it throughout the life of the client's visit. Finding that row efficiently is my problem.

I am only sending the GUID to the client browser because:

a) I don't want technically savvy "hacker" users to work out the sequential user ID, i.e., to learn how many visitors the site has had in total

b) I want to make sure no one can maliciously tamper with the data; no one can guess a GUID

I know GUID indexes are inefficient, but I'm not sure exactly how inefficient. I also don't understand how to make repeated updates to the same row as efficient as possible.

I don't know which of the following I need to do:

  • Index the GUID column and always use it to find the row
  • Scan the table to find the row by GUID (assuming recent sessions are easy to find), ideally in reverse date order (if possible!)
  • Avoid the GUID index and keep a hash table of active sessions in my application tier, IDictionary<Guid, int>, to resolve the surrogate IDENTITY "secret" key from the externally visible GUID key
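For concreteness, the first option might look like this. This is only a sketch, assuming SQL Server; the Sessions table and its column names are hypothetical:

    -- Int identity as the clustered primary key, plus a nonclustered
    -- unique index so rows can be found by the cookie GUID.
    CREATE TABLE Sessions (
        SessionId   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        SessionGuid uniqueidentifier  NOT NULL DEFAULT NEWID(),
        LastSeen    datetime          NOT NULL DEFAULT GETDATE()
    );

    CREATE UNIQUE NONCLUSTERED INDEX IX_Sessions_Guid
        ON Sessions (SessionGuid);

The unique index both enforces GUID uniqueness and gives the optimizer a seek path for WHERE SessionGuid = @guid lookups.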

There can be several thousand sessions per day.

PS. I'm just trying to better understand the SQL aspects of this. I know I can do other clever things, like only writing to the table on session expiration, but please keep the answers SQL / index related.


3 answers


In this case, I would just create an index on the GUID. A few thousand sessions a day is an absolutely trivial load for a modern database.

Some notes:



  • If you create the GUID index as nonclustered, the index will be small and likely to stay cached in memory. Most databases cluster on the primary key by default.
  • A GUID column is larger than an integer, but that's hardly a problem these days, and you need the GUID for the application anyway.
  • A GUID index behaves like an index on a string column, such as last name. Those work fine.
  • A B-tree index on GUIDs is harder to keep balanced than an index on an identity column (but no harder than an index on last names). You can mitigate this by creating the index with a low fill factor and reorganizing it in a weekly maintenance job. That's a micro-optimization for databases handling millions of inserts per hour or more.
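The fill factor and maintenance advice above could be sketched like this (SQL Server syntax assumed; the index and table names are hypothetical):

    -- REBUILD applies a low fill factor, leaving free space on each page
    -- to absorb random GUID inserts without page splits.
    ALTER INDEX IX_Sessions_Guid ON Sessions
        REBUILD WITH (FILLFACTOR = 70);

    -- REORGANIZE is the lighter-weight step suitable for a weekly job.
    ALTER INDEX IX_Sessions_Guid ON Sessions REORGANIZE;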

Assuming you are using SQL Server 2005 or later, your scenario could benefit from the NEWSEQUENTIALID() function, which generates ordered GUIDs.

Consider this quote from the article Performance Comparison - Identity() x NewId() x NewSequentialId():

"The NEWSEQUENTIALID system function is an addition to SQL Server 2005. It addresses some of the conflicting requirements from SQL Server 2000, namely identity-level insert performance and globally unique values."

Declare the table as



    create table MyTable (
        id uniqueidentifier default newsequentialid() not null primary key clustered
    );

Keep in mind, however, as Andomar noted, that sequential GUIDs are also easier to predict. There are trickier ways around this, but none better than applying the same techniques to sequential integer keys.

Like the others, I seriously doubt the overhead of plain NEWID() GUIDs would be noticeable in your application. You are better off focusing on minimizing database round trips than on implementing custom caching schemes like the dictionary you suggest.
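As one example of minimizing round trips, the per-request "find the row and update it" can be collapsed into a single statement with the OUTPUT clause, which also returns the surrogate key. A sketch with hypothetical Sessions/SessionGuid/LastSeen names:

    -- One round trip: seek by GUID, update the timestamp, and return
    -- the identity key, instead of a SELECT followed by an UPDATE.
    UPDATE Sessions
    SET    LastSeen = GETDATE()
    OUTPUT inserted.SessionId
    WHERE  SessionGuid = @sessionGuid;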


If I understand correctly, you're worried that indexing and looking up your users by GUID might slow down your application? I'm with Andomar: it hardly matters unless you're inserting rows so fast that maintaining the index slows things down. That would only happen on something like a logging table, and then only with complex indexes.

More importantly, have you profiled it first? You don't have to guess why your program is slow; a profiler can show you exactly which bits are. Otherwise you'll waste hours optimizing bits of code that are either A) never used or B) already fast enough.
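On SQL Server, a quick way to measure what the GUID lookup actually costs before optimizing it (table and column names hypothetical):

    -- STATISTICS IO reports logical reads, STATISTICS TIME reports
    -- CPU and elapsed time for the statement that follows.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT SessionId
    FROM   Sessions
    WHERE  SessionGuid = @sessionGuid;

If the plan shows an index seek with a handful of logical reads, the lookup is already about as cheap as it can get.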
