Uniqueness based on the highest value when not all cells are the same

Question


Let's say that I have the following table of results from my SQL select:

DocumentId  CreationDate  InstanceId
ABC         10th Jan      0c60f4e2-02fc-4244-9ec5-4d259ea5774d
ABC         11th Jan      2168ab5d-d6ca-4db3-90f0-b621d72108b8
BCA         4th Jan       cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8

Note that InstanceId is different for each returned row; it is essentially the primary key for the table.

How do I change my select so that it returns only one row per DocumentId, picking the "newest" one (as defined by CreationDate) and making sure the InstanceId comes from that same row?

So, for the results above, the query should return:

DocumentId  CreationDate  InstanceId
ABC         11th Jan      2168ab5d-d6ca-4db3-90f0-b621d72108b8
BCA         4th Jan       cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8

(btw, sorry for the awful title of the question, feel free to change it to something more appropriate)


sql

SCdF May 21 '09 at 5:49 am


3 answers

Jeffrey kemp · Answer 1 · 2009-05-21T06:01:48+0000

Example for Oracle:

SELECT DISTINCT
       DocumentId,
       FIRST_VALUE(CreationDate)
          OVER (PARTITION BY DocumentId
                ORDER BY CreationDate DESC) AS CreationDate,
       FIRST_VALUE(InstanceId)
          OVER (PARTITION BY DocumentId
                ORDER BY CreationDate DESC) AS InstanceId
FROM   mytable;

Obviously, the result is ambiguous if a document has more than one row with exactly the same CreationDate: which InstanceId you get is then arbitrary.
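
To try the query without an Oracle instance, here is a minimal sketch that runs it, essentially unchanged, against an in-memory SQLite database through Python's standard `sqlite3` module. It assumes SQLite 3.25 or later (the first release with window functions such as `FIRST_VALUE`), ISO-format date strings in place of "10th Jan" so that text ordering is chronological, and a hypothetical helper named `newest_per_document`.

```python
import sqlite3

# Hypothetical sample data mirroring the question. ISO-8601 dates are an
# assumption so that comparing the strings orders them chronologically.
ROWS = [
    ("ABC", "2009-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
    ("ABC", "2009-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
    ("BCA", "2009-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
]

# The Oracle query from the answer; the final ORDER BY is added only so the
# output order is predictable.
QUERY = """
SELECT DISTINCT
       DocumentId,
       FIRST_VALUE(CreationDate)
          OVER (PARTITION BY DocumentId
                ORDER BY CreationDate DESC) AS CreationDate,
       FIRST_VALUE(InstanceId)
          OVER (PARTITION BY DocumentId
                ORDER BY CreationDate DESC) AS InstanceId
FROM   mytable
ORDER BY DocumentId
"""


def newest_per_document(rows):
    """Load rows into an in-memory table and return the newest row per DocumentId."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable ("
            "DocumentId TEXT, CreationDate TEXT, InstanceId TEXT PRIMARY KEY)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in newest_per_document(ROWS):
        print(row)
```

Each window function picks the first value of its partition after sorting by CreationDate descending, and DISTINCT then collapses the identical per-row results to one row per DocumentId.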

AlexDrenea · Answer 2 · 2009-05-21T08:25:33+0000

Here is the SQL Server version. You join the table to a derived table obtained by grouping the rows by DocumentId and taking MAX(CreationDate), and use these two columns as the join condition to get the InstanceId value. In effect, the key for the rows you want is the pair (DocumentId, CreationDate), which uniquely identifies (or should uniquely identify) each row you are trying to select. To get this key, we build a second (derived) table with a SELECT and a GROUP BY, then join it to the original table and use it to pick the rows.

SELECT
     mt2.DocumentId
    ,mt2.CreationDate
    ,mt1.InstanceId
FROM
    myTable    mt1
    inner join (SELECT 
                     DocumentId  DocumentId
                    ,MAX(CreationDate)  CreationDate
             FROM       
                     myTable
             GROUP BY 
                     DocumentId
               )mt2  on  mt2.DocumentId = mt1.DocumentId 
                     and mt2.CreationDate = mt1.CreationDate
ORDER BY mt2.DocumentId

CreationDate must be unique per DocumentId for the query to work flawlessly; otherwise a document whose newest date is shared by several rows comes back more than once. If you need more than one entry per day for each document, consider increasing the precision of CreationDate (for example, by adding a time component).
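
A minimal sketch of the GROUP BY/join approach, run against an in-memory SQLite database via Python's standard `sqlite3` module rather than SQL Server. The ISO dates and the `newest_per_document` helper are assumptions for illustration; the `__main__` part also adds a hypothetical second BCA row with the same CreationDate to show the caveat above.

```python
import sqlite3

# Hypothetical sample data mirroring the question (ISO dates assumed).
ROWS = [
    ("ABC", "2009-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
    ("ABC", "2009-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
    ("BCA", "2009-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
]

# The answer's query: join each row to its document's MAX(CreationDate).
QUERY = """
SELECT mt2.DocumentId, mt2.CreationDate, mt1.InstanceId
FROM myTable mt1
INNER JOIN (SELECT DocumentId, MAX(CreationDate) AS CreationDate
            FROM myTable
            GROUP BY DocumentId) mt2
        ON mt2.DocumentId = mt1.DocumentId
       AND mt2.CreationDate = mt1.CreationDate
ORDER BY mt2.DocumentId
"""


def newest_per_document(rows):
    """Return every row whose CreationDate equals its document's maximum."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE myTable ("
            "DocumentId TEXT, CreationDate TEXT, InstanceId TEXT PRIMARY KEY)"
        )
        conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(newest_per_document(ROWS))
    # Hypothetical tie: a second BCA row on the same date yields two BCA rows.
    tie = ("BCA", "2009-01-04", "dddddddd-0000-0000-0000-000000000000")
    print(newest_per_document(ROWS + [tie]))
```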

Andomar · Answer 3 · 2009-05-21T08:26:57+0000

This should work for most databases:

SELECT
    cur.DocumentId, cur.CreationDate, cur.InstanceId
FROM
    DocumentVersions cur
LEFT OUTER JOIN
    DocumentVersions next
    ON next.DocumentId = cur.DocumentId
    AND next.CreationDate > cur.CreationDate 
WHERE
    next.DocumentId is null

It joins the DocumentVersions table to itself, looking for a row with the same DocumentId and a later CreationDate. The WHERE clause requires that no such later row is found, which leaves only the newest row for each DocumentId.
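
The self-join (anti-join) can be checked the same way. This is a minimal sketch using an in-memory SQLite database through Python's `sqlite3`; the ISO dates and the `newest_per_document` helper are assumptions, and the alias `next` is renamed `newer` here only to stay clear of dialects where NEXT is a reserved word.

```python
import sqlite3

# Hypothetical sample data mirroring the question (ISO dates assumed).
ROWS = [
    ("ABC", "2009-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
    ("ABC", "2009-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
    ("BCA", "2009-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
]

# Keep a row only if no row of the same document has a later CreationDate.
QUERY = """
SELECT cur.DocumentId, cur.CreationDate, cur.InstanceId
FROM DocumentVersions cur
LEFT OUTER JOIN DocumentVersions newer
    ON newer.DocumentId = cur.DocumentId
   AND newer.CreationDate > cur.CreationDate
WHERE newer.DocumentId IS NULL
ORDER BY cur.DocumentId
"""


def newest_per_document(rows):
    """Return the rows that have no newer version of the same document."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE DocumentVersions ("
            "DocumentId TEXT, CreationDate TEXT, InstanceId TEXT PRIMARY KEY)"
        )
        conn.executemany("INSERT INTO DocumentVersions VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in newest_per_document(ROWS):
        print(row)
```

For the 10th Jan ABC row, the join finds the 11th Jan row, so `newer.DocumentId` is not NULL and the row is filtered out; the 11th Jan row and the single BCA row find no later match and survive.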

If a document can have several rows with the same creation date, you can pick the one with the highest InstanceId, for example:

SELECT
    cur.DocumentId, cur.CreationDate, max(cur.InstanceId) AS InstanceId
FROM
    DocumentVersions cur
LEFT OUTER JOIN
    DocumentVersions next
    ON next.DocumentId = cur.DocumentId
    AND next.CreationDate > cur.CreationDate 
WHERE
    next.DocumentId is null
GROUP BY
    cur.DocumentId, cur.CreationDate
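
A minimal sketch of this tie-breaking variant, again on an in-memory SQLite database via Python's `sqlite3`. The ISO dates, the hypothetical extra BCA row sharing a creation date, the `newer` alias, and the `newest_per_document` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical data: the question's rows plus an extra BCA row with the same
# CreationDate as the existing one, to create a tie.
ROWS = [
    ("ABC", "2009-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
    ("ABC", "2009-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
    ("BCA", "2009-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
    ("BCA", "2009-01-04", "dddddddd-0000-0000-0000-000000000000"),
]

# Anti-join for the newest date, then GROUP BY to collapse ties with MAX().
QUERY = """
SELECT cur.DocumentId, cur.CreationDate, MAX(cur.InstanceId) AS InstanceId
FROM DocumentVersions cur
LEFT OUTER JOIN DocumentVersions newer
    ON newer.DocumentId = cur.DocumentId
   AND newer.CreationDate > cur.CreationDate
WHERE newer.DocumentId IS NULL
GROUP BY cur.DocumentId, cur.CreationDate
ORDER BY cur.DocumentId
"""


def newest_per_document(rows):
    """Return exactly one row per DocumentId, breaking date ties by MAX(InstanceId)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE DocumentVersions ("
            "DocumentId TEXT, CreationDate TEXT, InstanceId TEXT PRIMARY KEY)"
        )
        conn.executemany("INSERT INTO DocumentVersions VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in newest_per_document(ROWS):
        print(row)
```

Random GUIDs such as these do not encode creation time, so `MAX(InstanceId)` is a deterministic but arbitrary tie-breaker, not a way to find the most recently inserted row.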
