Uniqueness based on the highest value when not all cells are the same
Let's say that I have the following table of results from my SQL select:
DocumentId  CreationDate  InstanceId
ABC         10th Jan      0c60f4e2-02fc-4244-9ec5-4d259ea5774d
ABC         11th Jan      2168ab5d-d6ca-4db3-90f0-b621d72108b8
BCA         4th Jan       cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8
Note that InstanceId is different for each returned row; it is essentially the primary key for the table.
How do I change my select so that it returns only one row per DocumentId, picking the "newest" one (as defined by CreationDate) and making sure the InstanceId comes from that same row?
So the results above would become:
DocumentId  CreationDate  InstanceId
ABC         11th Jan      2168ab5d-d6ca-4db3-90f0-b621d72108b8
BCA         4th Jan       cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8
(btw, sorry for the awful title of the question, feel free to change it to something more appropriate)
Example for Oracle:
SELECT DISTINCT
       DocumentId,
       FIRST_VALUE(CreationDate)
           OVER (PARTITION BY DocumentId
                 ORDER BY CreationDate DESC) AS CreationDate,
       FIRST_VALUE(InstanceId)
           OVER (PARTITION BY DocumentId
                 ORDER BY CreationDate DESC) AS InstanceId
FROM   mytable;
Note that the result is ambiguous if a DocumentId has more than one row with exactly the same CreationDate.
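If ties on CreationDate are possible, one way to make the pick deterministic is to rank the rows with ROW_NUMBER() and add a tie-breaker to the ORDER BY. The sketch below is not the Oracle query above; it runs a similar idea in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The ISO-formatted dates, the year 2024, the extra tied row and the choice of "highest InstanceId wins" are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "DocumentId TEXT, CreationDate TEXT, InstanceId TEXT PRIMARY KEY)"
)
rows = [
    ("ABC", "2024-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
    ("ABC", "2024-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
    ("BCA", "2024-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
    # Hypothetical extra row that ties with the newest ABC row.
    ("ABC", "2024-01-11", "ffffffff-0000-0000-0000-000000000000"),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# Rank rows inside each DocumentId, newest first; InstanceId breaks ties,
# so exactly one row per document gets rn = 1.
query = """
SELECT DocumentId, CreationDate, InstanceId
FROM (
    SELECT DocumentId, CreationDate, InstanceId,
           ROW_NUMBER() OVER (
               PARTITION BY DocumentId
               ORDER BY CreationDate DESC, InstanceId DESC
           ) AS rn
    FROM mytable
) ranked
WHERE rn = 1
ORDER BY DocumentId
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Which row should win a tie is a business decision; any column that is unique within a DocumentId would serve as the tie-breaker.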
Here is the SQL Server version. You join the table to a derived table that groups the rows by DocumentId and takes MAX(CreationDate), then use those two columns as the join condition to fetch the InstanceId value.
In effect, the key for the required selection is DocumentId plus CreationDate: together they uniquely identify (or should uniquely identify) the row you are trying to select. To get this key, we build a second (derived) table with a SELECT and a GROUP BY, join it to the original table, and use it to pick the matching rows.
SELECT
     mt2.DocumentId
    ,mt2.CreationDate
    ,mt1.InstanceId
FROM
    myTable mt1
    inner join (SELECT
                     DocumentId DocumentId
                    ,MAX(CreationDate) CreationDate
                FROM
                    myTable
                GROUP BY
                    DocumentId
               ) mt2 on mt2.DocumentId = mt1.DocumentId
                    and mt2.CreationDate = mt1.CreationDate
ORDER BY mt2.DocumentId
CreationDate must be unique per DocumentId for the query to work flawlessly. If you need more than one entry per day for each document, consider increasing the granularity of CreationDate (for example, by adding a time component).
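To see why that uniqueness requirement matters, here is a small sketch that runs the same join against an in-memory SQLite database via Python's sqlite3 module rather than SQL Server. The table contents are made up: the ISO dates (year 2024) and the second row dated 2024-01-11 are assumptions added to provoke a tie.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (DocumentId TEXT, CreationDate TEXT, InstanceId TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("ABC", "2024-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
        ("ABC", "2024-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
        ("BCA", "2024-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
        # Hypothetical second ABC row with the same (maximum) CreationDate.
        ("ABC", "2024-01-11", "ffffffff-0000-0000-0000-000000000000"),
    ],
)

# Join every row to the per-document maximum CreationDate.
query = """
SELECT mt2.DocumentId, mt2.CreationDate, mt1.InstanceId
FROM myTable mt1
INNER JOIN (
    SELECT DocumentId, MAX(CreationDate) AS CreationDate
    FROM myTable
    GROUP BY DocumentId
) mt2 ON mt2.DocumentId = mt1.DocumentId
     AND mt2.CreationDate = mt1.CreationDate
ORDER BY mt2.DocumentId
"""
result = conn.execute(query).fetchall()

# Count how many rows came back for each document.
rows_per_document = Counter(doc_id for doc_id, _, _ in result)
print(rows_per_document)
```

Both tied ABC rows match the maximum date, so ABC comes back twice while BCA comes back once.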
This should work for most databases:
SELECT
    cur.DocumentId, cur.CreationDate, cur.InstanceId
FROM
    DocumentVersions cur
LEFT OUTER JOIN
    DocumentVersions next
    ON next.DocumentId = cur.DocumentId
    AND next.CreationDate > cur.CreationDate
WHERE
    next.DocumentId is null
It joins the DocumentVersions table to itself, looking for a row with the same DocumentId and a later CreationDate. The WHERE clause then requires that no such later row was found, which effectively keeps only the newest row for each DocumentId.
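As a quick check, the self-join can be tried on the rows from the question. This is only a sketch: it uses an in-memory SQLite database through Python's sqlite3 module, the dates are rewritten as ISO strings with an assumed year of 2024 so that they compare correctly as text, and the second alias is renamed to newer (a name chosen here for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentVersions ("
    "DocumentId TEXT, CreationDate TEXT, InstanceId TEXT PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO DocumentVersions VALUES (?, ?, ?)",
    [
        ("ABC", "2024-01-10", "0c60f4e2-02fc-4244-9ec5-4d259ea5774d"),
        ("ABC", "2024-01-11", "2168ab5d-d6ca-4db3-90f0-b621d72108b8"),
        ("BCA", "2024-01-04", "cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8"),
    ],
)

# Keep a row only when the join finds no newer row for the same document.
query = """
SELECT cur.DocumentId, cur.CreationDate, cur.InstanceId
FROM DocumentVersions cur
LEFT OUTER JOIN DocumentVersions newer
    ON newer.DocumentId = cur.DocumentId
   AND newer.CreationDate > cur.CreationDate
WHERE newer.DocumentId IS NULL
ORDER BY cur.DocumentId
"""
newest = conn.execute(query).fetchall()
for row in newest:
    print(row)
```

The 10th Jan row for ABC is dropped because the join finds the 11th Jan row for it; the other two rows have no newer partner and survive the IS NULL filter.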
If there can be multiple rows with the same CreationDate for a document, you can pick the one with the highest InstanceId, for example:
SELECT
    cur.DocumentId, cur.CreationDate, max(cur.InstanceId)
FROM
    DocumentVersions cur
LEFT OUTER JOIN
    DocumentVersions next
    ON next.DocumentId = cur.DocumentId
    AND next.CreationDate > cur.CreationDate
WHERE
    next.DocumentId is null
GROUP BY
    cur.DocumentId, cur.CreationDate