Get only last row grouped by column
I have a large table of sent emails and their status codes.
ID  Recipient            Date        Status
1   someone@example.com  01/01/2010  1
2   someone@example.com  02/01/2010  1
3   them@example.com     01/01/2010  1
4   them@example.com     02/01/2010  2
5   them@example.com     03/01/2010  1
6   others@example.com   01/01/2010  1
7   others@example.com   02/01/2010  2
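For anyone who wants to try the queries in this thread, here is a minimal setup sketch for the sample data. The column types are assumptions (the post does not give the schema), and the dates are read as day/month/year; the other reading gives the same ordering within each recipient, so the results do not change.

```sql
-- Hypothetical schema: the column types are assumptions, not taken from the post.
CREATE TABLE Messages (
    ID        INT          NOT NULL PRIMARY KEY,
    Recipient VARCHAR(255) NOT NULL,
    [Date]    DATETIME     NOT NULL,
    Status    INT          NOT NULL
);

-- 'YYYYMMDD' literals are parsed the same way regardless of language settings.
-- Multi-row VALUES requires SQL Server 2008 or later.
INSERT INTO Messages (ID, Recipient, [Date], Status) VALUES
    (1, 'someone@example.com', '20100101', 1),
    (2, 'someone@example.com', '20100102', 1),
    (3, 'them@example.com',    '20100101', 1),
    (4, 'them@example.com',    '20100102', 2),
    (5, 'them@example.com',    '20100103', 1),
    (6, 'others@example.com',  '20100101', 1),
    (7, 'others@example.com',  '20100102', 2);
```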
In this example:
- all emails sent to someone have status 1
- the middle email (by date) sent to them has status 2, but the last one is 1
- the last email sent to others has status 2
What I need to get is a count of all emails sent to each person and what the last status code was.
The first part is pretty simple:
SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
ORDER BY Recipient
Which gives me:
Recipient            EmailCount
someone@example.com  2
them@example.com     3
others@example.com   2
How can I get the latest status code?
The end result should be:
Recipient            EmailCount  LastStatus
someone@example.com  2           1
them@example.com     3           1
others@example.com   2           2
Thanks.
(Microsoft SQL Server 2008, query is done via OleDbConnection in .NET)
This is an example of a "max per group" query. I think it is easiest to understand by splitting it into two subqueries and then joining the results.
The first subquery is what you already have.
The second subquery uses the ROW_NUMBER window function to number the emails for each recipient, starting at 1 for the most recent, then 2, 3, etc.
The results of the first subquery are then joined to the rows of the second subquery that have row number 1, i.e. the latest. Doing it this way ensures that you get only one row per recipient even if there are ties on the date.
Here is the query:
SELECT T1.Recipient, T1.EmailCount, T2.Status FROM
(
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) T1
JOIN
(
    SELECT
        Recipient,
        Status,
        ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
    FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1
This gives the following results:
Recipient            EmailCount  Status
others@example.com   2           2
someone@example.com  2           1
them@example.com     3           1
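As a variation on the same idea, the count can also be computed as a window aggregate, so the table is only referenced once. This is just a sketch, not a claim that it performs better; the derived-table alias Ranked and the ID tie-breaker are my own additions, and the tie-breaker assumes that a higher ID means a later email.

```sql
SELECT Recipient, EmailCount, Status AS LastStatus
FROM
(
    SELECT
        Recipient,
        Status,
        -- Total number of rows for this recipient, repeated on every row
        COUNT(*) OVER (PARTITION BY Recipient) AS EmailCount,
        -- 1 = most recent email; ID breaks ties on Date (assumption: higher ID is later)
        ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC, ID DESC) AS rn
    FROM Messages
) Ranked
WHERE rn = 1
ORDER BY Recipient;
```

On the sample data this should return the same three rows as the join above.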
It's not pretty, but I would just use multiple subqueries:
SELECT M.Recipient,
       COUNT(*) AS EmailCount,
       (SELECT M2.Status
        FROM Messages M2
        WHERE M2.Recipient = M.Recipient
          AND M2.Date = (SELECT MAX(M3.Date)
                         FROM Messages M3
                         WHERE M3.Recipient = M2.Recipient)) AS LastStatus
FROM Messages M
GROUP BY M.Recipient
ORDER BY M.Recipient
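One thing to watch for: if a recipient has two emails with the same latest Date, the scalar subquery returns more than one row and SQL Server raises a "Subquery returned more than 1 value" error. A possible way around that is TOP 1 with an ORDER BY. This is only a sketch; the LastStatus alias and the ID tie-breaker are assumptions on my part (the tie-breaker presumes a higher ID means a later email).

```sql
SELECT M.Recipient,
       COUNT(*) AS EmailCount,
       -- TOP 1 guarantees at most one row, even when dates are tied
       (SELECT TOP 1 M2.Status
        FROM Messages M2
        WHERE M2.Recipient = M.Recipient
        ORDER BY M2.[Date] DESC, M2.ID DESC) AS LastStatus
FROM Messages M
GROUP BY M.Recipient
ORDER BY M.Recipient;
```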
SELECT
M.Recipient,
C.EmailCount,
M.Status
FROM
(
SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
) C
JOIN
(
SELECT Recipient, MAX(Date) AS LastDate
FROM Messages
GROUP BY Recipient
) MD ON C.Recipient = MD.Recipient
JOIN
Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
ORDER BY
M.Recipient
I've found that aggregates generally scale better than ranking functions.
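A caveat with joining on MAX(Date): if a recipient has several rows sharing the latest date, the join returns one output row per tied message. A quick way to see whether the data has any duplicate dates at all is a check like the sketch below. It is broader than strictly needed, since it reports every duplicated date and not only the latest one; the alias RowsOnThatDate is made up for illustration.

```sql
-- Lists every (Recipient, Date) pair that occurs more than once.
SELECT Recipient, [Date], COUNT(*) AS RowsOnThatDate
FROM Messages
GROUP BY Recipient, [Date]
HAVING COUNT(*) > 1;
```

For the sample data in the question this returns no rows, so the join gives exactly one row per recipient there.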
You cannot easily do this in a single query, because COUNT(*) is an aggregate function whereas the last status comes from a specific row. Here's a query to get the latest status for each user:
SELECT M.Recipient, M.Status FROM Messages M
WHERE M.Date = (SELECT MAX(SUB.Date) FROM Messages SUB
WHERE SUB.Recipient = M.Recipient)
You can use the ranking functions for this. Something like (not tested):
WITH MyResults AS
(
SELECT Recipient, Status, ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC) AS [row_number]
FROM Messages
)
SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status
FROM (
SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
) MyCounts
INNER JOIN MyResults
ON MyCounts.Recipient = MyResults.Recipient
WHERE MyResults.[row_number] = 1