Get only last row grouped by column

I have a large collection of sent emails and their status codes.

ID Recipient           Date       Status
 1 someone@example.com 01/01/2010      1
 2 someone@example.com 02/01/2010      1
 3 them@example.com    01/01/2010      1
 4 them@example.com    02/01/2010      2
 5 them@example.com    03/01/2010      1
 6 others@example.com  01/01/2010      1
 7 others@example.com  02/01/2010      2

      

In this example:

  • all emails sent to someone have status 1
  • the middle email (by date) sent to them has status 2, but the last one has status 1
  • the last email sent to others has status 2

What I need is a count of all emails sent to each person, together with the last status code for that person.

The first part is pretty simple:

SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
ORDER BY Recipient

      

Which gives me:

Recipient           EmailCount
someone@example.com 2
them@example.com    3
others@example.com  2

      

How can I get the latest status code?

The end result should be:

Recipient           EmailCount LastStatus
someone@example.com          2          1
them@example.com             3          1
others@example.com           2          2

      

Thanks.

(Microsoft SQL Server 2008, query is done via OleDbConnection in .Net)



5 answers


This is an example of a "max per group" query. I think it is easiest to understand by splitting it into two subqueries and then joining the results.

The first subquery is what you already have.

The second subquery uses the window function ROW_NUMBER to number the emails for each recipient, starting at 1 for the most recent, then 2, 3, etc.

The results of the first subquery are then joined with the rows of the second subquery that have row number 1, i.e. the latest email. Doing it this way guarantees that you get only one row per recipient, even if there are ties on the date.



Here is the query:

SELECT T1.Recipient, T1.EmailCount, T2.Status FROM
(
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) T1
JOIN
(
    SELECT
        Recipient,
        Status,
        ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date Desc) AS rn
    FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1

      

This gives the following results:

Recipient            EmailCount  Status  
others@example.com   2           2       
someone@example.com  2           1       
them@example.com     3           1       
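
If two messages for the same recipient can share the same Date, ROW_NUMBER still yields exactly one row with rn = 1, but which of the tied rows it picks is arbitrary. A minimal sketch of making that choice deterministic, assuming that the ID column from the sample data increases with send order (the question does not state this) and using LastStatus as the output column name from the question's expected result:

```sql
SELECT T1.Recipient, T1.EmailCount, T2.Status AS LastStatus
FROM
(
    -- one row per recipient with the total number of emails
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) T1
JOIN
(
    SELECT
        Recipient,
        Status,
        -- ID DESC breaks ties between rows with the same Date
        -- (assumption: a higher ID means the email was sent later)
        ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC, ID DESC) AS rn
    FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1
ORDER BY T1.Recipient
```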

      



It's not pretty, but I would just use multiple subqueries:



SELECT Recipient,
    COUNT(*) EmailCount,
    (SELECT Status
     FROM Messages M2
     WHERE M2.Recipient = M.Recipient
         AND M2.Date = (SELECT MAX(Date)
                     FROM Messages
                     WHERE Recipient = M2.Recipient)) AS LastStatus
FROM Messages M
GROUP BY Recipient
ORDER BY Recipient
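
One caveat: the scalar subquery above raises a "Subquery returned more than 1 value" error if a recipient has two messages sharing the latest Date. A possible variant, sketched here rather than taken from the original answer, uses TOP 1 with ORDER BY so the subquery always yields a single value. The ID tie-breaker assumes that a higher ID means a later email, and the LastStatus alias follows the question's expected output:

```sql
SELECT M.Recipient,
    COUNT(*) AS EmailCount,
    -- newest message for this recipient; ID DESC is an assumed tie-breaker
    (SELECT TOP 1 M2.Status
     FROM Messages M2
     WHERE M2.Recipient = M.Recipient
     ORDER BY M2.Date DESC, M2.ID DESC) AS LastStatus
FROM Messages M
GROUP BY M.Recipient
ORDER BY M.Recipient
```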

      



SELECT
    M.Recipient,
    C.EmailCount,
    M.Status
FROM
    (
    SELECT Recipient, Count(*) EmailCount
    FROM Messages
    GROUP BY Recipient
    ) C
    JOIN
    (
    SELECT Recipient, MAX(Date) AS LastDate
    FROM Messages
    GROUP BY Recipient
    ) MD ON C.Recipient = MD.Recipient
    JOIN
    Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
ORDER BY
    Recipient

      

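In my experience, aggregates generally scale better than ranking functions. Note that if a recipient has two messages with the same latest Date, this query returns one row for each of them.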



You cannot easily do this in a single query, because COUNT(*) is an aggregate function, whereas the last status comes from a specific row. Here is a query to get the latest status for each recipient:

SELECT M.Recipient, M.Status FROM Messages M
WHERE M.Date = (SELECT MAX(SUB.Date) FROM MESSAGES SUB
    WHERE SUB.Recipient = M.Recipient)
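
That said, a single SELECT over Messages does seem possible with window functions, because COUNT(*) can also be written with an OVER clause. A sketch, assuming SQL Server 2005 or later; the names Numbered, rn and LastStatus are illustrative and not from the original answer:

```sql
SELECT Recipient, EmailCount, Status AS LastStatus
FROM
(
    SELECT
        Recipient,
        Status,
        -- per-recipient total, repeated on every row of the partition
        COUNT(*) OVER (PARTITION BY Recipient) AS EmailCount,
        -- 1 = most recent message for this recipient
        ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
    FROM Messages
) Numbered
WHERE rn = 1
ORDER BY Recipient
```

Because both values are computed per row before filtering, keeping only rn = 1 leaves one row per recipient that carries both the count and the latest status.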

      



You can use the ranking functions for this. Something like (not tested):

WITH MyResults AS
(
   SELECT Recipient, Status, ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC) AS [row_number]
   FROM Messages
)
SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status
FROM (
    SELECT Recipient, Count(*) EmailCount
    FROM Messages
    GROUP BY Recipient
) MyCounts
INNER JOIN MyResults
ON MyCounts.Recipient = MyResults.Recipient
WHERE MyResults.[row_number] = 1

      
