MySQL: queries the top n aggregates
I have a table that takes into account the occurrence of one specific action by different users on different objects:
CREATE TABLE `Actions` (
`object_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`actionTime` datetime
);
Each time the user performs this action, a row is inserted. I can count how many actions have been performed on each object and order the objects by "activity":
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
ORDER BY action_count;
How can I limit the results to the top n objects? The LIMIT clause is applied before aggregation, resulting in incorrect results. The table is potentially huge (millions of rows) and I probably need to count dozens of times per minute, so I would like to make it as efficient as possible.
edit . Actually the machine is right and I was wrong with the timing when LIMIT is applied. My query returned the correct results, but the GUI presenting them to me threw me away ... this question makes this question pointless. Sorry!
a source to share
In fact ... LIMIT is applied last after the possible HAVING clause. Therefore, it should not give incorrect results. However, since LIMIT is applied last, it won't make your query run faster, as the temporary table will need to be created and sorted in order of number of steps before chopping the result. Also, don't forget to sort in descending order:
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
ORDER BY action_count DESC
LIMIT 10;
You can try adding an index to the object_id for optimization. Thus, instead of the action table, only the index should be scanned.
a source to share
What about:
SELECT * FROM
(
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
ORDER BY action_count
)
LIMIT 15
Also, if you have some measure of what should be the minimum number of actions to be included (for example, the top n's are definitely over 1000), you can improve efficiency by adding a HAVING clause:
SELECT * FROM
(
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
HAVING action_count > 1000
ORDER BY action_count
)
LIMIT 15
a source to share
I know this thread is 2 years old, but stackflow still thinks this is relevant, so my $ 0.02 goes here. ORDER BY clauses are computationally expensive and should be avoided on large tables. The trick I used (partly from Joe Celko's SQL for Smarties) looks something like this:
SELECT COUNT(*) AS counter, t0.object_id FROM (SELECT COUNT(*), actions.object_id FROM actions GROUP BY id) AS t0, (SELECT COUNT(*), actions.object_id FROM actions GROUP BY id) AS t1 WHERE t0.object_id < t1.object_id GROUP BY object_id HAVING counter < 15
Gives you 15 of the best edited objects without sorting. Note that as of v5 mysql will only cache result sets for exactly repeated queries (whitespace incl), so the subquery will not be cached. Using a view will solve this problem.
Yes, these are three queries instead of two, and the only gain is not sorting the grouped query, but if you have many groups it will be faster.
Side note: the query is really handy for median functions without sorting
a source to share