Optimize GROUP BY & ORDER BY query

I have a web page where users download and watch videos. Last week, I asked what the best way is to track video views so that I can display the most viewed videos this week (videos from all dates).

Now I need help optimizing the query with which I get the video from the database. Relevant tables:

video (~239371 rows)
VID(int), UID(int), title(varchar), status(enum), type(varchar), is_duplicate(enum), is_adult(enum), channel_id(tinyint)

signup (~115440 rows)
UID(int), username(varchar)

videos_views (~359202 rows after 6 days of collecting data, so this table will grow rapidly)
videos_id(int), views_date(date), num_of_views(int)

      

The table video

contains videos, signup

supports users, and videos_views

contains data about video views (each video can contain one row per day in this table).

I have this query that gets the job done, but it takes ~ 10 seconds to complete, and I think it will only get worse over time as the table grows videos_views

.

SELECT
 v.VID, 
 v.title, 
 v.vkey, 
 v.duration, 
 v.addtime, 
 v.UID, 
 v.viewnumber, 
 v.com_num, 
 v.rate, 
 v.THB, 
 s.username,
 SUM(vvt.num_of_views) AS tmp_num
FROM
 video v
  LEFT JOIN videos_views vvt ON v.VID = vvt.videos_id
  LEFT JOIN signup s on v.UID = s.UID
WHERE
 v.status = 'Converted'
 AND v.type = 'public'
 AND v.is_duplicate = '0'
 AND v.is_adult = '0'
 AND v.channel_id <> 10
 AND vvt.views_date >= '2001-05-11'
GROUP BY
 vvt.videos_id
ORDER BY
 tmp_num DESC
LIMIT
 8

      

All relevant fields are indexed. And here is a screenshot of the EXPLAIN result: alt text

So how can I optimize this?

UPDATE This is my request based on Quassnoi's answer. It returns correct videos, but it messes up the JOIN in the registration table. For some records the field username

is NULL, for others it contains an invalid username.

SELECT
    v.VID,
    v.title,
    v.vkey,
    v.duration,
    v.addtime,
    v.UID,
    v.viewnumber,
    v.com_num,
    v.rate,
    v.THB,
    s.username
FROM
    (SELECT
        videos_id,
        SUM(num_of_views) AS tmp_num
    FROM
        videos_views
    WHERE
        views_date >= '2010-05-13'
    GROUP BY
        videos_id
    ) q
        JOIN video v ON v.VID = q.videos_id
        LEFT JOIN signup s ON s.UID = v.VID
WHERE
    v.type = 'public'
    AND v.channel_id <> 10
    AND v.is_adult = '0'
    AND is_duplicate = '0'
ORDER BY
    tmp_num DESC
LIMIT
    8

      

Here's the result: alt text

+2


a source to share


2 answers


Create the following index:

video_views (views_date, videos_id)

      

and get rid of LEFT JOIN

between videos

and views

(it doesn't work with your current request anyway):

SELECT  *
FROM    (
        SELECT  videos_id, SUM(num_of_views) AS tmp_num
        FROM    video_views
        GROUP BY
                videos_id
        ) q
JOIN    videos v
ON      v.vid = q.videos_id
LEFT JOIN
        signup s
ON      s.UID = v.UID
ORDER BY
        tmp_num DESC
LIMIT 8

      



If you want to return zero for a video that has never been viewed, change the order of the fields in the index:

video_views (videos_id, views_date)

      

and rewrite the request:

SELECT  *,
        (
        SELECT  COALESCE(SUM(num_of_views), 0)
        FROM    video_views vw
        WHERE   vw.videos_id = v.vid
                AND views_date >= '2001-05-11'
        ) AS tmp_num
FROM    videos v
LEFT JOIN
        signup s
ON      s.UID = v.UID
ORDER BY
        tmp_num DESC
LIMIT 8

      

+1


a source


Yes, ORDER BY

the calculated column will always be irreversible. Unfortunately.



If you are going to make this request a lot and need to avoid watching every video that needs to be counted and ordered every time, you will have to denormalize. Add a column views_in_last_week

, recalculate it from videos_views

in the background every day, and index it (possibly in a composite index with other matching WHERE clauses).

+2


a source







All Articles