MySQL: multiple dependent subqueries, painfully slow
I have a working query that fetches the data I need, but unfortunately it is very slow (takes 3 minutes). I have the indexes in place, but I think the problem is with multiple dependent subqueries. I've tried to rewrite the query using joins, but I can't seem to get it to work. Any help would be greatly appreciated.
Tables:
Basically, I have two tables. The first (prices) contains the prices of the items in the store. Each row is an item's price for one day, and a new row with the updated price is added every day.
The second table (watches_US) contains item information (name, description, etc.).
CREATE TABLE `prices` (
`prices_id` int(11) NOT NULL auto_increment,
`prices_locale` enum('CA','DE','FR','JP','UK','US') NOT NULL default 'US',
`prices_watches_ID` char(10) NOT NULL,
`prices_date` datetime NOT NULL,
`prices_am` varchar(10) default NULL,
`prices_new` varchar(10) default NULL,
`prices_used` varchar(10) default NULL,
PRIMARY KEY (`prices_id`),
KEY `prices_am` (`prices_am`),
KEY `prices_locale` (`prices_locale`),
KEY `prices_watches_ID` (`prices_watches_ID`),
KEY `prices_date` (`prices_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=61764 ;
CREATE TABLE `watches_US` (
`watches_ID` char(10) NOT NULL,
`watches_date_added` datetime NOT NULL,
`watches_last_update` datetime default NULL,
`watches_title` varchar(255) default NULL,
`watches_small_image_height` int(11) default NULL,
`watches_small_image_width` int(11) default NULL,
`watches_description` text,
PRIMARY KEY (`watches_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
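The schema above has only single-column indexes on prices. A common complement, suggested here as an assumption rather than something from the original post, is a composite index that matches how every subquery reads prices: equality on watch ID and locale, then ordering by date. With it, MySQL can usually fetch the newest or oldest matching row straight from the index instead of filtering and then sorting. The index name idx_watch_locale_date is hypothetical.

```sql
-- Hypothetical index name; not part of the original schema.
-- Equality columns (watch ID, locale) come first, then the ORDER BY column,
-- so "WHERE prices_watches_ID = ... AND prices_locale = 'US' ORDER BY prices_date"
-- can be answered by walking the index in order instead of sorting rows.
ALTER TABLE prices
  ADD INDEX idx_watch_locale_date (prices_watches_ID, prices_locale, prices_date);
```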
The query returns 10 price changes within a 30-hour window, ordered by the size of the change (only price drops, since it filters on absolute_change < 0). It uses subqueries to get the newest price, the oldest price within the 30 hours before it, and then to calculate the price change.
Here's the query:
SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID,
( SELECT prices_am FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price,
( SELECT prices_date FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price_date,
( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price,
( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change,
( SELECT (new_price - old_price) ) as absolute_change
FROM watches_US
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID
WHERE ( prices_locale = 'US' )
AND ( prices_am IS NOT NULL )
AND ( prices_am != '' )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
AND ( prices.prices_date = new_price_date )
ORDER BY absolute_change ASC
LIMIT 10
How would I rewrite this to use joins instead, or otherwise optimize it so it doesn't take 3 minutes to return a result? Any help would be greatly appreciated!
Thank you.
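To check that the dependent subqueries really are the bottleneck, MySQL's EXPLAIN can be run on the query. The sketch below EXPLAINs a reduced version containing only the new_price subquery, so it runs as written against the tables above; prefixing the full query with EXPLAIN works the same way. The aliases w and p are introduced here for brevity.

```sql
-- Show the execution plan instead of running the query.
-- The inner SELECT should appear with select_type = DEPENDENT SUBQUERY,
-- meaning it is re-evaluated once for each distinct outer value
-- (here, each w.watches_ID); the "rows" column estimates rows examined.
EXPLAIN
SELECT w.watches_ID,
       ( SELECT p.prices_am
         FROM prices AS p
         WHERE p.prices_watches_ID = w.watches_ID
           AND p.prices_locale = 'US'
         ORDER BY p.prices_date DESC
         LIMIT 1 ) AS new_price
FROM watches_US AS w;
```

In the original query, three such subqueries sit in the select list of a join that returns every US price row for every watch, so their cost multiplies.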
UPDATE
Using the answers below, I arrived at the following query, which takes 2 seconds to run:
SELECT watches_US.*, prices.*,
( SELECT prices_am FROM prices prices2 WHERE ( prices2.prices_watches_ID = watches_US.watches_ID AND prices2.prices_locale = 'US') AND ( prices2.prices_date >= DATE_SUB(prices.prices_date,INTERVAL 30 HOUR) ) ORDER BY prices2.prices_date ASC LIMIT 1 ) as old_price,
( SELECT ROUND(((prices.prices_am - old_price)/old_price)*100,2) ) as percent_change,
( SELECT (prices.prices_am - old_price) ) as absolute_change
FROM watches_US
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID AND prices.prices_locale = 'US'
WHERE ( prices.prices_am IS NOT NULL )
AND ( prices.prices_am != '' )
AND ( prices.prices_date IN (SELECT MAX(prices_date) FROM prices WHERE prices_watches_ID = watches_US.watches_ID AND prices_locale = 'US' ) )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
ORDER BY absolute_change ASC
LIMIT 10
It could probably still be improved, but it's usable as is. Thanks everyone for your help!
Here's a partial idea:
SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID,
prices2.prices_am as new_price,
prices2.prices_date as new_price_date,
( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price,
( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change,
( SELECT (new_price - old_price) ) as absolute_change
FROM watches_US
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID
LEFT OUTER JOIN prices prices2 ON prices2.prices_watches_ID = watches_US.watches_ID AND prices2.prices_locale = 'US'
WHERE ( prices.prices_locale = 'US' )
AND ( prices.prices_am IS NOT NULL )
AND ( prices.prices_am != '' )
AND ( prices2.prices_date IN (SELECT MAX(prices_date) FROM prices WHERE prices_watches_ID = watches_US.watches_ID AND prices_locale = 'US' ) )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
AND ( prices.prices_date = new_price_date )
ORDER BY absolute_change ASC
LIMIT 10
The change is a second join on prices, used to get new_price and new_price_date, with a WHERE condition that keeps only the most recent record. It could probably be cleaned up a bit, but it should get the idea across.
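A different way to keep only the most recent record using joins alone (not the approach in the answer above) is the groupwise-maximum anti-join: join each US price row to any later US price row for the same watch, and keep only rows for which no later row exists. A minimal sketch, assuming the table and column names from the question; the aliases w, p, and later are introduced here.

```sql
-- Latest US price per watch via an anti-join:
-- a row p is the newest one if no other US row for the same watch
-- has a later prices_date (so the LEFT JOIN finds nothing).
SELECT w.*,
       p.prices_am   AS new_price,
       p.prices_date AS new_price_date
FROM watches_US AS w
JOIN prices AS p
  ON p.prices_watches_ID = w.watches_ID
 AND p.prices_locale = 'US'
LEFT JOIN prices AS later
  ON later.prices_watches_ID = p.prices_watches_ID
 AND later.prices_locale = 'US'
 AND later.prices_date > p.prices_date
WHERE later.prices_id IS NULL;
```

If two rows for the same watch share the latest prices_date, both are returned. Without an index on (prices_watches_ID, prices_locale, prices_date), the self-join can be slow on a large table.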
There are several problems with this SQL:
- You are running the same query multiple times:
(SELECT prices_am FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1) as new_price, (SELECT prices_date FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1) as new_price_date
You only need to run that query once: put it in a derived table, give it an alias, and select multiple columns from it, e.g. SELECT ... sub1.prices_am, sub1.prices_date FROM ... (SELECT ...) sub1, if I'm not mistaken.
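- Do not use HAVING for conditions that belong in WHERE. It hurts performance: MySQL applies HAVING nearly last, just before rows are sent to the client, with no optimization, so every row of your query (including all of its subquery values) has to be produced before any of them are filtered out. See the description of HAVING in the MySQL SELECT documentation.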
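The two suggestions above can be combined into one query. This is a sketch under the question's schema, not a tested drop-in replacement: the newest US price date per watch is computed once in a derived table (latest) and joined back to prices, so the remaining correlated old_price subquery (same 30-hour logic as the question) runs once per watch instead of once per price row. The filters then move from HAVING to a WHERE on an outer derived table (t). The aliases w, p, p2, latest, and t are introduced here.

```sql
SELECT t.*,
       ROUND((t.new_price - t.old_price) / t.old_price * 100, 2) AS percent_change,
       t.new_price - t.old_price AS absolute_change
FROM (
    SELECT w.*,
           p.prices_am   AS new_price,
           p.prices_date AS new_price_date,
           -- Oldest US price in the 30 hours up to the newest date.
           ( SELECT p2.prices_am
             FROM prices AS p2
             WHERE p2.prices_watches_ID = w.watches_ID
               AND p2.prices_locale = 'US'
               AND p2.prices_date >= DATE_SUB(p.prices_date, INTERVAL 30 HOUR)
             ORDER BY p2.prices_date ASC
             LIMIT 1 ) AS old_price
    FROM watches_US AS w
    -- Newest US price date per watch, computed once for all watches.
    JOIN ( SELECT prices_watches_ID, MAX(prices_date) AS max_date
           FROM prices
           WHERE prices_locale = 'US'
           GROUP BY prices_watches_ID ) AS latest
      ON latest.prices_watches_ID = w.watches_ID
    -- Join back to fetch the price stored on that newest row.
    JOIN prices AS p
      ON p.prices_watches_ID = latest.prices_watches_ID
     AND p.prices_locale = 'US'
     AND p.prices_date = latest.max_date
    WHERE p.prices_am IS NOT NULL
      AND p.prices_am != ''
) AS t
-- Former HAVING conditions, now an ordinary WHERE on the derived table.
WHERE t.old_price IS NOT NULL
  AND t.old_price != ''
  AND t.old_price != 0
  AND t.new_price - t.old_price < 0
ORDER BY absolute_change ASC
LIMIT 10;
```

Whether this beats the 2-second query from the update depends on the MySQL version (older versions materialize derived tables without indexes) and on the data, so it is worth comparing both plans with EXPLAIN.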