Fast way to get 2 columns difference in MySQL
How do I get the values in one column that have no match in another column?
I tried:
SELECT table1.page_title
FROM table1, table2
WHERE table1.page_title != table2.page_title
It returns a lot of repeated rows, so I did:
SELECT DISTINCT table1.page_title
FROM table1, table2
WHERE table1.page_title != table2.page_title
but it just hangs.
Any help would be greatly appreciated, thanks!
PS I am doing this to create an exception list for the MediaWiki MWDumper tool. I need to ensure that my current wiki entries will not be overwritten when importing the sql output.
EDIT: Yes, they are two different tables. Each of them has about 70,000 records. Also why are my queries slowing down? I would appreciate it if someone could clarify the situation so that I can find out why :) Thanks again!
Are a and b different tables that have a "page_title" column?
If so, try this:
SELECT DISTINCT page_title FROM a
WHERE page_title NOT IN (SELECT page_title FROM b)
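One caveat with NOT IN: if b.page_title contains even a single NULL, the comparison never evaluates to true and the query returns no rows. A NOT EXISTS form avoids that. The sketch below assumes the same table names a and b as above. Unlike the != join in the question, which pairs every one of the roughly 70,000 rows in one table with every row in the other (about 4.9 billion pairs, nearly all of them "different"), this only has to look for one match per title:

```sql
-- Titles in a that have no matching title in b.
-- NOT EXISTS is not affected by NULLs in b.page_title.
SELECT DISTINCT a.page_title
FROM a
WHERE NOT EXISTS (
    SELECT 1
    FROM b
    WHERE b.page_title = a.page_title
);
```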
If all you are interested in is removing duplicates (if you only have one table), then there are several ways to do this, two of which are:
SELECT DISTINCT page_title FROM a
or
SELECT page_title FROM a
GROUP BY page_title
The GROUP BY option is more powerful, albeit slower: you can add a HAVING clause to select only the titles that appear, for example, more than twice:
SELECT page_title FROM a
GROUP BY page_title
HAVING COUNT(page_title) > 2
Hope it helps
(Thanks to Aaron F for the comment)
You can try a self-join, which I have used in the past, but I am not sure whether it will be faster, as I don't use MySQL. This page might give you some insight: http://www.xaprb.com/blog/2006/10/11/how-to-delete-duplicate-rows-with-sql/
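For the two-table case, a join-based version might look like the sketch below. Strictly speaking it is an outer join between two tables rather than a self-join, and the table names a and b are assumed from the other answers. As long as page_title has no NULLs, it should return the same titles as the NOT IN query:

```sql
-- Keep only the rows of a for which the join found no partner in b.
SELECT DISTINCT a.page_title
FROM a
LEFT JOIN b ON b.page_title = a.page_title
WHERE b.page_title IS NULL;
```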
Minor improvement on Rax's answer:
SELECT DISTINCT a.page_title FROM a
WHERE a.page_title NOT IN (SELECT DISTINCT b.page_title FROM b)
Do your tables have an index on the page_title column? What does the explain plan say for your queries?
I can't imagine you need an index anyway, given only 70k rows in a table.
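If you do want to check, something like the following would add the indexes and show the plan. The index names are made up for illustration, and the statements assume page_title is a VARCHAR or VARBINARY column (a TEXT or BLOB column would need a prefix length in the index definition):

```sql
-- Hypothetical index names; skip any index that already exists.
CREATE INDEX idx_a_page_title ON a (page_title);
CREATE INDEX idx_b_page_title ON b (page_title);

-- Show how MySQL plans to execute the query.
EXPLAIN
SELECT DISTINCT a.page_title
FROM a
WHERE a.page_title NOT IN (SELECT b.page_title FROM b);
```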