Fast way to get the difference between two columns in MySQL

How do I get the values in one column that have no match in another column?

I tried:

SELECT table1.page_title 
FROM table1, table2 
WHERE table1.page_title != table2.page_title

      

It returns a lot of repeated values, so I tried:

SELECT DISTINCT table1.page_title 
FROM table1, table2 
WHERE table1.page_title != table2.page_title

      

but it just hangs.

Any help would be greatly appreciated, thanks!

PS: I am doing this to create an exception list for the MediaWiki MWDumper tool. I need to ensure that my current wiki entries are not overwritten when importing the SQL output.

EDIT: Yes, they are two different tables, each with about 70,000 records. Also, why are my queries so slow? I would appreciate it if someone could explain what is going on. :) Thanks again!



4 answers


Are a and b (your table1 and table2) different tables that each have a "page_title" column?

If so, try this:

SELECT DISTINCT page_title FROM a
WHERE page_title NOT IN (SELECT page_title FROM b)
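
One caveat with NOT IN, assuming page_title in b can be NULL (the question does not say either way): if the subquery returns even one NULL, NOT IN matches no rows at all. A minimal sketch of a NOT EXISTS form that avoids this, using the same a and b names:

```sql
-- Titles in a that have no matching title in b.
-- Unlike NOT IN, this still returns rows when b.page_title contains NULLs.
SELECT DISTINCT a.page_title
FROM a
WHERE NOT EXISTS (
    SELECT 1
    FROM b
    WHERE b.page_title = a.page_title
);
```

If neither column contains NULLs, both forms should return the same titles.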

      

If you only have one table and just want to remove duplicates, there are several ways to do it. Two of them are:

SELECT DISTINCT page_title FROM a

      

or



SELECT page_title FROM a
GROUP BY page_title

      

The GROUP BY option is more powerful, albeit slower: you can add a HAVING clause to select only the titles that appear, for example, more than twice:

SELECT page_title FROM a
GROUP BY page_title
HAVING COUNT(page_title) > 2

      

Hope it helps

(Thanks to Aaron F for the comment)



You can try a self-join, which I have used in the past, but I am not sure whether it will be faster, as I don't use MySQL. This page might give you some insight: http://www.xaprb.com/blog/2006/10/11/how-to-delete-duplicate-rows-with-sql/





Minor improvement on Rax's answer:

SELECT DISTINCT a.page_title FROM a
WHERE a.page_title NOT IN (SELECT DISTINCT b.page_title FROM b)

      

Do your tables have an index on the page_title column? What does the EXPLAIN plan say for your queries?

I can't imagine you need an index anyway, given only 70k rows in a table.
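
Both questions above can be checked directly in MySQL. A sketch using the a and b names from the query above; the exact output varies by MySQL version:

```sql
-- List the indexes that currently exist on each table.
SHOW INDEX FROM a;
SHOW INDEX FROM b;

-- Show the execution plan for the query.
-- In the output, a NULL in the "key" column means no index is used for
-- that table, and "rows" is the optimizer's estimate of rows examined.
EXPLAIN
SELECT DISTINCT a.page_title
FROM a
WHERE a.page_title NOT IN (SELECT DISTINCT b.page_title FROM b);
```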



You can do it with a join:

SELECT DISTINCT table1.page_title 
FROM table1
LEFT JOIN table2 
    ON table1.page_title = table2.page_title
WHERE table2.page_title IS NULL

      

If it's slow, add an index on table2.page_title.
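
A minimal sketch of adding that index. The index name idx_page_title is made up for illustration; any name works:

```sql
-- idx_page_title is a hypothetical name, not something from the question.
CREATE INDEX idx_page_title ON table2 (page_title);
```

Without an index, each of the roughly 70,000 rows in table1 may have to be compared against up to 70,000 rows in table2. With it, MySQL can look up each title directly. If page_title were a TEXT or BLOB column, MySQL would also require a prefix length, for example page_title(255).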
