MySQL去重复,最近用多线程爬数据的时候有部分数据重复了,现用SQL去重
方法:用group by分组,找出并删除重复的记录
1、group by分组且单组数据记录大于1的,查出有重复数据的记录
- SELECT
- mobile,
- cust_name,
- vsms_code
- FROM
- ( SELECT vsms_code, cust_name, mobile FROM santi_send_data_3916_copy1 GROUP BY mobile, cust_name, vsms_code HAVING COUNT( * ) > 1 ) a

2、把重复记录的分组作为子查询,查出表中有重复记录的数据
- SELECT *
- FROM
-
- santi_send_data_3916_copy1
- WHERE
- ( mobile, cust_name, vsms_code ) IN (
- SELECT
- mobile,
- cust_name,
- vsms_code
- FROM
- ( SELECT vsms_code, cust_name, mobile FROM santi_send_data_3916_copy1 GROUP BY mobile, cust_name, vsms_code HAVING COUNT( * ) > 1 ) a
- )

3、排除掉ID最小的记录,把其他记录都查出来
-
- SELECT *
- FROM
-
- santi_send_data_3916_copy1
- WHERE
- ( mobile, cust_name, vsms_code ) IN (
- SELECT
- mobile,
- cust_name,
- vsms_code
- FROM
- ( SELECT vsms_code, cust_name, mobile FROM santi_send_data_3916_copy1 GROUP BY mobile, cust_name, vsms_code HAVING COUNT( * ) > 1 ) a
- )
- AND id NOT IN (
- SELECT
- id
- FROM
- ( SELECT MIN( id ) AS id FROM santi_send_data_3916_copy1 GROUP BY mobile, vsms_code, cust_name HAVING COUNT( * ) > 1 ) b
- )

4、删除
-
- delete
- FROM
-
- santi_send_data_3916_copy1
- WHERE
- ( mobile, cust_name, vsms_code ) IN (
- SELECT
- mobile,
- cust_name,
- vsms_code
- FROM
- ( SELECT vsms_code, cust_name, mobile FROM santi_send_data_3916_copy1 GROUP BY mobile, cust_name, vsms_code HAVING COUNT( * ) > 1 ) a
- )
- AND id NOT IN (
- SELECT
- id
- FROM
- ( SELECT MIN( id ) AS id FROM santi_send_data_3916_copy1 GROUP BY mobile, vsms_code, cust_name HAVING COUNT( * ) > 1 ) b
- )

已去除