Friday, 1 December 2017

Deleting a huge number of records in MySQL

This is a short post about deleting data from huge tables in MySQL. Most of us have experienced it: deleting a huge number of records from a MySQL table can take a long time, sometimes hours, when millions of rows are involved. On production servers it also blocks other operations on the table. Recently, I deleted around 70 million records from a production database in less than an hour. There are multiple workarounds for this, but here I describe the two methods I use most frequently:
  • Using an intermediate table.
  • Deleting data in small chunks.
Before proceeding with either method, make sure the table has the required index on the column in the WHERE clause and that you keep a copy of the table as a backup.
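
For instance, assuming the filter column is `predicate_col` on table `tabA` (the names used in the examples below; the index name is just a placeholder), adding and verifying the index might look like this:

ALTER TABLE `tabA` ADD INDEX `idx_predicate_col` (`predicate_col`);
SHOW INDEX FROM `tabA`;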

Using an intermediate table:

In this method, you create a new table with the same structure, copy only the required data into it, rename the original table as an archive or backup table, and rename the new table to the original name.


Example:

Table `tabA` has 10 million records and you need only the 1 million records from the last month. So the model query will be as follows:

CREATE TABLE `tabA_copy` AS SELECT * FROM `tabA` WHERE predicate_col > CURDATE() - INTERVAL 30 DAY;
Now the table tabA_copy will have only the required data.
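
One caveat: CREATE TABLE ... AS SELECT copies the column definitions but not the indexes or primary key. If the new table should keep them, a sketch of the two-step alternative with the same names:

CREATE TABLE `tabA_copy` LIKE `tabA`;
INSERT INTO `tabA_copy` SELECT * FROM `tabA` WHERE predicate_col > CURDATE() - INTERVAL 30 DAY;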

RENAME TABLE `tabA` TO `tabA_archive`, `tabA_copy` TO `tabA`;
Doing both renames in one RENAME TABLE statement makes the swap atomic, so no query ever sees the table missing in between.
This method is useful when you want to keep only a small percentage of the data in a huge table, since moving a small portion of the data to a new table is much cheaper than deleting the huge remainder.


Deleting data in smaller chunks:

In this method, data is deleted in smaller chunks, and even if a rollback is needed, rolling back a small data set is cheap. I have been using the following procedure to implement this method. The real version is a dynamic procedure that uses a PREPARED statement to cover different tables and predicates; for simplicity, I am showing only the core part.

drop procedure if exists huge_delete;
delimiter //
create procedure huge_delete()
begin
 declare rows int;
 declare rows_deleted int;
 set rows = 1;
 set rows_deleted = 0;
 while rows > 0
 do
    -- delete one small chunk; the index on predicate_col keeps each pass fast
    delete from db.tabA where predicate_col < curdate() - interval 90 day order by `id` limit 10000;
    -- row_count() reflects only the immediately preceding statement,
    -- so capture it once and reuse the value
    set rows = row_count();
    set rows_deleted = rows_deleted + rows;
    select rows_deleted as "Rows Deleted";
 end while;
end //
delimiter ;

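Once the procedure is created, a single call runs the loop until no more rows match:

call huge_delete();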

This method is suited to deleting a huge data set (millions of rows) from a table. Make sure you have an index on the WHERE clause column.
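
On MySQL 5.6 and later you can sanity-check the chunked DELETE with EXPLAIN before letting the procedure loop; the key column of the output should show the index on predicate_col:

explain delete from db.tabA where predicate_col < curdate() - interval 90 day order by `id` limit 10000;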

Note:

I am using innodb_flush_log_at_trx_commit = 2 to reduce the total response time. There are certain points to consider before setting innodb_flush_log_at_trx_commit to 2: with this value the log is flushed to disk only about once per second, so an operating system crash or power outage can lose up to a second of transactions. Take a look at this article before you proceed with this value.
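
The variable is dynamic, so a sketch of changing it for the duration of the purge and restoring the default (1) afterwards:

set global innodb_flush_log_at_trx_commit = 2;
-- run the purge here, then restore the default
set global innodb_flush_log_at_trx_commit = 1;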


This setting assumes autocommit = 1, which is the default. If you run the deletes inside explicit transactions instead, make sure you commit at every iteration so each chunk stays small.
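
In case autocommit is off, a minimal sketch of one iteration with an explicit transaction per chunk:

start transaction;
delete from db.tabA where predicate_col < curdate() - interval 90 day order by `id` limit 10000;
commit;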

Caution:

If you are running this on a replication master, make sure the slave stays in sync with the master. In my recent implementation, the slave fell a long way behind the master; I fixed it by skipping some transactions and later used a compare utility to bring the tables back in sync.
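
You can watch the lag on the slave between chunks; the Seconds_Behind_Master field in the output should stay close to zero:

show slave status\G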

I recommend that you test these steps and confirm they work as expected before using them on a production server. I hope this post helps you get things done!
