hbase-user mailing list archives

From Haijia Zhou <leons...@gmail.com>
Subject hbase delete operation is very slow
Date Tue, 21 Feb 2012 20:52:48 GMT
Hi all,
I'm new to this mailing list and hope I can get some help here.
My task is to write an M/R job for HBase that scans the whole table, finds
certain rows and deletes them (the whole row). The job will be executed on a
daily basis.
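Basically, I have a mapper class whose map() looks like this:
public void map(ImmutableBytesWritable row, Result columns,
                Context context) throws IOException, InterruptedException
{
  ... do some check
  byte[] rowKey = ...   // renamed so it does not clash with the 'row' parameter
  if (needs to delete user) {
    // one delete call is issued per matching row
    Delete delete = new Delete(rowKey);
    table.delete(delete);
  }
}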

There's no reducer needed for this task.

Now we are observing that this job takes a long time to finish (around 3-4
hours) for 49,565,000 delete operations and 191,838,114 total records
across 7 region servers.
We know that a full table scan of the corresponding column/column family
takes around 40 minutes, so the rest of the time is spent on the delete
operations.
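
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the scan and the deletes do not overlap, so the delete phase is simply the job time minus the 40-minute scan; that is only an approximation. The class and method names are made up for illustration.

```java
public class DeleteThroughput {
    // Deletes per second, ASSUMING delete time = total job time - scan time.
    static double deletesPerSecond(long deletes, int jobMinutes, int scanMinutes) {
        int deleteSeconds = (jobMinutes - scanMinutes) * 60;
        return (double) deletes / deleteSeconds;
    }

    public static void main(String[] args) {
        long deletes = 49_565_000L;   // delete operations reported above
        int scanMinutes = 40;         // full table scan time reported above
        int regionServers = 7;

        // The job takes around 3-4 hours, so evaluate both ends of the range.
        for (int jobMinutes : new int[] {180, 240}) {
            double perSec = deletesPerSecond(deletes, jobMinutes, scanMinutes);
            System.out.printf("job=%d min: %.0f deletes/s total, %.0f per region server%n",
                    jobMinutes, perSec, perSec / regionServers);
        }
    }
}
```

Under that assumption the job sustains somewhere between roughly 4,100 and 5,900 deletes per second in total, i.e. about 600 to 850 per region server.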

I wonder if there's any way or tool to profile a Hadoop M/R job?

Thanks

Haijia
