hbase-user mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: bulk deletes
Date Thu, 11 Oct 2012 04:04:27 GMT
You are right, Jerry.
In your use case, do you want to delete full rows or only some CFs/columns? Please feel free
to look at the issue HBASE-6942 and share your comments.
There I am deleting whole rows (that is our use case).
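
For readers following the thread, here is a rough sketch of why doing the scan and delete inside the region saves network calls. It is a toy Python model and not HBase code: the `Region` class, its methods, and the batch size are all hypothetical, and each method call simply stands in for one client-to-server round trip.

```python
class Region:
    """Toy stand-in for a region: a map of row key -> value (hypothetical)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.rpc_calls = 0  # each method call below models one client RPC

    def scan(self, predicate, start_after, limit):
        """Return up to `limit` matching row keys greater than `start_after`."""
        self.rpc_calls += 1
        keys = sorted(
            k for k in self.rows
            if predicate(k) and (start_after is None or k > start_after)
        )
        return keys[:limit]

    def delete_batch(self, keys):
        """Delete a batch of row keys sent by the client."""
        self.rpc_calls += 1
        for k in keys:
            self.rows.pop(k, None)

    def bulk_delete_endpoint(self, predicate):
        """Scan and delete inside the region; only a count is returned."""
        self.rpc_calls += 1
        doomed = [k for k in self.rows if predicate(k)]
        for k in doomed:
            del self.rows[k]
        return len(doomed)


def client_side_delete(region, predicate, batch=100):
    """scan() + delete(List<Delete>) style: keys go to the client and back."""
    deleted, last = 0, None
    while True:
        keys = region.scan(predicate, last, batch)
        if not keys:
            return deleted
        region.delete_batch(keys)
        deleted += len(keys)
        last = keys[-1]


def is_even(key):
    """Example delete condition: rows whose numeric suffix is even."""
    return int(key[3:]) % 2 == 0


client_region = Region({f"row{i:04d}": i for i in range(1000)})
endpoint_region = Region({f"row{i:04d}": i for i in range(1000)})

client_side_delete(client_region, is_even, batch=100)
endpoint_region.bulk_delete_endpoint(is_even)

print("client-side RPCs:", client_region.rpc_calls)
print("endpoint RPCs:", endpoint_region.rpc_calls)
```

Both paths leave the same rows behind; the difference is only in how many calls cross the network and how many keys are shipped to the client. A real table has many regions, so the endpoint path would cost roughly one call per region rather than one in total.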

From: Jerry Lam [chilinglam@gmail.com]
Sent: Wednesday, October 10, 2012 8:37 PM
To: user@hbase.apache.org
Subject: Re: bulk deletes

Hi guys:

The bulk delete approaches described in this thread are helpful in my case
as well. If I understood correctly, Paul's approach is useful for offline
bulk deletes (i.e. MapReduce), whereas Anoop's approach is useful for
online/real-time bulk deletes (i.e. a coprocessor)?

Best Regards,


On Mon, Oct 8, 2012 at 7:45 AM, Paul Mackles <pmackles@adobe.com> wrote:

> Very cool Anoop. I can definitely see how that would be useful.
> Lars - the bulk deletes do appear to work. I just wasn't sure if there was
> something I might be missing since I haven't seen this documented
> elsewhere.
> Coprocessors do seem a better fit for this in the long term.
> Thanks everyone.
> On 10/7/12 11:55 PM, "Anoop Sam John" <anoopsj@huawei.com> wrote:
> >We have also done an implementation using compaction-time deletes
> >(filtering out the unwanted KVs). This works very well for us.
> >Because this delays the deletes until the next major compaction, we are
> >also implementing a real-time bulk delete. [We have such a use case.]
> >Here I am using an endpoint implementation to do the scan and delete
> >entirely on the server side. I just raised an issue for this
> >[HBASE-6942] and will post a patch based on the 0.94 model there. Please
> >have a look. I have noticed a big performance improvement over the normal
> >approach of scan() + delete(List<Delete>), as this avoids several network
> >calls and the associated traffic.
> >
> >-Anoop-
> >________________________________________
> >From: lars hofhansl [lhofhansl@yahoo.com]
> >Sent: Saturday, October 06, 2012 1:09 AM
> >To: user@hbase.apache.org
> >Subject: Re: bulk deletes
> >
> >Does it work? :)
> >
> >How did you do the deletes before? I assume you used the
> >HTable.delete(List<Delete>) API?
> >
> >(Doesn't really help you, but) In 0.92+ you could hook up a coprocessor
> >into the compactions and simply filter out any KVs you want to have
> >removed.
> >
> >
> >-- Lars
> >
> >
> >
> >________________________________
> > From: Paul Mackles <pmackles@adobe.com>
> >To: "user@hbase.apache.org" <user@hbase.apache.org>
> >Sent: Friday, October 5, 2012 11:17 AM
> >Subject: bulk deletes
> >
> >We need to do deletes pretty regularly and sometimes we could have
> >hundreds of millions of cells to delete. TTLs won't work for us because
> >we have a fair amount of bizlogic around the deletes.
> >
> >Given their current implementation (we are on 0.90.4), this delete process
> >can take a really long time (half a day or more with 100 or so concurrent
> >threads). From everything I can tell, the performance issues come down to
> >each delete being an individual RPC call (even when using the batch API).
> >In other words, I don't see any thrashing on hbase while this process is
> >running, just lots of waiting for the RPC calls to return.
> >
> >The alternative we came up with is to use the standard bulk load
> >facilities to handle the deletes. The code turned out to be surprisingly
> >simple and appears to work in the small-scale tests we have tried so far.
> >Is anyone else doing deletes in this fashion? Are there drawbacks that I
> >might be missing? Here is a link to the code:
> >
> >https://gist.github.com/3841437
> >
> >Pretty simple, eh? I haven't seen much mention of this technique which is
> >why I am a tad paranoid about it.
> >
> >Thanks,
> >Paul
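
For readers following the thread, the idea behind deleting through bulk load can be sketched with a toy model. This is plain Python and not the code from the gist: the `Cell` tuple, the helper names, and the "DeleteColumn"-style rule (a marker hides every version at or below its timestamp) are assumptions made for illustration. A bulk-loaded file containing only delete markers masks existing cells at read time without any per-delete RPC, and a major compaction later drops both the markers and the cells they hide.

```python
from collections import namedtuple

# A cell is either a value ("Put") or a tombstone ("DeleteColumn").
Cell = namedtuple("Cell", "row column ts kind value")


def put(row, column, ts, value):
    return Cell(row, column, ts, "Put", value)


def delete_marker(row, column, ts):
    return Cell(row, column, ts, "DeleteColumn", None)


def tombstone_ts(store_files, row, column):
    """Newest delete-marker timestamp for this column, or -1 if there is none."""
    return max(
        (c.ts for f in store_files for c in f
         if c.kind == "DeleteColumn" and c.row == row and c.column == column),
        default=-1,
    )


def read(store_files, row, column):
    """Return the newest visible value, or None if it is masked or absent."""
    cutoff = tombstone_ts(store_files, row, column)
    visible = [
        c for f in store_files for c in f
        if c.kind == "Put" and c.row == row and c.column == column
        and c.ts > cutoff
    ]
    return max(visible, key=lambda c: c.ts).value if visible else None


def major_compact(store_files):
    """Merge all files into one, dropping markers and the cells they hide."""
    merged = [
        c for f in store_files for c in f
        if c.kind == "Put" and c.ts > tombstone_ts(store_files, c.row, c.column)
    ]
    merged.sort(key=lambda c: (c.row, c.column, -c.ts))
    return [merged]


existing = [put("r1", "cf:a", 10, "v1"), put("r2", "cf:a", 10, "v2")]
bulk_loaded = [delete_marker("r1", "cf:a", 20)]  # file holding only tombstones

files = [existing, bulk_loaded]
print(read(files, "r1", "cf:a"))   # masked by the bulk-loaded marker
print(read(files, "r2", "cf:a"))   # untouched
print(major_compact(files))        # only the r2 cell remains on disk
```

In this model the masked data is still present in the older file until the compaction runs, which matches the point made earlier in the thread that space is only reclaimed at the next major compaction.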