hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Add Columnsize Filter for Scan Operation
Date Thu, 24 Oct 2013 23:09:28 GMT
Using the HBase client API (scanners) for M/R is so oldish :). HFiles have a well-defined format, and
it is much more efficient to read them directly.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

From: Dhaval Shah [prince_mithibai@yahoo.co.in]
Sent: Thursday, October 24, 2013 9:53 AM
To: user@hbase.apache.org
Subject: Re: Add Columnsize Filter for Scan Operation

Jean, if we don't add setBatch to the scan, the MR job does cause HBase to crash due to OOME.
We have run into this in the past as well. Basically the problem is this: say I have a region
server with 12GB of RAM and a row of size 20GB (an extreme example; in practice, HBase runs
out of memory way before 20GB). If I query the entire row, HBase does not have enough memory
to hold/process it for the response.

In practice, if your setCaching > 1, then the aggregate of all rows growing too big can
also cause the same issue.
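
To make the memory argument above concrete, here is a small sketch in plain Java (no HBase dependency) that models how a per-call column limit such as `Scan.setBatch` splits one wide row into partial results. The class and method names are hypothetical, and the column count and cell size are assumed figures for illustration, not measurements.

```java
// Hypothetical model, not HBase code: estimates how a per-call column limit
// (in the spirit of Scan.setBatch) bounds the data held for one response.
final class WideRowBatchModel {

    /** Number of partial results needed to return `columns` cells, `batch` cells at a time. */
    static long partialResults(long columns, int batch) {
        if (columns < 0 || batch <= 0) {
            throw new IllegalArgumentException("columns must be >= 0 and batch > 0");
        }
        return (columns + batch - 1) / batch; // ceiling division
    }

    /** Upper bound in bytes for one partial result: at most `batch` cells of `bytesPerCell` each. */
    static long maxBytesPerResult(long columns, int batch, long bytesPerCell) {
        long cells = Math.min(columns, (long) batch);
        return cells * bytesPerCell;
    }

    public static void main(String[] args) {
        long columns = 5_000_000L;  // assumed: a row with millions of columns
        long bytesPerCell = 4_096L; // assumed average cell size
        int batch = 1_000;          // assumed batch setting

        System.out.println("bytes if the row is returned whole: " + columns * bytesPerCell);
        System.out.println("partial results with batching:      " + partialResults(columns, batch));
        System.out.println("max bytes per partial result:       "
                + maxBytesPerResult(columns, batch, bytesPerCell));
    }
}
```

With caching above 1, several such results are buffered together, so the per-result bound would be multiplied by the caching value in this simplified model.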

I think one way we can solve this issue is to make the HBase server serve responses in a streaming
fashion somehow (I'm not exactly sure about the details of how this can work, but if it has to
hold the entire row in memory, it's going to be bound by the HBase heap size).


 From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
To: user <user@hbase.apache.org>
Sent: Thursday, 24 October 2013 12:37 PM
Subject: Re: Add Columnsize Filter for Scan Operation

If the MR job crashes because of the number of columns, then we have an issue
that we need to fix ;) Please open a JIRA and provide details if you are facing



2013/10/24 John <johnnyenglish739@gmail.com>

> @Jean-Marc: Sure, I can do that, but that's a little bit complicated because
> the rows sometimes have millions of columns, and I have to handle them
> in different batches because otherwise HBase crashes. Maybe I will try it
> later, but first I want to try the API version. It works okay so far, but I
> want to improve it a little bit.
> @Ted: I tried to modify it, but I have no idea how exactly to do this. I have to
> count the number of columns in that filter (that obviously works with the
> count field). But there is no method that is called after iterating over
> all elements, so I cannot return the Drop ReturnCode in the filterKeyValue
> method because I don't know when it was the last one. Any ideas?
> regards
> 2013/10/24 Ted Yu <yuzhihong@gmail.com>
> > Please take a look at
> > src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
> >
> >  * Simple filter that returns first N columns on row only.
> >
> > You can modify the filter to suit your needs.
> >
> > Cheers
> >
> >
> > On Thu, Oct 24, 2013 at 7:52 AM, John <johnnyenglish739@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I'm currently writing an HBase Java program which iterates over every row
> > > in a table. I have to modify some rows if the column size (the number of
> > > columns in the row) is bigger than 25000.
> > >
> > > Here is my source code: http://pastebin.com/njqG6ry6
> > >
> > > Is there any way to add a Filter to the scan operation and load only
> > > rows where the size is bigger than 25k?
> > >
> > > Currently I check the size at the client, but for that I have to load
> > > every row to the client side. It would be better if the wrong rows were
> > > already filtered on the "server" side.
> > >
> > > thanks
> > >
> > > John
> > >
> >
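
On the question of counting columns per row and dropping the row afterwards: as far as I know, the HBase `Filter` interface also has a `filterRow()` callback that is invoked after the last cell of a row, and a `reset()` callback between rows, which is the usual place for this kind of per-row decision. The sketch below is plain Java with no HBase dependency. The class name and the driver method are hypothetical; only the three callback names are borrowed from the real interface, and the real `filterKeyValue` returns a `ReturnCode` rather than a boolean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a row filter. It mimics the per-row lifecycle
// reset() -> filterKeyValue() per cell -> filterRow(), but is not HBase code.
final class ColumnCountRowFilterModel {
    private final int threshold; // keep only rows with MORE than this many columns
    private int count;           // cells seen in the current row

    ColumnCountRowFilterModel(int threshold) {
        this.threshold = threshold;
    }

    /** Called before each new row: clear the per-row state. */
    void reset() {
        count = 0;
    }

    /** Called once per cell: count it and keep it (true = include). */
    boolean filterKeyValue(String qualifier) {
        count++;
        return true;
    }

    /** Called after the last cell of the row: true means "drop this row". */
    boolean filterRow() {
        return count <= threshold;
    }

    /** Hypothetical driver: runs the callbacks over in-memory rows, returns kept row indexes. */
    static List<Integer> keptRows(List<List<String>> rows, int threshold) {
        ColumnCountRowFilterModel filter = new ColumnCountRowFilterModel(threshold);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            filter.reset();
            for (String qualifier : rows.get(i)) {
                filter.filterKeyValue(qualifier);
            }
            if (!filter.filterRow()) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("a", "b", "c", "d", "e"),
                Arrays.asList("a", "b", "c"));
        System.out.println("rows kept with threshold 3: " + keptRows(rows, 3));
    }
}
```

Two caveats worth verifying against your HBase version: a filter that decides in `filterRow()` still makes the server read every cell of the row before the decision, and I believe `Scan.setBatch` refuses to combine with filters that report `hasFilterRow()`, so this approach may not mix with the batching discussed earlier in the thread.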

