hbase-user mailing list archives

From John <johnnyenglish...@gmail.com>
Subject Re: Add Columnsize Filter for Scan Operation
Date Thu, 24 Oct 2013 16:24:18 GMT
@Jean-Marc: Sure, I can do that, but it's a little complicated because
the rows sometimes have millions of columns, and I have to process them
in separate batches because otherwise HBase crashes. Maybe I will try it
later, but first I want to try the API version. It works okay so far, but I
want to improve it a little.

@Ted: I tried to modify it, but I have no idea how exactly to do this. I have
to count the number of columns in that filter (that obviously works with the
count field). But there is no method that is called after iterating over
all elements, so I cannot return the drop ReturnCode in the filterKeyValue
method because I don't know when I have reached the last one. Any ideas?

regards
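
Note on the question above: HBase's Filter interface also declares reset()
and filterRow() callbacks, and filterRow() is documented as the last chance
to veto a row after its cells have been passed to filterKeyValue(). The
sketch below only simulates that lifecycle in plain Java so that it runs
without HBase on the classpath. The class MinColumnCountFilter, the scan()
driver, and the in-memory "table" are hypothetical stand-ins, not HBase
classes, and the threshold of 2 stands in for the 25000 from the original
question. Filters that rely on filterRow() may not be usable together with
scan batching, so that should be checked against the HBase version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnCountSketch {

    /** Hypothetical stand-in that mirrors the callback order of a row-level filter. */
    static class MinColumnCountFilter {
        private final long threshold;
        private long count;

        MinColumnCountFilter(long threshold) {
            this.threshold = threshold;
        }

        /** Called before each new row: forget the previous row's count. */
        void reset() {
            count = 0;
        }

        /** Called once per cell: count it and let it through for now. */
        boolean filterKeyValue(String qualifier) {
            count++;
            return true;
        }

        /** Called after the last cell of the row: true means "drop this row". */
        boolean filterRow() {
            return count <= threshold;
        }
    }

    /** Hypothetical driver that plays the role of the server-side scanner. */
    static List<String> scan(Map<String, List<String>> table, MinColumnCountFilter filter) {
        List<String> keptRowKeys = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            filter.reset();
            for (String qualifier : row.getValue()) {
                filter.filterKeyValue(qualifier);
            }
            // The row-level decision is made only after all cells were seen.
            if (!filter.filterRow()) {
                keptRowKeys.add(row.getKey());
            }
        }
        return keptRowKeys;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("row-a", Arrays.asList("c1", "c2", "c3"));
        table.put("row-b", Arrays.asList("c1"));
        table.put("row-c", Arrays.asList("c1", "c2", "c3", "c4"));

        // Keep only rows with more than 2 columns.
        System.out.println(scan(table, new MinColumnCountFilter(2))); // prints [row-a, row-c]
    }
}
```

In a real filter the same three steps would live in the overridden reset(),
filterKeyValue() and filterRow() methods of a class derived from FilterBase,
which then has to be deployed on the region servers.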


2013/10/24 Ted Yu <yuzhihong@gmail.com>

> Please take a look
> at src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
>
>  * Simple filter that returns first N columns on row only.
>
> You can modify the filter to suit your needs.
>
> Cheers
>
>
> On Thu, Oct 24, 2013 at 7:52 AM, John <johnnyenglish739@gmail.com> wrote:
>
> > Hi,
> >
> > I'm currently writing an HBase Java program which iterates over every
> > row in a table. I have to modify some rows if the column count (the
> > number of columns in the row) is bigger than 25000.
> >
> > Here is my source code: http://pastebin.com/njqG6ry6
> >
> > Is there any way to add a Filter to the scan Operation and load only rows
> > where the size is bigger than 25k?
> >
> > Currently I check the size at the client, but for that I have to load
> > every row to the client side. It would be better if the unwanted rows
> > were already filtered out on the "server" side.
> >
> > thanks
> >
> > John
> >
>
