hbase-user mailing list archives

From Chris Bates <christopher.andrew.ba...@gmail.com>
Subject Re: help with filters
Date Wed, 27 Jan 2010 01:24:03 GMT
So are you saying that MUST_PASS_ALL might be flawed (for a FilterList of
two QualifierFilters)?  If so, I can dig into the source and see if I can
find anything.

Or are you saying that my data profile is wrong? If so, can you (or someone
else) suggest one that works?

I tried this:
hbase(main):032:0> scan 'testTable3'
ROW                          COLUMN+CELL

 row1                        column=col1:qualifier-1,
timestamp=1264554774915, value=some_col1_qual1_value
 row1                        column=col1:qualifier-2,
timestamp=1264554866041, value=some_col1_qual2_value

And that doesn't work with the two QualifierFilters.
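
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if the filters in a FilterList are evaluated one cell at a time, then a single cell can never carry both qualifiers, so MUST_PASS_ALL over two QualifierFilters would reject every cell. The toy model below is plain Python, not the HBase API; the function names (`qualifier_equals`, `must_pass_all`, `scan`) are hypothetical and exist only to illustrate that reasoning on the testTable data.

```python
# Toy model of per-cell filter evaluation. This is NOT the HBase API.
# Assumption: every filter in the list is asked about one cell at a time.

def qualifier_equals(name):
    """Return a predicate that accepts a cell whose qualifier equals `name`."""
    return lambda cell: cell["qualifier"] == name

def must_pass_all(filters):
    """AND semantics: a cell is kept only if every filter accepts it."""
    return lambda cell: all(f(cell) for f in filters)

def scan(table, cell_filter):
    """Return {row: {qualifier: value}} for the cells that pass the filter.

    Rows with no surviving cells are dropped from the result.
    """
    result = {}
    for row, cells in table.items():
        kept = {c["qualifier"]: c["value"] for c in cells if cell_filter(c)}
        if kept:
            result[row] = kept
    return result

# Same shape as the 'testTable' scan quoted further down in this message.
TABLE = {
    "row1": [
        {"qualifier": "REMOTE_ADDR", "value": "172.16.1.3"},
        {"qualifier": "theme", "value": "Frost"},
    ],
    "row2": [{"qualifier": "theme", "value": "Sunshine"}],
    "row3": [{"qualifier": "REMOTE_ADDR", "value": "172.16.0.06"}],
}

both = must_pass_all([qualifier_equals("theme"), qualifier_equals("REMOTE_ADDR")])

# No cell has qualifier == "theme" AND qualifier == "REMOTE_ADDR",
# so nothing survives the AND.
print(scan(TABLE, both))

# A single qualifier filter keeps only that column, as observed.
print(scan(TABLE, qualifier_equals("theme")))
```

Under this model the single-filter runs return only the filtered column (hence the nulls for the other one), and the two-filter run returns nothing, which matches the behaviour described here. Whether the real FilterList works this way would still need checking in the source or the debugger.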


I haven't actually run the JUnit tests because I haven't dealt with JUnit
before, but if you suggest I run those I can do that as well. I was hoping
someone could submit a working implementation of MUST_PASS_ALL with two
QualifierFilters. I thought that might have been a pretty common operation.
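
In the meantime, the client-side fallback mentioned in the quoted message below (checking each row for the columns it contains) can be sketched as a row-level test. This is a minimal sketch in plain Python, assuming the scan results have already been collected into a dict of row to columns; `rows_with_all` and `ROWS` are hypothetical names, not part of any HBase client. On the server side, SingleColumnValueFilter with its filter-if-missing option might be worth a look, though that has not been verified against 0.20.3 here.

```python
def rows_with_all(rows, required):
    """Keep only the rows that contain every qualifier listed in `required`."""
    return {
        row: cols
        for row, cols in rows.items()
        if all(q in cols for q in required)
    }

# Hypothetical in-memory copy of the unfiltered scan of 'testTable'.
ROWS = {
    "row1": {"REMOTE_ADDR": "172.16.1.3", "theme": "Frost"},
    "row2": {"theme": "Sunshine"},
    "row3": {"REMOTE_ADDR": "172.16.0.06"},
}

# Only row1 has both columns, so only row1 is printed.
for row, cols in rows_with_all(ROWS, ["REMOTE_ADDR", "theme"]).items():
    print("IP:", cols["REMOTE_ADDR"], "Theme:", cols["theme"])
```

The obvious cost is that every row travels to the client before being discarded, which is exactly what a native filter would avoid on a very sparse dataset.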


On Tue, Jan 26, 2010 at 8:01 PM, Stack <stack@duboce.net> wrote:

> On Tue, Jan 26, 2010 at 4:51 PM, Chris Bates
> <christopher.andrew.bates@gmail.com> wrote:
> >
>
> Must pass all "works" because there's a unit test that asserts so?
> I'm not sure what it is about your data profile that is messing with
> this functionality.  It's something involved; my guess is the only
> way to figure it out is to set up some kind of harness and step through the
> debugger.  Any chance of your having a go at that Chris?
>
> Thanks,
> St.Ack
>
>
> > Second, I'm still not able to get the AND operation working.
> >
> > To illustrate:
> >
> > hbase(main):010:0> scan 'testTable', {COLUMNS=>["user:theme",
> > "user:REMOTE_ADDR"]}
> > ROW                          COLUMN+CELL
> >
> >  row1                        column=user:REMOTE_ADDR,
> > timestamp=1264464021672, value=172.16.1.3
> >  row1                        column=user:theme, timestamp=1264464041857,
> > value=Frost
> >  row2                        column=user:theme, timestamp=1264464058064,
> > value=Sunshine
> >  row3                        column=user:REMOTE_ADDR,
> > timestamp=1264464083332, value=172.16.0.06
> >
> > With MUST_PASS_ALL enabled...
> >
> > If I comment out the REMOTE_ADDR filter, I get:
> > IP: null Theme: Frost
> > IP: null Theme: Sunshine
> >
> > If I comment out the theme filter, I get the reverse.
> > IP: 172.16.1.3 Theme: null
> > IP: 172.16.0.06 Theme: null
> >
> > If I leave both in, I get __nothing__, when I want:
> > IP: 172.16.1.3 Theme: Frost
> >
> > I thought this might be due to HBase not being able to do an AND operation
> > on Qualifiers of the same column, so I created another testTable2 with two
> > different columns:
> >
> > hbase(main):024:0> scan 'testTable2'
> > ROW                          COLUMN+CELL
> >
> >  row1                        column=addr:REMOTE_ADDR,
> > timestamp=1264552425218, value=172.16.1.3
> >  row1                        column=user:theme, timestamp=1264552375737,
> > value=Frost
> >  row2                        column=user:theme, timestamp=1264552505491,
> > value=Sunshine
> >  row3                        column=addr:REMOTE_ADDR,
> > timestamp=1264552538651, value=172.16.0.36
> >
> > But nothing changed.
> >
> >
> > Any other thoughts?  The only solution I can see to get this done is to
> > implement a row counter for each column+qualifier and then store the results
> > that meet criteria that I expect, but I was hoping a native filter would do
> > the job.
> >
> >
> > On Mon, Jan 25, 2010 at 8:43 PM, Stack <stack@duboce.net> wrote:
> >
> >> See the TestFilterList under unit tests, src/test.  Can you mess
> >> around with it using your data and see if it tells you anything?
> >> There's a testMPALL in there.   Might give you a clue (Your code looks
> >> fine)
> >>
> >> St.Ack
> >>
> >> On Mon, Jan 25, 2010 at 4:25 PM, Chris Bates
> >> <christopher.andrew.bates@gmail.com> wrote:
> >> > thanks stack. i upgraded to the RC3 0.20.3.
> >> >
> >> > I was still getting the hanging, so I decided to create a real simple table
> >> > to try to see if I can get the logic working:
> >> >
> >> > hbase(main):031:0> scan 'testTable'
> >> > ROW                          COLUMN+CELL
> >> >
> >> >  row1                        column=user:REMOTE_ADDR,
> >> > timestamp=1264464021672, value=172.16.1.3
> >> >  row1                        column=user:theme, timestamp=1264464041857,
> >> > value=Frost
> >> >  row2                        column=user:theme, timestamp=1264464058064,
> >> > value=Sunshine
> >> >  row3                        column=user:REMOTE_ADDR,
> >> > timestamp=1264464083332, value=172.16.0.06
> >> >
> >> > Without the filter (http://pastebin.com/m20ba0d2d) this is my output
> >> > client-side:
> >> > IP: 172.16.1.3
> >> > Theme: Frost
> >> > IP: null
> >> > Theme: Sunshine
> >> > IP: 172.16.0.06
> >> > Theme: null
> >> >
> >> > If I uncomment the setFilter, I get nothing.  I'm expecting to get the first
> >> > two lines (row1).  Thus I don't believe my filters are set up correctly, but
> >> > I'm unsure where the error would be.
> >> >
> >> > Does anyone have any thoughts or examples?
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > On Mon, Jan 25, 2010 at 1:45 PM, Stack <stack@duboce.net> wrote:
> >> >
> >> >> Check out the CHANGES in 0.20.2 and even in 0.20.3RC3:
> >> >> http://svn.apache.org/viewvc/hadoop/hbase/branches/0.20/CHANGES.txt?view=log
> >> >> I believe your issue is fixed there.
> >> >> St.Ack
> >> >>
> >> >> On Mon, Jan 25, 2010 at 10:36 AM, Chris Bates
> >> >> <christopher.andrew.bates@gmail.com> wrote:
> >> >> > 0.20.1
> >> >> >
> >> >> > On Mon, Jan 25, 2010 at 1:31 PM, Stack <stack@duboce.net> wrote:
> >> >> >
> >> >> >> What version of HBase?
> >> >> >> St.Ack
> >> >> >>
> >> >> >> On Sat, Jan 23, 2010 at 7:49 PM, Chris Bates
> >> >> >> <christopher.andrew.bates@gmail.com> wrote:
> >> >> >> > Hi all,
> >> >> >> >
> >> >> >> > I'm trying to do an AND operation and I'm not sure if I did the filtering
> >> >> >> > correctly because HBase is hanging on me.
> >> >> >> >
> >> >> >> > What I want is this:
> >> >> >> >
> >> >> >> > I have two qualifiers, theme and IP, to my column user.  I'd like to print
> >> >> >> > out all matches (or maybe just 10) where the row has both of them in it.  My
> >> >> >> > impression is that this is what HBase would excel at, because the dataset is
> >> >> >> > VERY sparse, meaning that out of 1000-10,000 rows, maybe just 1 or 2 will
> >> >> >> > have BOTH an IP and a theme in it.  Most of the time it's just one or the
> >> >> >> > other.
> >> >> >> >
> >> >> >> > So this is my code to make that query, but as I said, it's hanging.
> >> >> >> > http://pastebin.com/m7fcef49
> >> >> >> >
> >> >> >> > If I comment out the filters, the query runs just fine and will print null
> >> >> >> > wherever the value is not present.
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>
