accumulo-user mailing list archives

From "Kini, Ameet M." <ak...@mitre.org>
Subject RE: filter on value ranges
Date Fri, 09 Mar 2012 18:11:15 GMT


Thanks for the comments.

I'm OK with rolling my own iterator/filter, but I'm not sure how to go about it (see the next
paragraph), so it'd be great to get pointers. I'd prefer to keep the schema as it is today,
where each employee is represented by a row in the table, with a properties cf containing
name and salary cqs. Here's how it looks today:

rowID  colfam      colqual  value

abc    properties  name     john
abc    properties  salary   10000
def    properties  name     alice
def    properties  salary   20000

Part of my confusion lies in not knowing how to implement this range filter class, because
my query needs to return both the name and the salary based on the salary value. What I
would like is a Filter equivalent of WholeRowIterator, say a WholeRowFilter, whose
accept(Key k, Value v) is given the entire row in the Value argument, along with appropriate
encodeRow/decodeRow methods as in WholeRowIterator. If accept returns true, the whole row
is returned to the client. I could then extend this class with a MyRangeFilter that looks
inside the row and makes row-level accept/reject decisions based on the values of particular
cqs.

Maybe this WholeRowFilter is already there in some form?
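
A minimal sketch of the row-level filtering idea described above, written in plain Java so it runs without an Accumulo cluster. It is a model of the behavior, not Accumulo's iterator API: Entry, RowPredicate, salaryBetween, filterRows and sampleTable are hypothetical names introduced for illustration, and salary values are assumed to be stored as decimal strings as in the table above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WholeRowFilterSketch {

    /** One key/value entry; a hypothetical stand-in for an Accumulo Key/Value pair. */
    static final class Entry {
        final String row, colFam, colQual, value;

        Entry(String row, String colFam, String colQual, String value) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " " + colFam + ":" + colQual + " " + value;
        }
    }

    /** Row-level decision: the predicate sees every entry of one row at once. */
    interface RowPredicate {
        boolean accept(List<Entry> row);
    }

    /** Hypothetical "MyRangeFilter": accepts a row whose salary lies in [min, max]. */
    static RowPredicate salaryBetween(long min, long max) {
        return row -> {
            for (Entry e : row) {
                if (e.colFam.equals("properties") && e.colQual.equals("salary")) {
                    long salary = Long.parseLong(e.value); // assumes a decimal string
                    return salary >= min && salary <= max;
                }
            }
            return false; // a row without a salary column is rejected
        };
    }

    /** Walks entries sorted by row, buffers each row, and emits it whole if accepted. */
    static List<Entry> filterRows(List<Entry> sorted, RowPredicate predicate) {
        List<Entry> out = new ArrayList<>();
        List<Entry> buffer = new ArrayList<>();
        for (Entry e : sorted) {
            // A change of row ID closes the buffered row.
            if (!buffer.isEmpty() && !buffer.get(0).row.equals(e.row)) {
                if (predicate.accept(buffer)) {
                    out.addAll(buffer);
                }
                buffer.clear();
            }
            buffer.add(e);
        }
        // Flush the last buffered row.
        if (!buffer.isEmpty() && predicate.accept(buffer)) {
            out.addAll(buffer);
        }
        return out;
    }

    /** The four entries from the example table, in sorted row order. */
    static List<Entry> sampleTable() {
        return Arrays.asList(
            new Entry("abc", "properties", "name", "john"),
            new Entry("abc", "properties", "salary", "10000"),
            new Entry("def", "properties", "name", "alice"),
            new Entry("def", "properties", "salary", "20000"));
    }

    public static void main(String[] args) {
        for (Entry e : filterRows(sampleTable(), salaryBetween(15000, 25000))) {
            System.out.println(e);
        }
    }
}
```

With the sample table and a range of 15000 to 25000, only the two entries of row def are returned, so the client gets both the name and the salary. Buffering a whole row before deciding is the design choice that makes this possible; it also assumes a single row fits in memory.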

-Ameet Kini

From: Aaron Cordova [mailto:aaron@cordovas.org]
Sent: Friday, March 09, 2012 9:20 AM
To: accumulo-user@incubator.apache.org
Subject: Re: filter on value ranges

To answer your question, I would not use built-in iterators for this.

But if you were determined, you could use what is known as 'document sharding' as opposed
to 'term sharding' and use an intersecting iterator.

Instructions on how to do this should be added to the manual ...


On Mar 9, 2012, at 9:07 AM, Kini, Ameet M. wrote:



In 1.4, is there a way to use built-in iterators to run the following query:
  "get the name and salary of all employees where the salary is between X and Y"

Assuming a straightforward schema where name and salary are both cq.

I'd like both the cq restriction and the range predicate applied on the tservers.

I see that Scanner.setColumnQualifierRegex would take care of the cq restriction. But I don't
know of a built-in iterator for the range predicate, and I don't know how to compose those
two iterators.

Thanks,
-Ameet Kini


