accumulo-user mailing list archives

From: Aaron Cordova <aa...@cordovas.org>
Subject: Re: filter on value ranges
Date: Fri, 09 Mar 2012 14:18:03 GMT
The best way is to build a separate numerical index on the salary field. The Accumulo table
would look like this:

rowID    colfam  colqual    value

0040000  salary  employeeY  [blank]
0041000  salary  employeeJ  [blank]
0042000  salary  employeeV  [blank]
0043000  salary  employeeB  [blank]
0044000  salary  employeeR  [blank]
0045000  salary  employeeG  [blank]


where 'employeeY' is the rowID of the corresponding entry in your main table.

A numerical index may need to handle negative numbers and arbitrarily large numbers.
Depending on your needs, you'll have to transform your numbers into strings that, when sorted
lexicographically, reflect the numerical sort order you require. The above example
uses 0-padding to seven digits, which doesn't account for negative numbers or arbitrarily
large numbers (i.e. numbers over 9,999,999 will not sort correctly).

Let's call your transform function trans().
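
As a rough illustration, here is one way trans() could be written in Java. This is only a
sketch under assumptions that are not part of the original suggestion: the class name
SalaryKeys and the helper zeroPad() are made up, and trans() assumes every value fits in a
signed 64-bit long, so it handles negative numbers but not arbitrarily large ones. It flips
the sign bit and prints 16 hex digits, so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SalaryKeys {
    // Zero-padding as in the example table: only valid for 0 <= v < 10^width.
    static String zeroPad(long v, int width) {
        if (v < 0) {
            throw new IllegalArgumentException("zero-padding cannot order negative values");
        }
        String s = String.format("%0" + width + "d", v);
        if (s.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        return s;
    }

    // Hypothetical trans(): flipping the sign bit maps signed order onto unsigned
    // order; fixed-width lowercase hex then sorts lexicographically in that order.
    static String trans(long v) {
        return String.format("%016x", v ^ Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] salaries = {45000, -1, 0, 10000000, 40000};
        List<String> keys = new ArrayList<>();
        for (long s : salaries) {
            keys.add(trans(s));
        }
        Collections.sort(keys); // plain string sort, as a sorted table would apply
        for (String k : keys) {
            System.out.println(k);
        }
    }
}
```

If your values are known to be non-negative and bounded, the simpler zeroPad() keeps the
rowIDs human-readable, at the cost of the limits described above.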

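Then you can answer your query via a single scan of the index table, starting at trans(X)
and ending at trans(Y). If employee names are used as the rowID of your main table, you're
done: the column qualifiers returned are the names, and the rowIDs are the salaries.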

If the employee names are stored as values under a field in your main table, you extract the
column qualifiers from the keys returned, turn each one into a Range, and pass the list of
Ranges to a BatchScanner that is configured to scan your main table and retrieve the employee
names, i.e. configured to fetch just the column family:qualifier under which the employee
name is stored.

This is, admittedly, a pain. But it's doable and it scales.
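
To make the two steps concrete (range scan over the index, then lookup in the main table),
here is a small in-memory sketch in Java. It does not use the Accumulo API: a TreeMap stands
in for the sorted index table and a HashMap for the main table. The class name
SalaryIndexSketch, the method names, and the employee names are all invented for
illustration, and trans() here is just the seven-digit 0-padding from the example table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SalaryIndexSketch {
    // Stand-in for the index table: sorted rowID (padded salary) -> main-table rowIDs.
    final TreeMap<String, List<String>> index = new TreeMap<>();
    // Stand-in for the main table: rowID -> employee name.
    final Map<String, String> mainTable = new HashMap<>();

    // Seven-digit zero-padding, valid only for 0 <= salary <= 9,999,999.
    static String trans(long salary) {
        return String.format("%07d", salary);
    }

    void put(String employeeId, String name, long salary) {
        mainTable.put(employeeId, name);
        index.computeIfAbsent(trans(salary), k -> new ArrayList<>()).add(employeeId);
    }

    // Step 1: a single range scan over the index, trans(x) to trans(y) inclusive.
    // Assumes x <= y; subMap rejects an inverted range.
    List<String> scanIndex(long x, long y) {
        List<String> ids = new ArrayList<>();
        for (List<String> hits : index.subMap(trans(x), true, trans(y), true).values()) {
            ids.addAll(hits);
        }
        return ids;
    }

    // Step 2: look up each returned rowID in the main table (the BatchScanner's job).
    List<String> lookupNames(List<String> ids) {
        List<String> names = new ArrayList<>();
        for (String id : ids) {
            names.add(mainTable.get(id));
        }
        return names;
    }

    // Sample data: the IDs and salaries follow the example table, the names are made up.
    static SalaryIndexSketch sample() {
        SalaryIndexSketch t = new SalaryIndexSketch();
        t.put("employeeY", "Young", 40000);
        t.put("employeeJ", "Jones", 41000);
        t.put("employeeV", "Vega", 42000);
        t.put("employeeB", "Brown", 43000);
        t.put("employeeR", "Reyes", 44000);
        t.put("employeeG", "Green", 45000);
        return t;
    }

    public static void main(String[] args) {
        SalaryIndexSketch t = sample();
        List<String> ids = t.scanIndex(41000, 43000);
        System.out.println(ids + " -> " + t.lookupNames(ids));
    }
}
```

In a real deployment the first step would be a Scanner over the index table and the second
a BatchScanner over the main table; the sketch only shows the data flow between them.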

On Mar 9, 2012, at 9:07 AM, Kini, Ameet M. wrote:

>  
> In 1.4, is there a way to use built-in iterators to run the following query :
>   “get the name and salary of all employees where the salary is between X and Y”
>  
> Assuming a straightforward schema where name and salary are both cq.
>  
> I’d like both the cq restriction and the range predicate applied on the tservers.
>  
> I see that Scanner.setColumnQualifierRegex would take care of the cq restriction. But
> I don’t know of a built-in iterator for the range predicate and I don’t know of how to
> compose those two iterators.
>  
> Thanks,
> -Ameet Kini
>  

