accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Can WholeRowIterator be used with AccumuloInputFormat?
Date Fri, 28 Dec 2012 15:00:22 GMT
The AccumuloInputFormat can use any iterator, custom or packaged with 
Accumulo, as long as it's on the TabletServer's classpath.

I'm a little confused about what you actually want as input to your 
MapReduce job. Do you want all keys where the CQ starts with XXX? Or do 
you want the entire "record" (123_123_1234_000  RECID=13) whenever that 
record contains some value for the domain "XXX"?

As an aside, both cases would be rather inefficient as diagrammed, 
because you have to scan the entire table and filter records in the 
Mapper instead of letting the TabletServer filter results for you. If 
the former case is what you want, you could use the RegexFilter to prune 
results server-side. If the latter is the case, you most likely have to 
write your own iterator to get the desired functionality (or permute 
your key structure so that it better fits one of the built-in access 
paths, such as fetchColumn).
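
To make the first case concrete, here is a rough sketch of the kind of 
pattern you might hand to a regex-based filter on the column qualifier. 
It is plain Java run against the sample qualifiers from the quoted 
message, not Accumulo code: the class name, the method name, and the 
"XXX=.*" pattern are my own assumptions for illustration, and it is worth 
checking against the RegexFilter documentation whether the pattern must 
match the whole qualifier or only part of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration only; no Accumulo classes are used here.
class QualifierRegexSketch {
    // Assumed pattern: qualifiers for the XXX field look like "XXX=<value>".
    static final Pattern XXX_QUALIFIER = Pattern.compile("XXX=.*");

    // Returns the domain values (the text after "XXX=") of matching qualifiers.
    static List<String> domainValues(List<String> qualifiers) {
        List<String> values = new ArrayList<>();
        for (String cq : qualifiers) {
            // matches() requires the whole qualifier to fit the pattern.
            if (XXX_QUALIFIER.matcher(cq).matches()) {
                values.add(cq.substring("XXX=".length()));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> sample =
            Arrays.asList("XXX=BEEF", "YYY=BAR", "XXX=HAM", "FOO=BAR");
        System.out.println(domainValues(sample)); // prints [BEEF, HAM]
    }
}
```

The same pattern applied server-side would mean only the XXX entries ever 
reach the Mapper.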

Perhaps you could also build an index table that inverts row+colfam and 
colqual if this is a common access pattern for you.
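
A sketch of what that inversion could look like, again in plain Java 
rather than against the Accumulo API. The layout here (original colqual 
first, then the original row and colfam) and the "|" separator are 
assumptions for illustration only; in a real index table the parts would 
go into the row and column fields of the key. A sorted set stands in for 
the sorted table, so a prefix lookup becomes a short range read instead 
of a full scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java illustration only; a TreeSet stands in for a sorted index table.
class InvertedIndexSketch {
    // Illustrative separator between the parts of an index entry.
    static final String SEP = "|";

    // Index entry: original qualifier first, then original row and family.
    static String indexEntry(String row, String cf, String cq) {
        return cq + SEP + row + SEP + cf;
    }

    // Reads only the contiguous run of entries that start with the prefix.
    static List<String> scanPrefix(SortedSet<String> index, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String entry : index.tailSet(prefix)) {
            if (!entry.startsWith(prefix)) {
                break; // past the end of the prefix range
            }
            hits.add(entry);
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> index = new TreeSet<>();
        index.add(indexEntry("123_123_1234_000", "RECID=13", "XXX=BEEF"));
        index.add(indexEntry("123_123_1234_000", "RECID=13", "YYY=BAR"));
        index.add(indexEntry("999_123_1999_000", "RECID=51", "XXX=HAM"));
        index.add(indexEntry("999_123_1999_000", "RECID=51", "FOO=BAR"));
        for (String hit : scanPrefix(index, "XXX=")) {
            System.out.println(hit);
        }
    }
}
```

The cost is that every write to the main table needs a matching write to 
the index table.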

Also, be aware that if you have many columns in a row, the 
WholeRowIterator has the potential to exceed the TabletServer's heap as 
it aggregates all of the columns for that row together.

On 12/28/12 9:01 AM, David Medinets wrote:
> I have a schema that looks something like:
>
> ROW                       CF            CQ
> 123_123_1234_000  RECID=13  XXX=BEEF
> 123_123_1234_000  RECID=13  YYY=BAR
> 999_123_1999_000  RECID=51  XXX=HAM
> 999_123_1999_000  RECID=51  FOO=BAR
>
> My goal is to find the domain values for the XXX 'field'. My
> map-reduce job succeeds at doing this using the standard iterators.
> I'm wondering if using the WholeRowIterator might be a better
> approach. Or perhaps there is another way (beyond a custom iterator)?

