accumulo-user mailing list archives

From Christopher <>
Subject Re: Reset column iterator while using AccumuloRowInputFormat
Date Wed, 27 Feb 2013 07:52:23 GMT
You could leverage the new TransformingIterator to seek and
iterate over the keys n times:

r1 cf:cq v
r2 cf:cq v
r3 cf:cq v

becomes:

pass1-r1 cf:cq v
pass1-r2 cf:cq v
pass1-r3 cf:cq v
pass2-r1 cf:cq v
pass2-r2 cf:cq v
pass2-r3 cf:cq v
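
A minimal sketch of that key layout in plain Java, in case it helps to see it spelled out. It does not use the Accumulo TransformingIterator API; the class MultiPassSketch, the method replay, and the Supplier that stands in for "seek back to the start of the range" are all hypothetical names chosen for illustration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class MultiPassSketch {

    // Replays a re-openable source `passes` times. Each row ID is prefixed
    // with "pass<k>-" so keys from different passes stay distinct. Rows are
    // handed to the sink one at a time, so nothing is buffered here.
    static void replay(Supplier<Iterator<String>> source, int passes, Consumer<String> sink) {
        for (int pass = 1; pass <= passes; pass++) {
            // Asking the supplier for a fresh iterator stands in for re-seeking.
            Iterator<String> rows = source.get();
            while (rows.hasNext()) {
                sink.accept("pass" + pass + "-" + rows.next());
            }
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3");
        // Print each rewritten row with the same column and value as the example.
        replay(rows::iterator, 2, row -> System.out.println(row + " cf:cq v"));
    }
}
```

The point of the prefix is that the client sees one ordinary forward-only stream, while the repeated reads happen behind it.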

However, are you sure you need to iterate over the whole row twice?
There are strategies to internally intersect a row with itself (see
IntersectingIterator) that avoid this (at least, avoid it from the
user's perspective).

If you don't need both passes in the same mapper, you could specify the
range twice in the AccumuloInputFormat's configuration (disabling the
auto-adjust ranges feature so they won't be collapsed into one), and
you'll get one mapper per range (though I'm pretty sure this gains you
nothing over simply performing two actions in the same mapper before
moving on to the next key/value pair).
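
To illustrate the "two actions in the same mapper" idea, here is a minimal plain-Java sketch that performs two independent computations on each column value before advancing the iterator, so the row is read only once. The class SinglePassSketch and the two example actions (a count and a total length) are hypothetical; this only works when the second action does not depend on the finished result of the first.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SinglePassSketch {

    // Applies two independent actions to every value in a single pass:
    // result[0] is the number of values, result[1] is their total length.
    static long[] countAndTotalLength(Iterator<String> columnValues) {
        long count = 0;        // state for action 1
        long totalLength = 0;  // state for action 2
        while (columnValues.hasNext()) {
            String value = columnValues.next();
            count++;                       // action 1
            totalLength += value.length(); // action 2
        }
        return new long[] {count, totalLength};
    }

    public static void main(String[] args) {
        Iterator<String> values = List.of("a", "bbb", "cc").iterator();
        System.out.println(Arrays.toString(countAndTotalLength(values)));
    }
}
```

Only the two running totals are held in memory, regardless of how large the row is.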

Christopher L Tubbs II

On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <> wrote:
> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat?  We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
> I think we can work around this by having a separate scanner used within the
> map method for this purpose.  Other than that, is there a way to clone or
> copy or reset the column iterator such that we can iterate over it more than
> once?
> Thanks,
> Mike
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>>
> columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>     // reset column iterator back to the beginning
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
> }
