accumulo-user mailing list archives

From Billie Rinaldi <>
Subject Re: Reset column iterator while using AccumuloRowInputFormat
Date Sat, 02 Mar 2013 21:05:10 GMT
On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <> wrote:

> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat?  We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
> I think we can work around this by having a separate scanner used within
> the map method for this purpose.  Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?

Currently, no.  It's not immediately obvious how we could change the
InputFormat to accomplish this.  The RecordReader creates a scanner, does
the seeking/fetching for the InputSplit once in its initialize method, then
iterates over the scanner, grouping the entries by row.  Going
back to the beginning of a row would require us to seek the scanner again,
and replace the old iterator with a new one.  We could make a special
RecordReader with a reset method, but I don't know how we could call the
method.  Interactions with the RecordReader are handled by the MapContext,
and I don't know if you can use a custom MapContext.  Maybe we could have
an InputFormat that gives you a Scanner directly that you could reseek in
the Mapper, but we'd have to spend some time thinking about it to make sure
it would work.
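To make the RecordReader description concrete, here is a toy Java model. It is not Accumulo code. A TreeMap keyed by "row\0column" stands in for the sorted table, and a single iterator created in the constructor stands in for the scanner that is seeked once in initialize(). RowReader, nextRow(), columns() and resetRow() are all hypothetical names. resetRow() shows the "seek again and replace the old iterator" step; a real RecordReader would still have no way to expose such a method to the Mapper.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Toy model, not Accumulo code: keys are "row\0column" in a sorted map.
public class RowGroupingSketch {

    // Extracts the row part of a "row\0column" key.
    static String rowOf(String key) {
        return key.substring(0, key.indexOf('\0'));
    }

    // Hypothetical reader that hands out one row at a time from a single underlying iterator.
    static class RowReader {
        private final NavigableMap<String, String> table;
        private Iterator<Map.Entry<String, String>> source;
        private Map.Entry<String, String> pending; // next unread entry, already taken from source
        private String currentRow;

        RowReader(NavigableMap<String, String> table) {
            this.table = table;
            this.source = table.entrySet().iterator(); // the one seek, as in initialize()
            advance();
        }

        private void advance() {
            pending = source.hasNext() ? source.next() : null;
        }

        // Moves to the next row, skipping any columns the caller left unread.
        boolean nextRow() {
            while (pending != null && rowOf(pending.getKey()).equals(currentRow)) {
                advance();
            }
            if (pending == null) {
                return false;
            }
            currentRow = rowOf(pending.getKey());
            return true;
        }

        String currentRow() {
            return currentRow;
        }

        // Columns of the current row, drawn from the shared iterator: once read, they are gone.
        Iterator<Map.Entry<String, String>> columns() {
            return new Iterator<Map.Entry<String, String>>() {
                @Override
                public boolean hasNext() {
                    return pending != null && rowOf(pending.getKey()).equals(currentRow);
                }

                @Override
                public Map.Entry<String, String> next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    Map.Entry<String, String> entry = pending;
                    advance();
                    return entry;
                }
            };
        }

        // Hypothetical reset: seek again to the start of the current row, replacing the old iterator.
        void resetRow() {
            if (currentRow == null) {
                throw new IllegalStateException("no current row");
            }
            source = table.tailMap(currentRow + "\0", true).entrySet().iterator();
            advance();
        }
    }
}
```

The point of the model is that the per-row iterator is only a view over one shared, forward-only source. Replaying a row therefore needs a new seek rather than a copy of the iterator.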


> Thanks,
> Mike
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>>
> columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>     // reset column iterator back to the beginning
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
> }
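Below is a minimal sketch of the separate-scanner workaround. It assumes the second pass can be served by a new scan of the same row. RowSource, fromTable() and shares() are hypothetical names. In a real job, scanRow(row) would presumably be a Scanner created inside the Mapper and restricted with setRange(new Range(row)). Here an in-memory sorted map stands in for the table so the sketch runs standalone. The first pass consumes the iterator the input format hands to map(). The second pass reads the row again instead of buffering it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TwoPassRowSketch {

    // Hypothetical stand-in for a Scanner limited to one row: every call starts a new scan.
    interface RowSource {
        Iterator<Map.Entry<String, Long>> scanRow(String row);
    }

    // Backs RowSource with a sorted map keyed by "row\0column"; subMap is a view, so nothing is copied.
    static RowSource fromTable(NavigableMap<String, Long> table) {
        return row -> table.subMap(row + "\0", true, row + "\1", false).entrySet().iterator();
    }

    // Pass 1 consumes the iterator the input format supplied (summing the values);
    // pass 2 re-reads the same row from a fresh scan instead of buffering it.
    static List<String> shares(String row, Iterator<Map.Entry<String, Long>> columns, RowSource source) {
        long total = 0;
        while (columns.hasNext()) {
            total += columns.next().getValue();
        }
        List<String> out = new ArrayList<>();
        if (total == 0) {
            return out; // nothing to apportion
        }
        Iterator<Map.Entry<String, Long>> again = source.scanRow(row); // plays the role of the "reset"
        while (again.hasNext()) {
            Map.Entry<String, Long> kv = again.next();
            String column = kv.getKey().substring(row.length() + 1);
            out.add(column + "=" + (100 * kv.getValue() / total) + "%");
        }
        return out;
    }
}
```

The trade-off is one extra scan per row. In exchange, the Mapper never has to hold a large row in memory.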
