accumulo-user mailing list archives

From Josh Elser
Subject Re: Accumulo 1.7 InputFormat Iterator Question
Date Thu, 18 Aug 2016 15:35:14 GMT
You could try following the same pattern as the AccumuloInputFormat (AIF):
create your own JamieInputFormatWithIterator with static methods that
make all of the AIF.addIterator(...) calls you need, and delegate the
interface methods to AIF. These could also just be utility methods, in
which case you would leave your AIF calls as-is.

IMO, this is often just done in your org.apache.hadoop.util.Tool 
implementation before submitting the job to run.
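
As a rough illustration of the utility-method approach, here is a minimal
sketch that uses only the JDK so the decision logic can be exercised without
a cluster. Everything named here is hypothetical: the IteratorSetup class,
the IterSpec holder (a stand-in for the fields you would pass to
IteratorSetting), the property name, and the iterator class name. In a real
Tool.run() the callback would wrap the AccumuloInputFormat.addIterator(job,
new IteratorSetting(...)) call mentioned in this thread.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: decides at job-setup time which extra iterators to register. */
class IteratorSetup {

    /** Stand-in for the values you would hand to IteratorSetting. */
    static final class IterSpec {
        final int priority;
        final String name;
        final String className;

        IterSpec(int priority, String name, String className) {
            this.priority = priority;
            this.name = name;
            this.className = className;
        }
    }

    // Assumed names for illustration only; substitute your own.
    static final String CHECK_PROP = "example.extra.check.enabled";
    static final String CHECK_ITER_CLASS = "com.example.ExtraCheckIterator";

    /**
     * Registers the extra iterator only when CHECK_PROP is "true".
     *
     * In a real Tool.run(), `register` would be something like
     *   spec -> AccumuloInputFormat.addIterator(job, new IteratorSetting(...))
     * built from the spec's priority, name, and class name.
     *
     * @return the number of iterators handed to `register`
     */
    static int configure(Map<String, String> props, Consumer<IterSpec> register) {
        boolean enabled = Boolean.parseBoolean(props.getOrDefault(CHECK_PROP, "false"));
        if (!enabled) {
            return 0;
        }
        register.accept(new IterSpec(50, "extraCheck", CHECK_ITER_CLASS));
        return 1;
    }
}
```

Because the add call appends rather than replaces (as Russ notes in the
quoted message), a helper like this should be invoked once per job, before
the job is submitted.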

Jamie Johnson wrote:
> I had been handling this in the input format where I don't have access
> to the job.  Should this be handled in a tool instead?
> I have thought about doing it in the input splits in initialize but it
> requires a cast to range input split so it seemed like there might be a
> better way.
> On Aug 17, 2016 5:31 PM, "Russ Weeks" wrote:
>     Hi, Jamie,
>     Try the static method AccumuloInputFormat.addIterator(job, new
>     IteratorSetting(...)).
>     Note that the method isn't idempotent. To clear the iterators on a
>     job you can
>     call job.getConfiguration().unset("AccumuloInputFormat.ScanOpts.Iterators")
>     (but that isn't officially part of the public API)
>     -Russ
>     On Wed, Aug 17, 2016 at 2:26 PM Jamie Johnson wrote:
>         I am upgrading from Accumulo 1.6 to 1.7 and I am trying to
>         understand how iterators are supposed to be set in 1.7 for an
>         input format.  In my situation, if a particular property is set
>         an additional iterator needs to be added to do some additional
>         checking.  Previously I had done this in the
>         AbstractRecordReader.setupIterators() method but this has been
>         deprecated.  I had attempted to put them in
>         AbstractRecordReader.contextIterators(), but this isn't always
>         called.  This change has made me question if I was ever doing
>         this according to best practices and now wonder what the correct
>         way to do this is.  Any pointers would be greatly appreciated.
