accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: BatchScanner behavior with AccumuloRowInputFormat
Date Wed, 30 Nov 2016 16:48:22 GMT
You'd only have to worry about this behavior if you set
AccumuloRowInputFormat.setBatchScan(job, true), available since 1.7.0.
By default, our InputFormats use a regular Accumulo Scanner.

See https://issues.apache.org/jira/browse/ACCUMULO-3602 and
https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.7.0/org/apache/accumulo/core/client/mapreduce/InputFormatBase.html#setBatchScan(org.apache.hadoop.mapreduce.Job,%20boolean)


On Wed, Nov 30, 2016 at 9:42 AM Massimilian Mattetti <MASSIMIL@il.ibm.com>
wrote:

Hi all,

as you already know, the AccumuloRowInputFormat internally uses a
RowIterator to iterate over all the key-value pairs of a single row. In
the past, when I used the RowIterator together with a BatchScanner, I
had the problem of a single row being split into multiple rows, because
a BatchScanner can interleave key-value pairs of different rows.
Should I expect the same behavior when using the AccumuloRowInputFormat
with a BatchScanner (enabled via setBatchScan)?
Thanks,
Max
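
The row-splitting problem described above can be illustrated without any Accumulo classes. The following is only a minimal sketch, assuming that a row-oriented iterator groups consecutive entries that share a row ID; the class RowGroupingSketch, its methods, and the sample data are all hypothetical and are not part of the Accumulo API. It shows why sorted input yields one group per row while interleaved input yields fragments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the grouping a row-oriented iterator performs.
// This is NOT Accumulo code; it only models "group consecutive entries by row ID".
class RowGroupingSketch {

    // Each entry is {rowId, column}. A new group starts whenever the row ID changes.
    static List<List<String[]>> groupConsecutiveRows(List<String[]> entries) {
        List<List<String[]>> groups = new ArrayList<>();
        String currentRow = null;
        for (String[] entry : entries) {
            if (!entry[0].equals(currentRow)) {
                groups.add(new ArrayList<>());
                currentRow = entry[0];
            }
            groups.get(groups.size() - 1).add(entry);
        }
        return groups;
    }

    // Entries in sorted order, as a regular (non-batch) scan would return them.
    static List<String[]> sortedStream() {
        return Arrays.asList(
            new String[] {"row1", "a"},
            new String[] {"row1", "b"},
            new String[] {"row2", "a"},
            new String[] {"row2", "b"});
    }

    // The same entries with rows interleaved (made-up ordering for illustration).
    static List<String[]> interleavedStream() {
        return Arrays.asList(
            new String[] {"row1", "a"},
            new String[] {"row2", "a"},
            new String[] {"row1", "b"},
            new String[] {"row2", "b"});
    }

    public static void main(String[] args) {
        // Sorted input: each row arrives contiguously, so there is one group per row.
        System.out.println("sorted groups: "
            + groupConsecutiveRows(sortedStream()).size());
        // Interleaved input: every row change starts a new group, so rows are fragmented.
        System.out.println("interleaved groups: "
            + groupConsecutiveRows(interleavedStream()).size());
    }
}
```

With this sample data the sorted stream produces two groups and the interleaved stream produces four, which is the "single row split into multiple rows" effect. Whether the same effect occurs inside AccumuloRowInputFormat when batch scanning is enabled is the open question of this thread; the sketch does not answer it.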
