accumulo-user mailing list archives

From David Medinets <>
Subject Re: Read and writing rfiles
Date Wed, 23 Dec 2015 15:26:26 GMT
Are the hadoop nodes handling your map-reduce job also running tservers?

Do the Accumulo log files show the exception? If so, can you post it?

On Wed, Dec 23, 2015 at 9:12 AM, Jeff Kubina <> wrote:

> I have a mapreduce job that reads rfiles as Accumulo key/value
> pairs using FileSKVIterator within a RecordReader, partitions/shuffles them
> based on the byte string of the key, and writes them out as new rfiles
> using the AccumuloFileOutputFormat. The objective is to create larger
> rfiles for bulk ingesting and to minimize the number of tservers each rfile
> is assigned to after they are bulk ingested.
> For tables with a simple schema it works fine, but for tables with a
> complex schema the new rfiles cause the tservers to throw a
> NullPointerException during a compaction.
> Is there more to an rfile than just the key/value pairs that I am missing?
> If I compute an order independent checksum of the bytes of the key/value
> pairs in the original rfiles and the new rfiles shouldn't they be the same?
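The order-independent checksum described above can be built by XOR-combining a hash of each key/value pair: since XOR is commutative and associative, visiting the pairs in any order yields the same result, so two sets of rfiles containing exactly the same pairs should match. Below is a minimal, hypothetical sketch (the `RFileChecksum` class name and the flat `byte[]` pair encoding are assumptions for illustration; in practice each Accumulo Key and Value would be serialized to bytes first):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch of an order-independent checksum over key/value
// byte strings, combined with XOR so iteration order does not matter.
public class RFileChecksum {

    // XOR-combine the SHA-256 digest of each serialized key/value pair
    // into a fixed-size accumulator.
    public static byte[] checksum(List<byte[]> pairs) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] acc = new byte[md.getDigestLength()];
            for (byte[] pair : pairs) {
                byte[] h = md.digest(pair); // digest() resets the instance
                for (int i = 0; i < acc.length; i++) {
                    acc[i] ^= h[i];
                }
            }
            return acc;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always present
        }
    }

    public static void main(String[] args) {
        // The same pairs in a different order produce an equal checksum.
        byte[] a = "row1:cf:cq\u0000val1".getBytes(StandardCharsets.UTF_8);
        byte[] b = "row2:cf:cq\u0000val2".getBytes(StandardCharsets.UTF_8);
        byte[] c1 = checksum(List.of(a, b));
        byte[] c2 = checksum(List.of(b, a));
        System.out.println(java.util.Arrays.equals(c1, c2)); // true
    }
}
```

Note that one caveat of a plain XOR combination is that a pair appearing an even number of times cancels out, so this check detects mismatched pair sets but not certain duplicate counts.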
