accumulo-user mailing list archives

From Jeff Kubina <>
Subject Reading and writing rfiles
Date Wed, 23 Dec 2015 14:12:48 GMT
I have a MapReduce job that reads rfiles as Accumulo key/value
pairs using a FileSKVIterator within a RecordReader, partitions/shuffles them
based on the byte string of the key, and writes them out as new rfiles
using AccumuloFileOutputFormat. The objective is to create larger
rfiles for bulk ingest and to minimize the number of tservers each rfile
is assigned to after it is bulk ingested.
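
For reference, the read side looks roughly like this (a minimal sketch
against the 1.6-era FileOperations/FileSKVIterator API; the RFileDump class
name and the absence of error handling are placeholders of mine, not part
of the actual job):

import org.apache.accumulo.core.conf.AccumuloConfiguration;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.file.FileOperations;
import org.apache.accumulo.core.file.FileSKVIterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RFileDump {
  public static void main(String[] args) throws Exception {
    String file = args[0]; // path to an .rf file
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Open the rfile positioned at its first entry (seekToBeginning = true).
    FileSKVIterator reader = FileOperations.getInstance().openReader(
        file, true, fs, conf, AccumuloConfiguration.getDefaultConfiguration());

    // Walk every key/value pair in the file in sorted order.
    while (reader.hasTop()) {
      Key k = reader.getTopKey();
      Value v = reader.getTopValue();
      System.out.println(k + " -> " + v);
      reader.next();
    }
    reader.close();
  }
}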

For tables with a simple schema this works fine, but for tables with a
complex schema the new rfiles cause the tservers to throw a
NullPointerException during compaction.

Is there more to an rfile than just the key/value pairs that I am missing?

If I compute an order-independent checksum of the bytes of the key/value
pairs in the original rfiles and the new rfiles, shouldn't they be the same?
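
For what it's worth, this is the kind of order-independent checksum I mean
(my own sketch, not an Accumulo utility): hash each entry individually and
combine the digests with a commutative operation such as XOR, so the result
is the same no matter what order the entries are visited in.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;

public class OrderIndependentChecksum {
  // Running XOR of per-entry SHA-256 digests; XOR is commutative,
  // so the order the entries are fed in does not matter.
  private final byte[] acc = new byte[32];

  public void add(Key k, Value v) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    // Hash every component of the key, including the delete flag,
    // plus the value, so two entries collide only if they are
    // byte-for-byte identical.
    md.update(k.getRowData().toArray());
    md.update(k.getColumnFamilyData().toArray());
    md.update(k.getColumnQualifierData().toArray());
    md.update(k.getColumnVisibilityData().toArray());
    md.update(Long.toString(k.getTimestamp()).getBytes(StandardCharsets.UTF_8));
    md.update((byte) (k.isDeleted() ? 1 : 0));
    md.update(v.get());
    byte[] d = md.digest();
    for (int i = 0; i < acc.length; i++)
      acc[i] ^= d[i];
  }

  public byte[] value() {
    return acc.clone();
  }
}

One caveat with XOR: two byte-identical entries cancel each other out, so if
the same key/value pair could legitimately appear twice across the files
being compared, summing the digests as unsigned big integers is safer.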
