accumulo-user mailing list archives

From: David Medinets <david.medin...@gmail.com>
Subject: Re: Read and writing rfiles
Date: Wed, 23 Dec 2015 15:26:26 GMT
Are the Hadoop nodes handling your MapReduce job also running tservers?

Do the Accumulo log files show the exception? If so, can you post it?

On Wed, Dec 23, 2015 at 9:12 AM, Jeff Kubina <jeff.kubina@gmail.com> wrote:

> I have a mapreduce job that reads rfiles as Accumulo key/value
> pairs using FileSKVIterator within a RecordReader, partitions/shuffles them
> based on the byte string of the key, and writes them out as new rfiles
> using the AccumuloFileOutputFormat. The objective is to create larger
> rfiles for bulk ingesting and to minimize the number of tservers each rfile
> is assigned to after it is bulk ingested.
>
> For tables with a simple schema it works fine, but for tables with a complex
> schema the new rfiles cause the tservers to throw a null pointer
> exception during a compaction.
>
> Is there more to an rfile than just the key/value pairs that I am missing?
>
> If I compute an order-independent checksum of the bytes of the key/value
> pairs in the original rfiles and the new rfiles, shouldn't they be the same?
>
>
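
For reference, here is a minimal sketch of the reading and checksum steps
described in the quoted message: it opens an rfile with the internal
FileOperations/FileSKVIterator API and XORs a per-pair CRC32 so the resulting
checksum does not depend on entry order. This is not the poster's code; the
openReader signature is the one from the Accumulo 1.6/1.7 internal API and may
differ in other releases, and the class name and file path are hypothetical.

// RFileChecksum.java - a minimal sketch, not the code from this thread.
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

import org.apache.accumulo.core.conf.AccumuloConfiguration;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.file.FileOperations;
import org.apache.accumulo.core.file.FileSKVIterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RFileChecksum {
  public static void main(String[] args) throws Exception {
    // Hypothetical path, e.g. "hdfs://namenode/accumulo/tables/1/t-0001/F0000001.rf"
    String file = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // seekToBeginning=true leaves the iterator positioned on the first pair.
    // (Internal API; the signature is from Accumulo 1.6/1.7 and may change.)
    FileSKVIterator reader = FileOperations.getInstance().openReader(
        file, true, fs, conf, AccumuloConfiguration.getDefaultConfiguration());

    long checksum = 0L;
    try {
      while (reader.hasTop()) {
        Key k = reader.getTopKey();
        Value v = reader.getTopValue();

        // Hash every component of the pair, including the timestamp and the
        // delete flag, so files holding the same logical entries agree.
        CRC32 crc = new CRC32();
        crc.update(k.getRowData().toArray());
        crc.update(k.getColumnFamilyData().toArray());
        crc.update(k.getColumnQualifierData().toArray());
        crc.update(k.getColumnVisibilityData().toArray());
        crc.update(Long.toString(k.getTimestamp()).getBytes(StandardCharsets.UTF_8));
        crc.update(k.isDeleted() ? 1 : 0);
        crc.update(v.get());

        // XOR makes the result independent of the order (and file placement)
        // of the pairs; note that duplicate pairs cancel each other out.
        checksum ^= crc.getValue();
        reader.next();
      }
    } finally {
      reader.close();
    }
    System.out.println("order-independent checksum: " + Long.toHexString(checksum));
  }
}

Running this over the original rfiles and the rewritten ones (combining the
per-file values with XOR as well) would show whether the rewrite preserved the
key/value pairs byte for byte.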
