hadoop-common-user mailing list archives

From Darren Lee <d...@amplience.com>
Subject Slice MapWritable on Map
Date Tue, 11 Jun 2013 16:27:52 GMT

I am working on a Hadoop-based Solr indexing system. We are using Hadoop because
we need to prepare the data (compute values and add them to the Solr documents).

For a full index, I read in the records and output a MapWritable containing all the fields
I want to index. Other Hadoop jobs then use this output as their input and contribute
new computed fields to each document at reduce time.

This feels wrong, as I am making each mapper read in the full document when it may only need
one or two fields from the MapWritable to add its computed field.

Is it possible in Hadoop to request a slice of the MapWritable? Or is there perhaps a better
way to structure this? Would I even want to?
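
To make "slice" concrete, here is a minimal sketch of the result I am after. It is plain
Java, with java.util.Map standing in for MapWritable, and the class name, method name and
field names (id, title, body) are made up for illustration. Note that it filters a map that
has already been fully read, so it only shows the desired outcome, not a way to avoid
reading the whole document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: java.util.Map plays the role of MapWritable.
class SliceSketch {

    // Return a new map holding only the requested fields.
    // Requested fields that are absent from the document are simply skipped.
    static Map<String, String> slice(Map<String, String> document, Set<String> wanted) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : document.entrySet()) {
            if (wanted.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document with three fields.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "Example");
        doc.put("body", "long text that this job does not need");

        // A job that only needs two of the fields.
        System.out.println(slice(doc, Set.of("id", "title")));
    }
}
```

What I would like is for a downstream job to receive only the equivalent of the sliced map,
without first paying to read the fields it does not use.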

Thanks for any help,
