hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "FAQ" by Arun C Murthy
Date Mon, 24 Sep 2007 18:32:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by Arun C Murthy:

The comment on the change is:
Added a section on how to write maps which process complete input files

  The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces), since in that case the output of the map goes directly to HDFS.
+ [[BR]]
+ [[Anchor(10)]]
+ '''10. [#10 How do I get each of my maps to work on one complete input file and not allow the framework to split up my files?]'''
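+ Essentially, a job's input is represented by the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/InputFormat.html InputFormat] (interface) / [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat] (base class).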
+ For this purpose you need a 'non-splittable' [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat], i.e. an input format which tells the map-reduce framework that its input files must not be split up, so that each file is processed whole by a single map. To do this, your input format must return '''false''' from the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path) isSplitable] call.
+ E.g. '''org.apache.hadoop.mapred.Sort``Validator.Record``Stats``Checker.Non``Splitable``Sequence``File``Input``Format''' in [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java SortValidator.java].
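The effect of returning '''false''' from isSplitable can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it does not use Hadoop at all, and the names `SplitSketch`, `Split`, `ModelInputFormat` and `NonSplitableModelInputFormat` are hypothetical. The model ignores details of the real FileInputFormat such as block locations and minimum split sizes, and its `isSplitable` hook takes only a file name rather than `(FileSystem, Path)`. It shows the idea behind the pattern: a splittable file is cut into several byte ranges (one per map), while a non-splittable file yields exactly one range covering the whole file.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for a file split: the byte range [start, start + length) of one file.
    record Split(String file, long start, long length) {}

    // Simplified model (not Hadoop's actual code) of a file-based input format.
    static class ModelInputFormat {

        // Hook modelled loosely on FileInputFormat.isSplitable(FileSystem, Path);
        // in this sketch it only receives the file name.
        protected boolean isSplitable(String file) {
            return true;
        }

        // Splittable files are cut into chunks of at most splitSize bytes;
        // a non-splittable file yields a single split covering the whole file.
        List<Split> getSplits(String file, long fileLength, long splitSize) {
            List<Split> splits = new ArrayList<>();
            if (!isSplitable(file)) {
                splits.add(new Split(file, 0, fileLength));
                return splits;
            }
            for (long start = 0; start < fileLength; start += splitSize) {
                splits.add(new Split(file, start, Math.min(splitSize, fileLength - start)));
            }
            return splits;
        }
    }

    // The pattern from the FAQ: override the hook to return false,
    // so that each map is handed one complete input file.
    static class NonSplitableModelInputFormat extends ModelInputFormat {
        @Override
        protected boolean isSplitable(String file) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 250;
        long splitSize = 100;
        System.out.println(new ModelInputFormat().getSplits("part-00000", fileLength, splitSize));
        System.out.println(new NonSplitableModelInputFormat().getSplits("part-00000", fileLength, splitSize));
    }
}
```

With a 250-byte file and 100-byte splits, the default model produces three splits (100, 100 and 50 bytes), whereas the non-splittable subclass produces one 250-byte split. In real Hadoop code the equivalent step is to subclass an existing input format (as the Non``Splitable``Sequence``File``Input``Format example above does) and override `isSplitable(FileSystem, Path)` to return false.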
