hadoop-mapreduce-user mailing list archives

From "Ghigliotti, Matthew" <Matthew.Ghiglio...@garmin.com>
Subject MultipleInputs and Paths Containing Commas
Date Thu, 09 Dec 2010 21:42:22 GMT

I'm not sure whether this is a bug or an oversight, but since I haven't found any reference to
it anywhere, I figured I would bring it to light.

I've been using MultipleInputs for several of my MapReduce jobs, where I am joining together
different forms of data. However, I have encountered the following exception with some uses
of MultipleInputs in Hadoop 0.20.2:

java.lang.ArrayIndexOutOfBoundsException: 1
 at org.apache.hadoop.mapred.lib.MultipleInputs.getInputFormatMap(MultipleInputs.java:94)
 at org.apache.hadoop.mapred.lib.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:51)
 at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)

After tracing through the source code, it appears that this occurs when the input path passed
to MultipleInputs#addInputPath() contains a comma, which happens most often when using globs
(for example, "/months/{March,April,May}.txt"). The comma is one of the two special delimiters
that MultipleInputs#getInputFormatMap() relies on when it builds the input format map, so a
path that itself contains commas causes the path-inputformat data to be parsed incorrectly.
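
To illustrate the failure mode, here is a minimal standalone sketch that mimics the
encode-then-split pattern described above without depending on Hadoop. The class and method
names (CommaPathDemo, addMapping, parseMappings) are hypothetical, and the use of ";" as the
second delimiter between the path and the input format class name is an assumption rather
than something stated above; this is not the actual MultipleInputs source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommaPathDemo {

    // Appends one "path;formatName" entry to a comma-separated list.
    // Assumption: "," separates entries and ";" separates path from format name.
    static String addMapping(String existing, String path, String formatName) {
        String mapping = path + ";" + formatName;
        return existing == null ? mapping : existing + "," + mapping;
    }

    // Splits the encoded string back into path -> formatName pairs.
    // A path containing "," is cut into fragments, and a fragment
    // without ";" makes parts[1] throw ArrayIndexOutOfBoundsException.
    static Map<String, String> parseMappings(String encoded) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : encoded.split(",")) {
            String[] parts = entry.split(";");
            result.put(parts[0], parts[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String format = "org.apache.hadoop.mapred.TextInputFormat";

        // A path without commas round-trips as expected.
        String plain = addMapping(null, "/months/March.txt", format);
        System.out.println(parseMappings(plain));

        // A glob with commas is split into three fragments; the first
        // fragment, "/months/{March", has no ";" and triggers the exception.
        String glob = addMapping(null, "/months/{March,April,May}.txt", format);
        try {
            parseMappings(glob);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Parsing failed: " + e);
        }
    }
}
```

In this sketch the glob path is broken at its first comma, leaving a fragment with no format
name attached, which is consistent with the index of 1 reported in the exception above.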

Could someone verify this behavior in other versions of Hadoop? And, perhaps more importantly,
should this actually be considered a bug in MultipleInputs?


