hadoop-common-user mailing list archives

From Andy Doddington <a...@doddington.net>
Subject Re: Mappers and Reducer not being called, but no errors indicated
Date Thu, 10 Nov 2011 13:58:22 GMT
Unfortunately my employer blocks any attempt to transfer data outside of the company - I realise this makes me look pretty foolish/uncooperative, but I hope you understand there's little I can do about it :-(

On a more positive note, I've found a few issues which have moved me forward a bit:

I first noticed that the PiEstimator example uses files named part<n> to transfer data to each of the mappers - I had changed this name to something more meaningful to my app. Since Hadoop itself uses some similarly-named files, I hoped this might be the cause. Sadly, this fix made no difference.
While looking at this area of the code, I realised that although I was writing data to these files, I was failing to close them! This fix did make a difference, in that the mappers now actually appear to be getting called. However, the final result from the reduce was still incorrect. What seemed to be happening (based on the mapper logs) was that the reducer was being called once for each mapper - which is not exactly optimal in my case.
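The missing-close symptom is consistent with buffered output never being flushed: bytes written to a stream aren't guaranteed to reach the file until close() is called. A minimal sketch in plain Java - using a local file rather than HDFS's FSDataOutputStream, but the principle is the same (the file name "part0" is just illustrative):

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CloseDemo {
    public static void main(String[] args) throws IOException {
        // try-with-resources guarantees close() (and hence a flush) even if
        // an exception is thrown; forgetting to close a buffered stream can
        // leave the file empty or truncated, so readers see no data.
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("part0")))) {
            out.writeLong(42L); // sits in the buffer until flush/close
        }
        try (DataInputStream in = new DataInputStream(
                new FileInputStream("part0"))) {
            System.out.println(in.readLong()); // prints 42
        }
    }
}
```

With HDFS the effect is more pronounced, since data may not even be visible to other readers until the output stream is closed.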
I therefore removed the JobConf call I had made to set my reducer to also be the combiner - and suddenly the results started looking a lot healthier, although they are still not 100% correct. I had naively assumed that the minimum of a set of minimums over subsets of the data would be the same as the minimum of the entire set, but I've clearly misunderstood how combiners work. I will investigate the documentation on this a bit more. Maybe some subtle interaction between combiners and partitioners?
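For min specifically, the algebra should actually be combiner-safe: min is associative and commutative, so the minimum of per-subset minimums equals the global minimum. A quick sanity check in plain Java (no Hadoop; the split values are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class MinCombinerCheck {
    public static void main(String[] args) {
        // Hypothetical per-mapper outputs, one array per "input split".
        List<double[]> splits = Arrays.asList(
            new double[]{3.2, 1.7, 9.0},
            new double[]{0.4, 5.5},
            new double[]{2.2, 8.8, 0.9});

        // Combiner path: take the min within each split first...
        double minOfMins = splits.stream()
            .mapToDouble(s -> Arrays.stream(s).min().getAsDouble())
            .min().getAsDouble();

        // ...no-combiner path: take the min over all values at once.
        double globalMin = splits.stream()
            .flatMapToDouble(Arrays::stream)
            .min().getAsDouble();

        System.out.println(minOfMins == globalMin); // prints true
    }
}
```

So if removing the combiner changed the answer, the likelier culprit is a contract violation rather than the min-of-mins logic itself: a combiner must consume the mapper's output key/value types and emit those same types, and it may be run zero, one, or many times.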

I'm still confused as to how the mappers get passed the data that I put into the part<n> files, but I *think* I'm now heading in the right direction. If you can see the cause of my problems (despite the lack of log output), then I'd be more than happy to hear from you :-)


	Andy D
On 10 Nov 2011, at 11:52, Harsh J wrote:

> Hey Andy,
> Can you pastebin the whole runlog of your job after you invoke it via 'hadoop jar'/etc.?
> On 10-Nov-2011, at 4:25 PM, Andy Doddington wrote:
>> Hi,
>> I have written a fairly straightforward Hadoop program, modelled after the PiEstimator example which is shipped with the distro.
>> 1) I write a series of files to HDFS, each containing the input for a single map task. This amounts to around 20Mb per task.
>> 2) Each of my map tasks reads the input and generates a pair of floating point values.
>> 3) My reduce task scans the list of floating point values produced by the maps and returns the minimum.
>> Unfortunately, this is not working, but is exhibiting the following symptoms:
>> Based on log output, I have no evidence that the mappers are actually being called, although the 'percentage complete' output seems to go down slowly as might be expected if they were being called.
>> I only ever get a single part-00000 file created, regardless of how many maps I specify.
>> In the case of my reducer, although its constructor, 'setConf' and 'close' methods are called (based on log output), its reduce method never gets called.
>> I have checked the visibility of all classes and confirmed that the method signatures are correct (as confirmed by Eclipse and use of the @Override annotation), and I'm at my wits' end. To further add to my suffering, the log outputs do not show any errors :-(
>> I am using the Cloudera CDH3u1 distribution.
>> As a final query, could somebody explain how it is that the multiple files I create get associated with the various map tasks? This part is a mystery to me (and might even be the underlying source of my problems).
>> Thanks in anticipation,
>> 	Andy Doddington
