hadoop-common-user mailing list archives

From Andy Doddington <a...@doddington.net>
Subject Re: Mappers and Reducer not being called, but no errors indicated
Date Mon, 14 Nov 2011 16:50:06 GMT
OK, continuing our earlier conversation...

I have a job that schedules 100 map tasks (a small number, just for testing), passing data via
a set of 100 sequence files. It is based on the PiEstimator example that ships with the
distribution.

The data consist of a blob of serialised state, amounting to around 20MB. I have added various
checks, including checksums, to reduce the risk of data corruption or misalignment.
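A checksum check of the kind described above might look roughly like the following. This is a minimal sketch using the JDK's java.util.zip.CRC32, not the actual checks in this job; the class name BlobChecksum and its methods are hypothetical. The idea is that the driver records a checksum when it writes the blob, and each mapper recomputes it after reading.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: checksums a serialised blob so a mapper can confirm
// it received exactly the bytes that the job driver wrote.
public final class BlobChecksum {
    private BlobChecksum() {}

    /** Returns the CRC-32 of the given bytes. */
    public static long crc32(byte[] blob) {
        CRC32 crc = new CRC32();
        crc.update(blob, 0, blob.length);
        return crc.getValue();
    }

    /** True if the blob still matches the checksum recorded when it was written. */
    public static boolean verify(byte[] blob, long expected) {
        return crc32(blob) == expected;
    }

    public static void main(String[] args) {
        byte[] blob = "serialised state".getBytes(StandardCharsets.UTF_8);
        long written = crc32(blob);                 // recorded by the driver
        System.out.println(verify(blob, written));  // true
        blob[0] ^= 1;                               // flip a single bit
        System.out.println(verify(blob, written));  // false: CRC-32 catches single-bit errors
    }
}
```

A passing checksum only shows the bytes arrived intact; it says nothing about whether they are deserialised correctly afterwards.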

The mapper takes the blob of data as its value input and an integer in the range 0-99 as its
key (passed as a LongWritable).

Each mapper then does some processing, based upon the deserialised contents of the blob and
the integer key value (0-99).

The reducer then selects the minimum value that was produced across all of the mappers.

Unfortunately, this process produces an incorrect value when compared with a simple iterative
solution.
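The overall shape of the computation (one result per key, then a minimum over all of them) can be sketched in plain Java, without any Hadoop classes, to show why the distributed and iterative answers should agree. The scoring function passed in is hypothetical and stands in for the real processing of the deserialised blob; MinReduceSketch and its methods are illustrative names only.

```java
import java.util.List;
import java.util.function.LongToDoubleFunction;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Plain-Java sketch of the job's shape: each "map" evaluates one key in
// 0..numKeys-1 and the "reduce" keeps the minimum of all map outputs.
public final class MinReduceSketch {
    private MinReduceSketch() {}

    /** Map phase: one output per key, as the individual mappers would emit. */
    static List<Double> mapAll(int numKeys, LongToDoubleFunction process) {
        return LongStream.range(0, numKeys)
                .mapToObj(k -> process.applyAsDouble(k))
                .collect(Collectors.toList());
    }

    /** Reduce phase: select the minimum of all mapper outputs. */
    static double reduceMin(List<Double> mapOutputs) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : mapOutputs) {
            min = Math.min(min, v);
        }
        return min;
    }

    /** The simple iterative solution used as the reference answer. */
    static double iterativeMin(int numKeys, LongToDoubleFunction process) {
        double min = Double.POSITIVE_INFINITY;
        for (long k = 0; k < numKeys; k++) {
            min = Math.min(min, process.applyAsDouble(k));
        }
        return min;
    }

    public static void main(String[] args) {
        // Hypothetical scoring function with its minimum (0.5) at key 37.
        LongToDoubleFunction process = k -> (k - 37) * (k - 37) + 0.5;
        double distributed = reduceMin(mapAll(100, process));
        double iterative = iterativeMin(100, process);
        System.out.println(distributed + " " + iterative); // 0.5 0.5
    }
}
```

Because taking a minimum is associative and order-independent, any mismatch between the two answers points to a difference in the per-key values themselves rather than to the reduction step.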

After inspecting the results, it seems that the mappers generate correct values for even-numbered
keys but incorrect values for odd-numbered keys. I am logging the key values, so I am confident
that these are correct. My serialisation checks also make me confident that the ‘value’ blobs
are not being corrupted, so it’s all something of a mystery.

Harsh J: Previously, you indicated that this might be a “...key/val data issue… ...Perhaps
bad partitioning/grouping is happening as a result of that”. I apologise for the lack of
detail, but do you think this might still be the case? If so, could you point me to somewhere
that gives more detail on this type of issue?
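For reference on the partitioning question, the sketch below applies the formula used by Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, to keys like the ones in this job. It assumes the job uses the default partitioner; Long.hashCode stands in for LongWritable.hashCode(), and for small non-negative keys such as 0-99 both simply return the key value. PartitionSketch is a hypothetical name, and the reducer counts shown are illustrative rather than this job's configuration.

```java
// Sketch of the default hash-partitioning rule applied to LongWritable-style
// keys. Long.hashCode(key) is an assumed stand-in for LongWritable.hashCode().
public final class PartitionSketch {
    private PartitionSketch() {}

    /** Partition index a key would be routed to under hash partitioning. */
    static int partitionFor(long key, int numReduceTasks) {
        return (Long.hashCode(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (int reducers : new int[] {1, 2}) {
            StringBuilder sb = new StringBuilder("reducers=" + reducers + ":");
            for (long key = 0; key < 6; key++) {
                sb.append(' ').append(key).append("->").append(partitionFor(key, reducers));
            }
            System.out.println(sb);
        }
        // reducers=1: 0->0 1->0 2->0 3->0 4->0 5->0
        // reducers=2: 0->0 1->1 2->0 3->1 4->0 5->1
    }
}
```

With one reduce task every key lands in the same partition; with two, even and odd keys are split between partitions 0 and 1. Partitioning only affects which reducer receives a map output, not the value a mapper computes.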

With apologies for continuing to be a nuisance :-(

Andy D

