hadoop-mapreduce-user mailing list archives

From Dave Shine <Dave.Sh...@channelintelligence.com>
Subject FW: WritableComparable value changing between Map and Reduce
Date Tue, 26 Jun 2012 14:20:05 GMT
After about a week of researching, logging, etc. I have finally discovered what is happening,
but I have no idea why.

I have created my own WritableComparable object so I can emit it as the key from my Mapper.
The object contains several Longs, one String, and one Date property. The following code
snippets are from the object:

                private Date SummaryDate;

                /**
                 * @return the summaryDate
                 */
                public Date getSummaryDate() {
                                return SummaryDate;
                }

                /**
                 * @param summaryDate the summaryDate to set
                 */
                public void setSummaryDate(Date summaryDate) {
                                Calendar cal = Calendar.getInstance();
                                cal.set(Calendar.HOUR, 0);
                                cal.set(Calendar.MINUTE, 0);
                                cal.set(Calendar.SECOND, 0);
                                cal.set(Calendar.MILLISECOND, 0);
                                cal.set(Calendar.AM_PM, Calendar.AM);
                                SummaryDate = cal.getTime();
                }

                public void readFields(DataInput arg0) throws IOException {
                                ...
                }

                public void write(DataOutput arg0) throws IOException {
                                ...
                }

The intent is for the summary date to always be as of midnight, thus the use of the Calendar
object in the setSummaryDate() method.
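
For comparison, here is a minimal sketch of truncating a Date to midnight. The snippet as archived does not show a cal.setTime(summaryDate) call, so the sketch assumes one is intended. It also assumes the time zone is passed in explicitly rather than taken from the JVM default. The names MidnightDates and truncateToMidnight are hypothetical and not part of the original class.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class MidnightDates {
    // Hypothetical helper: returns midnight (00:00:00.000) of the day that
    // `date` falls on, evaluated in the given time zone.
    public static Date truncateToMidnight(Date date, TimeZone zone) {
        Calendar cal = Calendar.getInstance(zone);
        cal.setTime(date);                 // start from the supplied date, not "now"
        cal.set(Calendar.HOUR_OF_DAY, 0);  // 24-hour field, so AM_PM is not needed
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }
}
```

Truncation is only meaningful relative to a time zone: the same instant can fall on different calendar days in different zones, so the zone used here should be the same wherever the value is normalized.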

I have proven via logging that the Mapper is storing the correct value in the SummaryDate
property, but sometimes the value received by the Reducer is the previous day.  Does anyone
have any idea how this could happen?

My only theory is a loss of precision in the long where the epoch time is actually stored: it
somehow loses a tick and becomes 1 millisecond before midnight, then my code drops the time and
the date portion is left with a date that is one day earlier. Has anyone else seen anything like
this?
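
One way to test the precision theory in isolation is to round-trip the epoch value through the java.io data streams. The sketch below assumes the date is serialized as a single long via writeLong/readLong, which the archived snippet does not confirm because the bodies of write() and readFields() are not shown. The class name DateRoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

public class DateRoundTrip {
    // Serializes the date as its epoch-millisecond long and reads it back,
    // mimicking what a write()/readFields() pair could do (an assumption).
    public static Date roundTrip(Date original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(original.getTime());  // writes all 8 bytes of the long
            out.flush();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return new Date(in.readLong());     // reads the same 8 bytes back
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Date midnight = new Date(1340668800000L); // 2012-06-26T00:00:00Z
        System.out.println(midnight.getTime() == roundTrip(midnight).getTime());
    }
}
```

writeLong and readLong operate on the full 64-bit value, so a lost millisecond would not be expected from this step alone. If the check passes, the discrepancy more likely arises elsewhere, for example in how the value is normalized after it is read back.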

I am ready to go change my code to just store the date as a formatted string.  But I'd really
like to know if this is a known Java or Hadoop problem. FWIW, I'm using CDH3U4.
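
The formatted-string fallback could look roughly like the sketch below. It assumes a yyyy-MM-dd pattern and a fixed UTC zone, both of which are illustrative choices rather than anything from my actual code. The names DayString, toDayString, and fromDayString are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DayString {
    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone, independent of the JVM default
        fmt.setLenient(false);                        // reject values such as 2012-13-40
        return fmt;
    }

    // Formats the calendar day (in UTC) that the date falls on.
    public static String toDayString(Date date) {
        return newFormat().format(date);
    }

    // Parses a yyyy-MM-dd string back to midnight UTC of that day.
    public static Date fromDayString(String day) {
        try {
            return newFormat().parse(day);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + day, e);
        }
    }
}
```

A string like this could be written with DataOutput.writeUTF and read with DataInput.readUTF. Pinning the zone in the formatter means the day does not depend on the default time zone of whichever JVM happens to format or parse it.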

Dave Shine
Sr. Software Engineer