hadoop-mapreduce-user mailing list archives

From Adam Phelps <...@opendns.com>
Subject Re: Duplicated entries with map job reading from HBase
Date Fri, 05 Nov 2010 23:01:06 GMT
Yeah, it wasn't the combiner.  The repeated entries are actually seen by
the mapper, so the duplication happens before the combiner comes into
play.  Is there any other info that would help narrow down the cause?

- Adam
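
One way to gather more clues, sketched here as an illustration and not taken from the thread: have each mapper log the row key together with an identifier for the input split it was reading, then group the log by row key. The class and field names below (DuplicateRowCheck, Seen, the split labels) are hypothetical, and the sketch uses plain Java collections in place of real HBase or Hadoop calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups mapper-side observations by row key to find repeats. */
final class DuplicateRowCheck {

    /** One logged observation: a row key and the (assumed) label of the split that delivered it. */
    static final class Seen {
        final String rowKey;
        final String split;

        Seen(String rowKey, String split) {
            this.rowKey = rowKey;
            this.split = split;
        }
    }

    /** Returns, for every row key observed more than once, the splits that delivered it. */
    static Map<String, List<String>> findRepeats(List<Seen> observations) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (Seen s : observations) {
            byKey.computeIfAbsent(s.rowKey, k -> new ArrayList<>()).add(s.split);
        }
        // Keep only the keys that were seen at least twice.
        byKey.values().removeIf(splits -> splits.size() < 2);
        return byKey;
    }

    public static void main(String[] args) {
        List<Seen> log = Arrays.asList(
            new Seen("row-a", "split-0"),
            new Seen("row-b", "split-0"),
            new Seen("row-a", "split-1"));
        System.out.println(findRepeats(log)); // prints {row-a=[split-0, split-1]}
    }
}
```

If a repeated key shows up under two different splits, that would suggest the splits overlap; if it shows up twice under the same split, the scan or record reader for that split would be the more likely place to look. Both readings are guesses about where to dig, not a diagnosis.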

On 11/5/10 11:35 AM, Adam Phelps wrote:
> No, the system actually is much larger than two nodes. But the number of
> mappers used here tends to be fairly small (usually more than two; I
> suspect it is based on the HBase regions being accessed). I'll try
> turning off the combiner to see if that changes anything.
> Thanks
> - Adam
> On 11/5/10 9:23 AM, Niels Basjes wrote:
>> Hi,
>> I don't know the answer (simply not enough information in your email)
>> but I'm willing to make a guess:
>> You are running on a system with two processing nodes?
>> If so then try removing the Combiner. The combiner is a performance
>> optimization and the whole processing should work without it.
>> Sometimes there is a design fault in the processing and the combiner
>> disrupts the processing.
>> HTH
>> Niels Basjes
>> 2010/11/5 Adam Phelps <amp@opendns.com>
>> I've noticed an odd behavior with a map-reduce job I've written
>> which is reading data out of an HBase table. After a couple days of
>> poking at this I haven't been able to figure out the cause of the
>> problem, so I figured I'd ask on here.
>> (For reference I'm running with the cdh3b2 release)
>> The problem is that it seems that every line from the HBase table is
>> passed to the mappers twice, thus resulting in counts ending up as
>> exactly double what they should be.
>> I set up the job like this:
>> Scan scan = new Scan();
>> scan.addFamily(Bytes.toBytes(scanFamily));
>> TableMapReduceUtil.initTableMapperJob(table,
>>                                       scan,
>>                                       mapper,
>>                                       Text.class,
>>                                       LongWritable.class,
>>                                       job);
>> job.setCombinerClass(LongSumReducer.class);
>> job.setReducerClass(reducer);
>> I've set up counters in the mapper to verify what is happening, so
>> that I know for certain that the mapper is being called twice with
>> the same bit of data. I've also confirmed (using the hbase shell)
>> that each entry appears only once in the table.
>> Is there a known bug along these lines? If not, does anyone have
>> any thoughts on what might be causing this or where I'd start
>> looking to diagnose?
>> Thanks
>> - Adam
>> --
>> Met vriendelijke groeten,
>> Niels Basjes
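
As a small illustration of why removing the combiner would not be expected to change doubled counts: a summing combiner only pre-aggregates each mapper's output, and since addition is associative the final totals are the same with or without it. The sketch below models this with plain Java collections; CombinerSumSketch and its methods are hypothetical names, not Hadoop APIs, and the input deliberately includes one key delivered twice to a mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a count job, with and without a summing combiner. */
final class CombinerSumSketch {

    /** Counts occurrences of each key, as a summing reducer would for (key, 1) pairs. */
    static Map<String, Long> countKeys(List<String> keys) {
        Map<String, Long> totals = new TreeMap<>();
        for (String k : keys) {
            totals.merge(k, 1L, Long::sum);
        }
        return totals;
    }

    /** No combiner: every mapper's raw output goes straight to the reduce step. */
    static Map<String, Long> withoutCombiner(List<List<String>> mapperOutputs) {
        List<String> all = new ArrayList<>();
        for (List<String> out : mapperOutputs) {
            all.addAll(out);
        }
        return countKeys(all);
    }

    /** With combiner: each mapper's output is summed locally, then the partial sums are merged. */
    static Map<String, Long> withCombiner(List<List<String>> mapperOutputs) {
        Map<String, Long> totals = new TreeMap<>();
        for (List<String> out : mapperOutputs) {
            for (Map.Entry<String, Long> e : countKeys(out).entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> outputs = Arrays.asList(
            Arrays.asList("a", "a", "b"), // mapper 0: key "a" delivered twice
            Arrays.asList("b", "c"));     // mapper 1
        System.out.println(withoutCombiner(outputs)); // prints {a=2, b=2, c=1}
        System.out.println(withCombiner(outputs));    // prints {a=2, b=2, c=1}
    }
}
```

In both cases the duplicated input shows up in the total, which is consistent with the duplication happening upstream of the combiner.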
