hbase-user mailing list archives

From Adam Phelps <...@opendns.com>
Subject Duplicated entries with map job reading from HBase
Date Fri, 05 Nov 2010 00:57:40 GMT
I've noticed an odd behavior with a map-reduce job I've written which 
reads data out of an HBase table.  After a couple of days of poking at 
this I haven't been able to figure out the cause of the problem, so I 
figured I'd ask here.

(For reference, I'm running the cdh3b2 release.)

The problem is that every row from the HBase table appears to be 
passed to the mappers twice, so the counts end up exactly double what 
they should be.

I set up the job like this:

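    // Scan only the column family of interest.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(scanFamily));

    // Wire the table scan into the job; the mapper emits
    // (Text, LongWritable) pairs.
    TableMapReduceUtil.initTableMapperJob(table,
                                          scan,
                                          mapper,
                                          Text.class,
                                          LongWritable.class,
                                          job);

    // Sum counts on the map side before the shuffle.
    job.setCombinerClass(LongSumReducer.class);
    job.setReducerClass(reducer);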

I've set up counters in the mapper to verify what is happening, so that 
I know for certain that the mapper is being called twice with the same 
bit of data.  I've also confirmed (using the hbase shell) that each 
entry appears only once in the table.
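
For anyone wanting to reproduce the check, here is a minimal sketch of the 
kind of bookkeeping I mean.  DuplicateRowTracker is a hypothetical helper 
made up for illustration, not part of HBase or Hadoop and not my actual 
code: it remembers each row key it is handed and reports whether that key 
has been seen before.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an HBase or Hadoop class.
class DuplicateRowTracker {
    // Hex-encoded row key -> number of times it has been recorded.
    private final Map<String, Integer> seen = new HashMap<String, Integer>();
    private long duplicates = 0;

    // Records one row key; returns true if this key was recorded before.
    boolean record(byte[] rowKey) {
        String key = toHex(rowKey);
        Integer previous = seen.get(key);
        seen.put(key, previous == null ? 1 : previous + 1);
        if (previous != null) {
            duplicates++;
            return true;
        }
        return false;
    }

    // Number of record() calls that hit an already-seen key.
    long duplicateCount() {
        return duplicates;
    }

    // Number of distinct row keys recorded so far.
    int distinctCount() {
        return seen.size();
    }

    // Encode bytes as lowercase hex so arbitrary binary keys compare by content.
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

In a mapper this would presumably be called once per map() invocation with 
the row key bytes, incrementing a job counter whenever record() returns 
true.  Note the limits: it only catches repeats within a single map task 
(a row delivered to two different tasks would not show up), and it holds 
every key in memory, so it is only suitable for a small test table.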

Is there a known bug along these lines?  If not, does anyone have any 
thoughts on what might be causing this, or where I should start looking 
to diagnose it?

Thanks
- Adam
