hbase-user mailing list archives

From yonghu <yongyong...@gmail.com>
Subject Strange behavior when using MapReduce to process HBase.
Date Tue, 25 Nov 2014 10:11:15 GMT

I wrote a copyTable MapReduce program. My HBase version is 0.94.16. Several
rows in the source table contain multiple data versions. The map function
looks as follows:

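public void map(ImmutableBytesWritable rowKey, Result res, Context context)
    throws IOException, InterruptedException {
  // Emit one Put per cell so that every version is written separately.
  for (KeyValue kv : res.list()) {
    Put put = new Put(rowKey.get());
    // Pass the source timestamp explicitly to preserve each version.
    put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(),
        kv.getValue());
    context.write(null, put);
  }
}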

At first I did not set the timestamp and just used put.add(kv.getFamily(),
kv.getQualifier(), kv.getValue()). With that approach, however, only the
latest data version ends up in the target table: the older versions are
overwritten, even though each of them is issued in a separate Put.

After I added the timestamp to each data version (cell), as in the code shown
above, I got multiple data versions.

The only explanation I can think of is that HBase assigns the same timestamp
to all the data versions, so the older values are overwritten. What I cannot
understand is why this happens when each cell is issued in its own Put. By
comparison, when a client issues Puts explicitly, HBase generates a distinct
timestamp for each cell. This behavior does not seem to be supported when
using MapReduce.
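
The suspected collision can be illustrated with a toy model. This is only a sketch, not HBase code: the class VersionCollisionSketch and its methods are hypothetical names, and it assumes that a Put without a timestamp is stamped with the server's current time in milliseconds, so that writes handled in the same millisecond (for example, in one buffered batch) share a timestamp. Under that assumption, versions of one cell are keyed by timestamp and an equal timestamp replaces the earlier value.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model of a single cell (row, family, qualifier) whose versions are
 * keyed by timestamp. Hypothetical illustration only; not the HBase API.
 */
public class VersionCollisionSketch {

    // Newest timestamp first, mirroring how versions are ordered on read.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    /** Stores a value; a write with an existing timestamp replaces that version. */
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Number of distinct versions currently stored. */
    public int versionCount() {
        return versions.size();
    }

    /** Value with the highest timestamp (assumes at least one write). */
    public String latest() {
        return versions.firstEntry().getValue();
    }

    /** Copy that carries each source timestamp over, like the map function with kv.getTimestamp(). */
    public static VersionCollisionSketch copyWithSourceTimestamps(long[] timestamps, String[] values) {
        VersionCollisionSketch target = new VersionCollisionSketch();
        for (int i = 0; i < values.length; i++) {
            target.put(timestamps[i], values[i]);
        }
        return target;
    }

    /**
     * Copy without timestamps. ASSUMPTION: the server stamps every write it
     * handles in the same millisecond with the same clock reading (serverNow).
     */
    public static VersionCollisionSketch copyWithServerTimestamp(long serverNow, String[] values) {
        VersionCollisionSketch target = new VersionCollisionSketch();
        for (String value : values) {
            target.put(serverNow, value);
        }
        return target;
    }
}
```

In this model, copying three versions with their source timestamps keeps all three, while copying them under a single server timestamp leaves only the value written last. Whether the MapReduce output path actually batches Puts this way in 0.94.16 is an assumption to verify against the real cluster.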


