hbase-user mailing list archives

From 陈 建平Chen Jianping <chenjianp...@agora.io>
Subject Problem: Duplicate data is scanned from different region in HBase 1.2.0
Date Wed, 08 Mar 2017 02:00:31 GMT
Hi group,

I recently ran into a problem where scans of different regions in HBase return duplicate
data: the duplicated records all share the same row key and the same value.

Here is my setup: I am using Cloudera CDH 5.9.0 with Hadoop 2.6.0 and HBase 1.2.0, and the
HBase client library is also 1.2.0. The HBase table is auto-split, and my Mapper (in a
MapReduce job) scans this table to read the data. However, some duplicate records are
returned by the scanner from different regions and region servers, as the task log below shows.

Are there any suggestions for this problem? Thanks in advance.

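Here is my scanner code:
Scan scan = new Scan();
scan.setBatch(200);          // return at most 200 cells per Result
scan.setCacheBlocks(false);  // do not populate the block cache during the scan
scan.setMaxVersions(1);      // read only the latest version of each cell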


-----------MapReduce task log---------
mapper001
2017-03-07 10:19:30,997 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir
for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
2017-03-07 10:19:31,333 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id
is deprecated. Instead, use dfs.metrics.session-id
2017-03-07 10:19:32,910 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
File Output Committer Algorithm version is 1
2017-03-07 10:19:32,922 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree
: [ ]
2017-03-07 10:19:34,160 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase
table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region
location: ip-10-2-1-21.company.co, encoded region name: )

mapper002
2017-03-07 10:19:24,001 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir
for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
2017-03-07 10:19:24,618 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id
is deprecated. Instead, use dfs.metrics.session-id
2017-03-07 10:19:25,661 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
File Output Committer Algorithm version is 1
2017-03-07 10:19:25,726 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree
: [ ]
2017-03-07 10:19:26,100 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase
table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region
location: ip-10-2-1-23.company.co, encoded region name: )

mapper003
2017-03-07 10:19:24,278 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir
for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
2017-03-07 10:19:24,621 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id
is deprecated. Instead, use dfs.metrics.session-id
2017-03-07 10:19:25,553 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
File Output Committer Algorithm version is 1
2017-03-07 10:19:25,566 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree
: [ ]
2017-03-07 10:19:25,910 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase
table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region
location: ip-10-2-1-23.company.co, encoded region name: )

mapper004
2017-03-07 10:19:23,108 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir
for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
2017-03-07 10:19:23,413 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id
is deprecated. Instead, use dfs.metrics.session-id
2017-03-07 10:19:23,952 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
File Output Committer Algorithm version is 1
2017-03-07 10:19:23,963 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree
: [ ]
2017-03-07 10:19:24,320 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase
table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region
location: ip-10-2-1-23.company.co, encoded region name: )
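
All four mappers above report the same start row and end row, with region locations ip-10-2-1-21 and ip-10-2-1-23. A minimal sketch of how such repeats could be found automatically in a larger set of task logs follows. It assumes each wrapped "Processing split" message has first been joined into a single line, and the class and method names (SplitOverlapCheck, repeatedRanges) are hypothetical helpers, not part of HBase or Hadoop. Identical printed ranges are only a hint that the same split was handed out more than once, not proof.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of HBase or Hadoop.
final class SplitOverlapCheck {
    // Matches the fields of a "Processing split" message once its wrapped
    // lines have been joined with single spaces (an assumption of this sketch).
    private static final Pattern SPLIT = Pattern.compile(
        "start row: (.*?), end row: (.*?), region location: ([^,\\s]+)");

    /**
     * Groups region locations by "startRow..endRow" and returns only the
     * ranges that were reported by more than one log line.
     */
    static Map<String, List<String>> repeatedRanges(List<String> logLines) {
        Map<String, List<String>> byRange = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = SPLIT.matcher(line);
            if (m.find()) {
                String range = m.group(1) + ".." + m.group(2);
                byRange.computeIfAbsent(range, k -> new ArrayList<>()).add(m.group(3));
            }
        }
        // Drop ranges that appear only once; what remains are the repeats.
        byRange.values().removeIf(locations -> locations.size() < 2);
        return byRange;
    }
}
```

Calling SplitOverlapCheck.repeatedRanges with the joined "Processing split" lines of every mapper in the job returns a map from each repeated row range to the region locations that reported it. An empty map means no range was printed twice.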


Thanks,
Eric
