hbase-user mailing list archives

From "Sawant, Chandramohan " <chandramohan.saw...@citi.com.INVALID>
Subject Inconsistent count between HBase shell count and custom Map reduce row counter run on snapshot of same table
Date Mon, 18 Dec 2017 22:15:19 GMT
Hi All,

The HBase shell count command run on a table gives a different count than our custom MapReduce
row counter job run on a snapshot of the same table.
This is a custom row counter written by us, similar to the default HBase MapReduce row counter.
The only difference is that our job first creates a snapshot and then runs the MapReduce task
on the snapshot rather than directly on the table. After the count on the snapshot finishes,
it deletes the snapshot programmatically.
Snapshot creation is done programmatically through the HBaseAdmin.snapshot() API.
When we analyzed the MapReduce job log, we found that it picks up extra regions while creating
the snapshot.
From the master web UI (and also hbase hbck), the numbers of regions allocated for tables TABLE1
and TABLE2 are 10 and 4 respectively; however, the snapshot creation logs show 18 and 7 regions
respectively.
The same MapReduce job run on the same tables in prod gives a correct count that matches the
HBase shell count command. We observe this erroneous behavior only in COB.
One-way replication is enabled from prod to COB.
Also, no filter condition is added to the scanner in the MapReduce job; it is a plain
select count(*) kind of query.

The command used to run the custom MapReduce counter job is:

java -cp ${HADOOP_CLASSPATH}:${HADOOP_CONF_DIR}:${HBASE_CONF_DIR} com.XXX.XXX.XXX.RowCounterCustom
-libjars ${LIBJAR} TABLE1

We have observed that there are 7 .regioninfo files for table TABLE1, whereas hbck reports
only 4 regions. We suspect this might be the root cause of the issue: when the job creates
the snapshot, it gathers information from 7 regions instead of 4, which might be why it
counts extra rows.
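
To narrow down which region directories are the extra ones, the two lists can be compared
offline. Below is a minimal sketch, assuming the encoded region names have already been
collected by hand (for example, from an HDFS listing of the table directory and from the
hbck report). The class RegionDirDiff and the method extraOnHdfs are hypothetical helpers
written for illustration; they are not part of HBase, and the names in main are made-up
placeholders.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase: compares two hand-collected
// lists of encoded region names.
public class RegionDirDiff {

    // Returns the names that have a region directory on HDFS but are not
    // among the regions reported by hbck, in sorted order.
    public static Set<String> extraOnHdfs(List<String> hdfsRegionDirs,
                                          List<String> hbckRegions) {
        Set<String> extra = new TreeSet<>(hdfsRegionDirs);
        extra.removeAll(hbckRegions);
        return extra;
    }

    public static void main(String[] args) {
        // Made-up placeholder names; substitute the real encoded region names.
        List<String> onHdfs = List.of("regionA", "regionB", "regionC");
        List<String> fromHbck = List.of("regionA", "regionB");
        System.out.println("Region dirs not reported by hbck: "
                + extraOnHdfs(onHdfs, fromHbck));
    }
}
```

Any name that this returns is a directory the snapshot may be picking up even though it is
not an assigned region, which is the suspicion described above.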

This issue is not reproducible in lower environments.

We would like to know whether there is any known HBase bug in which the number of regions
reported by hbck does not match the number of actual .regioninfo files for the table on HDFS.

Please let us know what could cause the wrong number of regions to be processed, leading to
an incorrect row count in COB.

--------------------------------------------------------------------------
For reference, the description of TABLE1 is below:

hbase(main):001:0> desc 'TABLE1'

Table TABLE1 is ENABLED

TABLE1, {TABLE_ATTRIBUTES => {
  coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
  coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
  coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
  coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
  coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|index.builder=org.apache.phoenix.index.PhoenixIndexBuilder,org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec',
  coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|'}

COLUMN FAMILIES DESCRIPTION

{NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '1',
 COMPRESSION => 'NONE', VERSIONS => '2', TTL => 'FOREVER', MIN_VERSIONS => '0',
 KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

1 row  in 0.2790 seconds
--------------------------------------------------------------------------


Regards,
CM
+1 201 763 1656

