hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Hbase data loss scenario
Date Fri, 28 Feb 2014 07:46:59 GMT
HDFS size can vary and go down (when a compaction happens, and the old files are later collected
and deleted).
So the size in HDFS is not a good measure; unless you lost rows, there's no reason to worry.

Can you quantify "consistent data loss"? Did you count rows before and after? Can you access any
data at all?

-- Lars



________________________________
 From: kiran <kiran.sarvabhotla@gmail.com>
To: user@hbase.apache.org 
Sent: Thursday, February 27, 2014 7:53 AM
Subject: Hbase data loss scenario
 

Hi All,

We have been experiencing severe data loss issues for the past few hours.
There are some weird things going on in the cluster. We were unable to
locate the data even in HDFS.

HBase version 0.94.1

Here are the weird things that are going on:

1) A table that was once 1 TB has now shrunk to 170 GB, and many of the
regions that were once 7 GB are now only a few MB. We have no clue what is
happening at all.

2) The table is splitting (or whatever) (100 regions have become 200
regions), and ours is ConstantSizeRegionSplitPolicy with a region size of
20 GB. I don't know why it is even splitting.

3) The HDFS namenode dump size, which we back up periodically, is decreasing.

4) And there is a region chain with start keys and end keys as follows. I
can't copy and paste the exact thing, but for example:

K1.xxx K2.xyz
K2.xyz K3.xyz,138798010000.xyp
K3.xyz,138798010000.xyp K4.xyq

I have never seen a weird start key and end key like this. We also suspect
a failed split of a region around 20 GB. We looked at the logs many times
but were unable to make any sense of them. Please help us out; we can't
afford data loss.
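
One way to sanity-check a region chain like the one in item 4 is to confirm
that each region's end key equals the next region's start key. The following
is only a minimal sketch, assuming the region boundaries have already been
exported as plain (start_key, end_key) string pairs; the function name
find_chain_problems is hypothetical and is not part of HBase.

```python
def find_chain_problems(regions):
    """Report holes and overlaps in a list of (start_key, end_key) pairs.

    Assumption: keys are plain strings, and an empty string marks the
    start of the first region or the end of the last region. HBase
    compares keys as raw bytes; Python string comparison gives the same
    order only for ASCII keys.
    """
    problems = []
    # Sort by start key so neighbouring regions sit next to each other.
    ordered = sorted(regions, key=lambda region: region[0])
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end == next_start:
            continue  # contiguous, nothing to report
        if prev_end == "" or prev_end > next_start:
            # The earlier region extends past the start of the next one.
            problems.append(("overlap", prev_end, next_start))
        else:
            # No region covers the keys between prev_end and next_start.
            problems.append(("hole", prev_end, next_start))
    return problems


# The example chain from item 4: every end key matches the next start key.
chain = [
    ("K1.xxx", "K2.xyz"),
    ("K2.xyz", "K3.xyz,138798010000.xyp"),
    ("K3.xyz,138798010000.xyp", "K4.xyq"),
]
print(find_chain_problems(chain))
```

By this check the example chain is contiguous, so the unusual-looking key
does not by itself indicate a hole; whether the real boundaries are
consistent can only be judged from the actual region listing.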

Yesterday there was a cluster crash of the root region, but we thought we
had successfully restored it. But things didn't go that way... There has
been consistent data loss since then.


-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late