hbase-dev mailing list archives

From Gagandeep Singh <gagandeep.si...@paxcel.net>
Subject Re: Data loss due to region server failure
Date Fri, 03 Sep 2010 08:55:53 GMT
I think I have figured out the problem, or maybe not.

To simulate RegionServer failure, my fellow programmer was killing the
RegionServer with *kill -9 pid*. But when I used *kill pid*, everything
seems to work fine. Obviously the region server now goes down
gracefully, so there is no data loss.

I also checked this on Hadoop 0.20.2 (without append) with HBase 0.20.5
and found no data loss with a plain kill command.
My next question: should it also work with *kill -9*?
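
The difference between the two commands can be reproduced outside HBase.
What follows is only a toy sketch in Python, not HBase code: plain kill
sends SIGTERM, which a process can catch and use to flush its state, while
kill -9 sends SIGKILL, which cannot be caught. The names CHILD,
run_and_signal and flushed.txt are made up for this illustration, and the
"flush on SIGTERM" handler merely stands in for a graceful shutdown path.
It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child process (hypothetical stand-in for a server): it holds one record
# in memory and persists it only from its SIGTERM handler.
CHILD = r"""
import signal, sys, time
path = sys.argv[1]
buffered = ["row-1"]          # data held only in memory

def on_term(signum, frame):
    with open(path, "w") as f:
        f.write("\n".join(buffered))
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)    # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""

def run_and_signal(sig):
    """Start the child, send it `sig`, and report whether its data survived."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "flushed.txt")
        proc = subprocess.Popen([sys.executable, "-c", CHILD, path],
                                stdout=subprocess.PIPE, text=True)
        proc.stdout.readline()        # wait for "ready"
        proc.send_signal(sig)
        proc.wait()
        proc.stdout.close()
        return os.path.exists(path)

if __name__ == "__main__":
    print("SIGTERM (kill pid):    data persisted =",
          run_and_signal(signal.SIGTERM))
    print("SIGKILL (kill -9 pid): data persisted =",
          run_and_signal(signal.SIGKILL))
```

In this toy, anything that is persisted only by the shutdown handler is
gone after SIGKILL, so surviving kill -9 has to rely on data that was
already made durable before the process died.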

FYI - I am using VMs. My current setup has 3 VMs: VM 1 runs the Namenode
and HBase Master, and VMs 2 and 3 each run a Datanode and a region
server.

Thanks,
Gagan



On Thu, Sep 2, 2010 at 8:18 PM, Stack <stack@duboce.net> wrote:

> On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
> <gagandeep.singh@paxcel.net> wrote:
> > Hi Daniel
> >
> > I have downloaded hadoop-0.20.2+320.tar.gz from this location
> > http://archive.cloudera.com/cdh/3/
>
>
> That looks right, yes.
>
> > And also changed the *dfs.support.append* flag to *true* in
> > *hdfs-site.xml* as mentioned here
> > http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
> >
>
> That sounds right too.  As Ted suggests, you should put it into all configs
> (though I believe it is enabled by default on that branch -- in the UI
> you'd see a warning if it was NOT enabled).
>
> > But data loss is still happening. Am I using the right version?
> > Are there any other settings that I need to make so that data gets
> > flushed to HDFS?
> >
>
> It looks like you are doing the right thing.  Can we see master log please?
>
> Thanks,
> St.Ack
>
>
> > Thanks,
> > Gagan
> >
> >
> >
> > On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans
> > <jdcryans@apache.org> wrote:
> >
> >> That, or use CDH3b2.
> >>
> >> J-D
> >>
> >> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
> >> <gagandeep.singh@paxcel.net> wrote:
> >> > Thanks Daniel
> >> >
> >> > It means I have to check out the code from the branch and build it on
> >> > my local machine.
> >> >
> >> > Gagan
> >> >
> >> >
> >> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans
> >> > <jdcryans@apache.org> wrote:
> >> >
> >> >> Then I would expect some form of data loss, yes, because stock Hadoop
> >> >> 0.20 doesn't have any form of fsync, so HBase doesn't know whether the
> >> >> data made it to the datanodes when appending to the WAL. Please use
> >> >> the 0.20-append Hadoop branch with HBase 0.89, or Cloudera's CDH3b2.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
> >> >> <gagandeep.singh@paxcel.net> wrote:
> >> >> > HBase - 0.20.5
> >> >> > Hadoop - 0.20.2
> >> >> >
> >> >> > Thanks,
> >> >> > Gagan
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans
> >> >> > <jdcryans@apache.org> wrote:
> >> >> >
> >> >> >> Hadoop and HBase version?
> >> >> >>
> >> >> >> J-D
> >> >> >>
> >> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh"
> >> >> >> <gagandeep.singh@paxcel.net> wrote:
> >> >> >>
> >> >> >> Hi Group,
> >> >> >>
> >> >> >> I am checking HBase/HDFS failover. I am inserting 1M records
> >> >> >> from my HBase client application. I am batching my Put
> >> >> >> operations so that 10 records are added to a List<Put> before
> >> >> >> I call table.put(). I have not modified the default setting of
> >> >> >> the Put operation, which means all data is written to the WAL
> >> >> >> and, in case of server failure, my data should not be lost.
> >> >> >>
> >> >> >> But I noticed somewhat strange behavior: if I kill my Region
> >> >> >> Server while adding records, my application waits until the
> >> >> >> region data is moved to another region server. While this
> >> >> >> happens, all my data is lost and my table is emptied.
> >> >> >>
> >> >> >> Could you help me understand this behavior? Is there some kind
> >> >> >> of cache involved while writing, because of which my data is
> >> >> >> lost?
> >> >> >>
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Gagan
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>
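
The fsync point made earlier in the thread (without a sync, the writer
cannot know whether appended WAL data actually reached storage) can be
illustrated with a local file. This is a rough analogy only, assuming a
local append-only log in place of the HDFS-backed WAL: WRITER,
surviving_records and wal.log are hypothetical names, and os._exit() is
used to simulate an abrupt death such as kill -9. The writer appends 100
records in batches of 10, mirroring the batching described in the original
question.

```python
import os
import subprocess
import sys
import tempfile

# Writer process: appends records in batches of 10 to a log file, then dies
# abruptly with os._exit(), which skips Python's normal buffer flushing.
WRITER = r"""
import os, sys
path, sync = sys.argv[1], sys.argv[2] == "sync"
log = open(path, "ab")
for start in range(0, 100, 10):
    batch = ["row-%d" % i for i in range(start, start + 10)]
    log.write(("\n".join(batch) + "\n").encode("utf-8"))
    if sync:
        log.flush()              # push Python's buffer to the OS
        os.fsync(log.fileno())   # ask the OS to push it to the disk
os._exit(1)                      # simulated crash: no close(), no flush
"""

def surviving_records(sync):
    """Run the writer, let it crash, and count the records left in the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        mode = "sync" if sync else "nosync"
        subprocess.run([sys.executable, "-c", WRITER, path, mode])
        with open(path, "rb") as f:
            return len(f.read().splitlines())

if __name__ == "__main__":
    print("records surviving with sync per batch:", surviving_records(True))
    print("records surviving without sync:       ", surviving_records(False))
```

The unsynced writer here loses whatever was still sitting in its
in-process buffer when it died. That is a simplification of the HDFS case,
but it shows why a log that is never synced gives no durability guarantee
on a hard kill.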
