hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Data loss due to region server failure
Date Fri, 03 Sep 2010 20:23:12 GMT
You should be able to kill -9 the regionserver and only lose data that
follows the last time we sync'd (the default is to sync each write, IIRC).
If this is not the case for you, then something is broken.  Let's figure
it out.
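
For reference, a minimal sketch of the default durability path with the
0.20-era client API (HTable, Put, and setWriteToWAL are assumed to match
that release; this is illustrative, not the exact client code in question):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalDefaultPut {
      public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        HTable table = new HTable(new HBaseConfiguration(), "testtable");

        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));

        // By default the edit is written to the WAL; only edits made after the
        // last WAL sync should be at risk when a regionserver is killed with -9.
        // put.setWriteToWAL(false);  // would trade away that durability
        table.put(put);
        table.flushCommits();  // a no-op with the default autoFlush = true
      }
    }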

St.Ack

On Fri, Sep 3, 2010 at 1:55 AM, Gagandeep Singh
<gagandeep.singh@paxcel.net> wrote:
> I think I have figured out the problem, or maybe not.
>
> In order to simulate a RegionServer failure, my fellow programmer was killing
> the RegionServer with *kill -9 pid*. But when I used *kill pid*, everything
> seemed to work fine. Obviously the region server now goes down gracefully,
> so there is no data loss.
>
> I also checked it on Hadoop 0.20.2 (without append) with HBase 0.20.5 and
> found no data loss in the case of a simple kill command.
> My next question is: should it also work with the *kill -9* command?
>
> FYI - I am using VMs. In my current setup I have 3 VMs: VM 1 runs the
> NameNode and HBase Master, and VMs 2 and 3 each run a DataNode and a
> RegionServer.
>
> Thanks,
> Gagan
>
>
>
> On Thu, Sep 2, 2010 at 8:18 PM, Stack <stack@duboce.net> wrote:
>
>> On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
>> <gagandeep.singh@paxcel.net> wrote:
>> > Hi Daniel
>> >
>> > I have downloaded hadoop-0.20.2+320.tar.gz from this location
>> > http://archive.cloudera.com/cdh/3/
>>
>>
>> That looks right, yes.
>>
>> > And I also changed the *dfs.support.append* flag to *true* in
>> > *hdfs-site.xml*, as mentioned here:
>> > http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport
>> >
>>
>> That sounds right too.  As Ted suggests, put it into all configs
>> (though I believe it is enabled by default on that branch -- in the UI
>> you'd see a warning if it was NOT enabled).
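
(For reference, the flag being discussed would look roughly like the
following property block in hdfs-site.xml; per the wiki page linked above,
the same property is also expected in the HBase conf directory. This is a
sketch of the one property, not a complete config file, and exact placement
may vary with the build.)

    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>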
>>
>> > But data loss is still happening. Am I using the right version?
>> > Are there any other settings I need to make so that data gets flushed
>> > to HDFS?
>> >
>>
>> It looks like you are doing the right thing.  Can we see the master log, please?
>>
>> Thanks,
>> St.Ack
>>
>>
>> > Thanks,
>> > Gagan
>> >
>> >
>> >
>> > On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >
>> >> That, or use CDH3b2.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
>> >> <gagandeep.singh@paxcel.net> wrote:
>> >> > Thanks Daniel
>> >> >
>> >> > It means I have to check out the code from the branch and build it on
>> >> > my local machine.
>> >> >
>> >> > Gagan
>> >> >
>> >> >
>> >> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >
>> >> >> Then I would expect some form of dataloss yes, because stock hadoop
>> >> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
>> >> >> data made it to the datanodes when appending to the WAL. Please use
>> >> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
>> >> >> <gagandeep.singh@paxcel.net> wrote:
>> >> >> > HBase - 0.20.5
>> >> >> > Hadoop - 0.20.2
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Gagan
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >> >
>> >> >> >> Hadoop and HBase version?
>> >> >> >>
>> >> >> >> J-D
>> >> >> >>
>> >> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <gagandeep.singh@paxcel.net> wrote:
>> >> >> >>
>> >> >> >> Hi Group,
>> >> >> >>
>> >> >> >> I am checking HBase/HDFS failover. I am inserting 1M records from my
>> >> >> >> HBase client application. I am batching my Put operations such that 10
>> >> >> >> records get added into a List<Put> and then I call table.put(). I have
>> >> >> >> not modified the default settings of the Put operation, which means all
>> >> >> >> data is written to the WAL, and in case of server failure my data should
>> >> >> >> not be lost.
>> >> >> >>
>> >> >> >> But I noticed somewhat strange behavior: while adding records, if I kill
>> >> >> >> my Region Server then my application waits until the region data is
>> >> >> >> moved to another region server. But while doing so all my data is lost
>> >> >> >> and my table is emptied.
>> >> >> >>
>> >> >> >> Could you help me understand this behavior? Is there some kind of cache
>> >> >> >> also involved while writing because of which my data is lost?
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Gagan
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
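
A note on the batching and cache question in the original message quoted
above: besides the server-side memstore, the 0.20-era client has a write
buffer that is only used when autoFlush is turned off. Below is a minimal
sketch of that batched-put pattern, assuming that era's API (setAutoFlush,
put(List<Put>), flushCommits); names and the "testtable" schema are
illustrative, not taken from the thread:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedPuts {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "testtable");

        // table.setAutoFlush(false);  // would enable the client-side write
        // buffer: puts then sit in the client until flushCommits() or the
        // buffer fills, and could be lost before reaching any regionserver.
        // The default (autoFlush = true) sends each put(List<Put>) right away.

        List<Put> batch = new ArrayList<Put>();
        for (int i = 0; i < 10; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                  Bytes.toBytes("val-" + i));
          batch.add(put);
        }
        table.put(batch);      // one batch of 10 puts, as described in the thread
        table.flushCommits();  // flushes the write buffer if autoFlush is off
      }
    }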
