hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: data loss due to regionserver going down
Date Wed, 27 Jul 2011 16:29:09 GMT
This I cannot explain. Check the blocks directory on the two servers.
Maybe the blocks were all on one datanode only.
St.Ack
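Stack's suggestion amounts to checking which datanodes actually hold the table's blocks. Below is a minimal sketch of that check. It assumes you have already collected a mapping from block ID to the datanode hosts holding a replica, for example by reading the output of `hadoop fsck <path> -files -blocks -locations` (that output is not parsed here, and its format may differ between versions). The function names and the input format are hypothetical.

```python
from collections import Counter


def blocks_per_datanode(block_locations):
    """Count the block replicas held by each datanode.

    `block_locations` is a hypothetical input format: a dict mapping a
    block ID to the list of datanode hosts that hold a replica of it.
    """
    counts = Counter()
    for hosts in block_locations.values():
        counts.update(set(hosts))  # count each host at most once per block
    return counts


def blocks_only_on(block_locations, host):
    """Return the sorted IDs of blocks whose only replicas live on `host`.

    While `host` is down, no other datanode can serve these blocks.
    """
    return sorted(
        block for block, hosts in block_locations.items()
        if set(hosts) == {host}
    )
```

If `blocks_per_datanode` shows nearly everything on one host, or `blocks_only_on` returns most of the table's blocks for the server that was rebooted, that would match Stack's guess.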


2011/7/27 吴限 <infinity0222@gmail.com>:
> Thanks for your reply. But later I actually ran another experiment similar to
> the one I described earlier.
> Step 1: I inserted some data into HBase.
> Step 2: I shut down one of the region servers.
> Step 3: I checked the table and found some data had been lost.
> Step 4: I disabled the table and then enabled it.
> Step 5: I checked again and found nothing lost.
>
> If some of the data didn't exist on the other region server, how can you
> explain this?
>
> Hope to hear from you. Thanks!
>
> 2011/7/28 Chris Tarnas <cft@email.com>
>
>> Replication of 1x means no replication. 2x would mean the data exists in
>> two locations (what it looks like you want). Running with a replication of
>> 1x is a very bad idea and is pretty much a guaranteed way to get data loss.
>>
>> -chris
>>
>> On Jul 27, 2011, at 8:58 AM, 吴限 wrote:
>>
>> > Hi everyone. I'd like to run the following data loss scenario by you to
>> > see if we are doing something obviously wrong with our setup here.
>> >
>> > Setup:
>> >   - cdh3u0
>> >   - Hadoop 0.20.2
>> >   - HBase 0.90.1
>> >   - 1 master node running as NameNode & JobTracker
>> >   - ZooKeeper quorum
>> >   - 2 child nodes, each running as DataNode, TaskTracker and RegionServer
>> >   - dfs.replication is set to 1
>> >
>> > First, I inserted some data into HBase a few hours ago.
>> > Then, after a while, I rebooted one of the region servers and waited until
>> > the master had reacted to it. However, when I checked the table using the
>> > HBase shell (I used the "count" command), I noticed that a huge amount of
>> > data was missing.
>> > After I restarted the region server I had rebooted and checked again,
>> > I found that some of the missing data had come back, but some data still
>> > hadn't been found.
>> > Finally, after I disabled the table and then enabled it, I found that all
>> > the data was stored in the cluster and none of it was lost.
>> >
>> > This is problematic, since we are supposed to be replicating at 1x, so at
>> > least one other node should theoretically be able to serve the data that
>> > the downed region server can't.
>> >
>> > Questions:
>> >
>> >   - How can you guys explain this weird situation?
>> >   - Is there a way to recover such lost data?
>> >
>> > Any tips here are definitely appreciated. I'll be happy to provide more
>> > information as well.
>>
>>
>
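Chris's point about replication factors can be illustrated with a small simulation. This is a hypothetical model, not HDFS code: `place_replicas` uses a simple round-robin placement rather than HDFS's real block placement policy, and the block and datanode names are made up. With a replication factor of 1, each block exists on exactly one datanode, so while that node is down nothing else can serve the block; with a factor of 2 on a two-datanode cluster like the one described, every block has a copy on both nodes.

```python
def place_replicas(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct datanodes, round-robin.

    Simplified, hypothetical placement for illustration only; it is not
    the block placement policy HDFS actually uses.
    """
    if not 1 <= replication <= len(datanodes):
        raise ValueError("replication must be between 1 and the number of datanodes")
    n = len(datanodes)
    return {
        f"blk_{i}": [datanodes[(i + k) % n] for k in range(replication)]
        for i in range(num_blocks)
    }


def unavailable_blocks(placement, down_nodes):
    """Return the sorted IDs of blocks that have no replica on a live datanode."""
    down = set(down_nodes)
    return sorted(
        block for block, hosts in placement.items()
        if all(host in down for host in hosts)
    )
```

For two datanodes, `unavailable_blocks(place_replicas(4, ["dn1", "dn2"], 1), ["dn1"])` reports half of the four blocks as unreadable, while the same call with a replication factor of 2 reports none.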
