hbase-user mailing list archives

From "George P. Stathis" <gstat...@traackr.com>
Subject Re: A data loss scenario with a single region server going down
Date Mon, 20 Sep 2010 20:21:05 GMT
Thanks for the response, Ryan. I have no doubt that 0.89 can be used in
production and that it has strong support. I just wanted to avoid moving to
it now because we have limited resources and it would put a dent in our
roadmap if we were to fast-track the migration now. Specifically, we are
using HBASE-2438 and HBASE-2426 to support pagination across indexes. So we
either have to migrate those to 0.89 or somehow go stock and be able to
support pagination across region servers.

Of course, if the choice is between migrating or losing more data, data
safety comes first. But if we can buy two or three more months of time and
avoid region server crashes (like you did for a year), maybe we can go that
route for now. What do we need to do to achieve that?

-GS

PS: Out of curiosity, I understand the WAL append issue for a single
regionserver when it comes to losing the data on a single node. But if that
data is also being replicated on another region server, why wouldn't it be
available there? Or is the WAL shared across multiple region servers
(maybe that's what I'm missing)?
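
Trying to answer my own question from your explanation below: here is
roughly how I picture what a reader sees after a regionserver dies with its
WAL still open. This is just a hypothetical sketch against the stock Hadoop
0.20 client API, and the path below is a placeholder, not our real layout:

// Hypothetical probe, not our production code: ask HDFS how long a WAL
// file looks when its regionserver died before ever calling close().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalLengthProbe {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Placeholder path to the crashed regionserver's last, still-open WAL.
        Path wal = new Path("/hbase/.logs/some-regionserver/some-hlog");
        FileStatus status = fs.getFileStatus(wal);
        // Per your explanation, stock 0.20 prints 0 here: the file was never
        // closed, so the NameNode never recorded its final length, and
        // readers see nothing even though the block replicas exist on the
        // datanodes.
        System.out.println("WAL length seen by readers: " + status.getLen());
    }
}

If that picture is right, the replication never gets a chance to help: the
replicas of the WAL's last block are sitting on other datanodes, but the
file's metadata says there is nothing to read. Please correct me if I'm off.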


On Mon, Sep 20, 2010 at 3:52 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Hey,
>
> The problem is that stock 0.20 Hadoop won't let you read from a
> non-closed file.  It will report the length as 0.  So if a
> regionserver crashes, the last WAL log that is still open becomes
> 0-length and the data within it is unreadable.  That, specifically, is
> the data-loss problem.  You could always make it so your regionservers
> rarely crash - this is possible, btw, and I did it for over a year.
>
> But you will want to run CDH3 or the append-branch releases to get the
> series of patches that fix this hole.  It also happens that only 0.89
> runs on it.  I would like to avoid the Hadoop "everyone uses 0.20
> forever" problem and talk about what we could do to help you get on
> 0.89.  Over here at SU we've made a commitment to the future of 0.89
> and are running it in production.  Let us know what else you'd need.
>
> -ryan
>
> On Mon, Sep 20, 2010 at 12:39 PM, George P. Stathis
> <gstathis@traackr.com> wrote:
> > Thanks Todd. We are not quite ready to move to 0.89 yet. We have made
> > custom modifications to the transactional contrib sources, which are
> > now taken out of 0.89. We are planning on moving to 0.90 when it comes
> > out and, at that point, either migrating our customizations or moving
> > back to the out-of-the-box features (which will require a rewrite of
> > our code).
> >
> > We are well aware of the CDH distros, but at the time we started with
> > HBase, there was none that included HBase. I think CDH3 is the first
> > one to include HBase, correct? And is 0.89 the only version supported?
> >
> > Moreover, are we saying that there is no way to prevent stock HBase
> > 0.20.6 and Hadoop 0.20.2 from losing data when a single node goes
> > down? It does not matter if the data is replicated; it will still get
> > lost?
> >
> > -GS
> >
> > On Sun, Sep 19, 2010 at 5:58 PM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> >> Hi George,
> >>
> >> The data-loss problems you mentioned below are known issues when
> >> running on stock Apache 0.20.x Hadoop.
> >>
> >> You should consider upgrading to CDH3b2, which includes a number of
> >> HDFS patches that allow HBase to durably store data. You'll also have
> >> to upgrade to HBase 0.89 - we ship a version as part of CDH that will
> >> work well.
> >>
> >> Thanks
> >> -Todd
> >>
> >> On Sun, Sep 19, 2010 at 6:57 AM, George P. Stathis
> >> <gstathis@traackr.com> wrote:
> >>
> >> > Hi folks. I'd like to run the following data-loss scenario by you
> >> > to see if we are doing something obviously wrong with our setup
> >> > here.
> >> >
> >> > Setup:
> >> >
> >> >   - Hadoop 0.20.1
> >> >   - HBase 0.20.3
> >> >   - 1 master node running the NameNode, SecondaryNameNode,
> >> >   JobTracker, HMaster, and 1 ZooKeeper (no ZooKeeper quorum right
> >> >   now)
> >> >   - 4 child nodes, each running a DataNode, TaskTracker, and
> >> >   RegionServer
> >> >   - dfs.replication is set to 2
> >> >   - Host: Amazon EC2
> >> >
> >> > Up until yesterday, we were frequently experiencing
> >> > HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077>,
> >> > which kept bringing our RegionServers down. What we realized,
> >> > though, is that we were losing data (a few hours' worth) with just
> >> > one out of four regionservers going down. This is problematic since
> >> > we are supposed to replicate at 2x across 4 nodes, so at least one
> >> > other node should theoretically be able to serve the data that the
> >> > downed regionserver can't.
> >> >
> >> > Questions:
> >> >
> >> >   - When a regionserver goes down unexpectedly, the only data that
> >> >   theoretically gets lost is whatever didn't make it to the WAL,
> >> >   right? Or wrong? E.g.
> >> >   http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
> >> >   - We ran a hadoop fsck on our cluster and verified the
> >> >   replication factor, as well as that there were no
> >> >   under-replicated blocks. So why was our data not available from
> >> >   another node?
> >> >   - If the log gets rolled every 60 minutes by default (we haven't
> >> >   touched the defaults), how can we lose data from up to 24 hours
> >> >   ago?
> >> >   - When the downed regionserver comes back up, shouldn't that data
> >> >   be available again? Ours wasn't.
> >> >   - In such scenarios, is there a recommended approach for
> >> >   restoring the regionserver that goes down? We just brought them
> >> >   back up by logging onto the node itself and manually restarting
> >> >   them. Now we have automated cron jobs that watch their ports and
> >> >   restart them within two minutes if they go down.
> >> >   - Is there a way to recover such lost data?
> >> >   - Are versions 0.89 / 0.90 addressing any of these issues?
> >> >   - Curiosity question: when a regionserver goes down, does the
> >> >   master try to replicate that node's data on another node to
> >> >   satisfy the dfs.replication ratio?
> >> >
> >> > For now, we have upgraded our HBase to 0.20.6, which is supposed
> >> > to contain the HBASE-2077
> >> > <https://issues.apache.org/jira/browse/HBASE-2077> fix (but no one
> >> > has verified that yet). Lars' blog also suggests that Hadoop 0.21.0
> >> > is the way to go to avoid the file-append issues, but it's not
> >> > production-ready yet. Should we stick with 0.20.1? Upgrade to
> >> > 0.20.2?
> >> >
> >> > Any tips here are definitely appreciated. I'll be happy to provide
> >> > more information as well.
> >> >
> >> > -GS
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>
