Date: Mon, 14 Apr 2014 16:54:23 -0700
Subject: Re: HBase region server failure issues
From: Vladimir Rodionov
To: "dev@hbase.apache.org"

I do not think it's a good idea to have one WAL file per region. The whole
WAL file idea is based on the assumption that writing data sequentially
reduces average latency and increases total throughput. This is no longer
the case with a one-WAL-file-per-region approach: you may have hundreds of
active regions per RS, so all sequential writes become random ones, and
random IO on rotational media is very, very bad.

-Vladimir Rodionov

On Mon, Apr 14, 2014 at 2:41 PM, Ted Yu wrote:

> There is an ongoing effort to address this issue.
>
> See the following:
> HBASE-8610 Introduce interfaces to support MultiWAL
> HBASE-10378 Divide HLog interface into User and Implementor specific interfaces
>
> Cheers
>
>
> On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu wrote:
>
> > Hi all,
> >
> > My name is Claudiu Soroiu and I am new to hbase/hadoop but not new to
> > distributed computing in FT/HA environments, and I see there are a lot
> > of issues reported related to region server failure.
> >
> > The main problem I see is related to recovery time in case of a node
> > failure and distributed log splitting. After some tuning I managed to
> > reduce it to 8 seconds in total, and for the moment it fits the needs.
> >
> > I have one question: *Why is there only one WAL file per region server
> > and not one WAL per region itself?*
> > I haven't found the exact answer anywhere, which is why I'm asking on
> > this list; please point me in the right direction if I picked the
> > wrong list.
> >
> > My point is that eliminating the need to split a log in case of failure
> > reduces the downtime for the regions, and the only delay that we will
> > see will be related to transferring data over the network to the region
> > servers that will take over the failed regions.
> > This is feasible only if having multiple WALs per Region Server does
> > not affect the overall write performance.
> >
> > Thanks,
> > Claudiu
> >
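
As a rough illustration of the sequential-versus-random argument at the top of this reply, here is a toy Python sketch. It is not HBase code and it does not model HDFS or real disks. Every name in it (count_file_switches, simulate, the "rs-wal" and "wal-N" file labels) is hypothetical, and the workload (appends spread uniformly at random over 200 regions) is an assumption chosen only for illustration. It counts how often two consecutive appends land in different WAL files, which serves as a crude stand-in for a seek on rotational media.

```python
import random


def count_file_switches(region_sequence, wal_for_region):
    """Count how often consecutive appends target a different WAL file.

    A switch is used here as a crude proxy for a disk seek on rotational
    media. This is a toy model, not a description of HBase's WAL code.
    """
    switches = 0
    previous_file = None
    for region in region_sequence:
        current_file = wal_for_region(region)
        if previous_file is not None and current_file != previous_file:
            switches += 1
        previous_file = current_file
    return switches


def simulate(num_regions=200, num_appends=10_000, seed=42):
    """Compare a shared WAL with per-region WALs on one synthetic workload."""
    rng = random.Random(seed)
    # Assumed workload: each append goes to a uniformly random region.
    sequence = [rng.randrange(num_regions) for _ in range(num_appends)]
    # One WAL per region server: every region maps to the same file.
    shared = count_file_switches(sequence, lambda region: "rs-wal")
    # One WAL per region: every region maps to its own file.
    per_region = count_file_switches(sequence, lambda region: f"wal-{region}")
    return shared, per_region


if __name__ == "__main__":
    shared, per_region = simulate()
    print(f"one WAL per region server: {shared} file switches")
    print(f"one WAL per region:        {per_region} file switches")
```

With a single shared file the switch count is zero by construction, while with one file per region nearly every append under this assumed workload lands in a different file from the previous one. Real systems batch and buffer writes, so this only shows the direction of the effect, not its size.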