From: Todd Lipcon <todd@cloudera.com>
Date: Mon, 14 Apr 2014 20:22:01 -0700
Subject: Re: HBase region server failure issues
To: dev@hbase.apache.org

On Mon, Apr 14, 2014 at 6:32 PM, Vladimir Rodionov wrote:

> *On the other hand, 95% of HBase users don't actually configure HDFS to
> fsync() every edit. Given that, the random writes aren't actually going to
> cause one seek per write -- they'll get buffered up and written back
> periodically in a much more efficient fashion.*
>
> Todd, this is in theory. Reality is different. 1 writer is definitely more
> efficient than 100. This won't scale well.

I'd actually disagree. 100 is probably significantly faster than 1, given
that most machines have 12 spindles. So, yes, you'd be multiplexing 8 or so
logs per spindle, but even 100 logs only require a few hundred MB worth of
buffer cache to get good coalescing of writes into large physical IOs. If
memory is really constrained on your machine, you'll probably get some
throughput collapse as you enter some really inefficient dirty-page
throttling, but as long as you leave a few GB unallocated, I bet the
reality is much closer to what I said than you might think.
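A quick back-of-envelope sketch of that arithmetic (the 4 MB write-back
budget per log is an assumed illustrative figure, not a measured one):

public class WalBufferEstimate {
    public static void main(String[] args) {
        int activeWals = 100;    // e.g. one WAL per active region on the server
        int spindles = 12;       // typical data node
        int bufferPerWalMb = 4;  // assumed per-log dirty-page budget

        // roughly 8-9 logs multiplexed onto each spindle
        System.out.println("logs per spindle : " + (activeWals + spindles - 1) / spindles);
        // ~400 MB of page cache to coalesce small appends into large physical IOs
        System.out.println("buffer cache (MB): " + activeWals * bufferPerWalMb);
    }
}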
-Todd


> On Mon, Apr 14, 2014 at 6:20 PM, Todd Lipcon wrote:
>
> > On the other hand, 95% of HBase users don't actually configure HDFS to
> > fsync() every edit. Given that, the random writes aren't actually going
> > to cause one seek per write -- they'll get buffered up and written back
> > periodically in a much more efficient fashion.
> >
> > Plus, in some small number of years, I believe SSDs will be available on
> > most server machines (in a hybrid configuration), so the seeks will cost
> > less even with fsync on.
> >
> > -Todd
> >
> >
> > On Mon, Apr 14, 2014 at 4:54 PM, Vladimir Rodionov wrote:
> >
> > > I do not think it's a good idea to have one WAL file per region. The
> > > single-WAL-file design rests on the assumption that writing data
> > > sequentially reduces average latency and increases total throughput.
> > > That is no longer the case with one WAL file per region: you may have
> > > hundreds of active regions per RS, so all the sequential writes become
> > > random ones, and random IO on rotational media is very bad, very bad.
> > >
> > > -Vladimir Rodionov
> > >
> > >
> > > On Mon, Apr 14, 2014 at 2:41 PM, Ted Yu wrote:
> > >
> > > > There is an ongoing effort to address this issue.
> > > >
> > > > See the following:
> > > > HBASE-8610  Introduce interfaces to support MultiWAL
> > > > HBASE-10378 Divide HLog interface into User and Implementor specific
> > > > interfaces
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > My name is Claudiu Soroiu and I am new to HBase/Hadoop, but not new
> > > > > to distributed computing in FT/HA environments, and I see there are
> > > > > a lot of issues reported related to region server failure.
> > > > >
> > > > > The main problem I see is the recovery time after a node failure and
> > > > > distributed log splitting. After some tuning I managed to reduce it
> > > > > to 8 seconds in total, and for the moment that fits my needs.
> > > > >
> > > > > I have one question: *Why is there only one WAL file per region
> > > > > server and not one WAL per region itself?*
> > > > > I haven't found the exact answer anywhere, which is why I'm asking on
> > > > > this list -- please point me in the right direction if I missed it.
> > > > >
> > > > > My point is that eliminating the need to split a log after a failure
> > > > > reduces the downtime for the regions; the only remaining delay would
> > > > > be transferring data over the network to the region servers that take
> > > > > over the failed regions.
> > > > > This is feasible only if having multiple WALs per region server does
> > > > > not hurt overall write performance.
> > > > >
> > > > > Thanks,
> > > > > Claudiu
> > > > >
> > > >
> > >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
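The log-splitting cost discussed in the quoted thread comes from the fact
that a single region-server WAL interleaves edits from every region it
hosts, so recovery has to regroup them per region before replay. A minimal
illustrative sketch of that regrouping step (WalEntry and splitByRegion are
hypothetical simplifications, not HBase's actual WAL classes):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WalSplitSketch {
    // Hypothetical, simplified stand-in for a WAL record (not the real HBase type).
    record WalEntry(String region, long seqId, String edit) {}

    // With one shared WAL per region server, entries from many regions are
    // interleaved in arrival order; recovery must regroup them per region so
    // each server taking over a region can replay only its own edits.
    static Map<String, List<WalEntry>> splitByRegion(List<WalEntry> sharedWal) {
        Map<String, List<WalEntry>> perRegion = new HashMap<>();
        for (WalEntry e : sharedWal) {
            perRegion.computeIfAbsent(e.region(), r -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        List<WalEntry> sharedWal = List.of(
            new WalEntry("regionA", 1, "put row1"),
            new WalEntry("regionB", 2, "put row7"),
            new WalEntry("regionA", 3, "delete row2"));
        // One WAL per region would make this grouping unnecessary at recovery
        // time; the debate above is about what that costs on the write path.
        splitByRegion(sharedWal).forEach((region, entries) ->
            System.out.println(region + " -> replay " + entries.size() + " edits"));
    }
}

A per-region WAL (or any multi-WAL scheme like the work Ted points to)
trades this recovery-time regrouping for more concurrent writers on the
write path, which is exactly the tension between the write-coalescing
argument and the random-IO argument above.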