Subject: Re: Good VLDB paper on WALs
From: Stack
To: dev@hbase.apache.org
Date: Wed, 29 Dec 2010 11:32:19 -0800

Nice list of things we need to do to make logging faster (with useful citations on the current state of the art).

This notion of early lock release (ELR) is worth looking into (Jon, for high rates of counter transactions, you've been talking about aggregating counts in front of the WAL lock... maybe an ELR and then a hold on the transaction until confirmation of flush would be the way to go?).

Regarding flush-pipelining, it would be interesting to see if there are traces of the sys-time that Dhruba is seeing in his NN out in HBase servers. My guess is that it's probably drowned by other context switches done in our servers. Definitely worth study.

St.Ack

P.S. Minimizing context switches, a system for ELR and flush-pipelining, recasting the server to make use of one of the DI or OSGi frameworks, moving off log4j, etc. Is it just me or do others feel a server rewrite coming on?

On Mon, Dec 27, 2010 at 11:48 AM, Dhruba Borthakur wrote:
> HDFS currently uses Hadoop RPC and the server thread blocks till the WAL is written to disk. In earlier deployments, I thought we could safely ignore flush-pipelining by creating more server threads. But in our largest HDFS systems, I am starting to see 20% sys-time usage on the namenode machine; most of this could be thread scheduling. If so, then it makes sense to enhance the logging code to release server threads even before the WAL is flushed to disk (but, of course, we still have to delay the transaction response to the client till the WAL is synced to disk).
>
> Does anybody have any idea on how to figure out what percentage of the above sys-time is spent in thread scheduling vs the time spent in other system calls (especially in the Namenode context)?
>
> thanks,
> dhruba
>
> On Fri, Dec 24, 2010 at 8:17 PM, Todd Lipcon wrote:
>
>> Via Hammer - I thought this was a pretty good read, some good ideas for
>> optimizations for our WAL.
>>
>> http://infoscience.epfl.ch/record/149436/files/vldb10aether.pdf
>>
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
> --
> Connect to me at http://www.facebook.com/dhruba
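
The flush-pipelining idea quoted above (release the server thread before the WAL is flushed, but hold the client response until the sync completes) can be sketched roughly as follows. This is a toy illustration only, not HDFS or HBase code: the class `PipelinedLog`, its methods, and the `sync_fn` callback are all hypothetical names, and Python is used purely for brevity. Early lock release is not modeled here.

```python
import threading
from concurrent.futures import Future


class PipelinedLog:
    """Toy write-ahead log with flush pipelining (hypothetical sketch)."""

    def __init__(self, sync_fn):
        # sync_fn is an assumed stand-in for "write this batch durably".
        self._sync_fn = sync_fn
        self._cond = threading.Condition()
        self._pending = []      # (record, future) pairs awaiting a sync
        self._closed = False
        self._syncer = threading.Thread(target=self._run, daemon=True)
        self._syncer.start()

    def append(self, record):
        """Queue a record and return at once, so the calling thread is free.

        The returned Future completes only after the record's batch has
        been synced; the client response should wait on it.
        """
        done = Future()
        with self._cond:
            if self._closed:
                raise RuntimeError("log is closed")
            self._pending.append((record, done))
            self._cond.notify()
        return done

    def close(self):
        """Stop accepting records, drain what is queued, stop the syncer."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._syncer.join()

    def _run(self):
        # Single syncer thread: everything queued since the last sync
        # goes out in one batch, so there is one sync call per batch.
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:
                    return      # closed and fully drained
                batch, self._pending = self._pending, []
            try:
                self._sync_fn([rec for rec, _ in batch])
            except Exception as exc:
                for _, done in batch:
                    done.set_exception(exc)
            else:
                for _, done in batch:
                    done.set_result(True)


if __name__ == "__main__":
    synced = []                             # each element is one synced batch
    log = PipelinedLog(synced.append)
    acks = [log.append(f"edit-{i}") for i in range(5)]
    for ack in acks:
        ack.result(timeout=5)               # the "delay the response" step
    log.close()
    print(sum(len(b) for b in synced), "records in", len(synced), "sync call(s)")
```

In this sketch the handler thread never blocks on the sync itself; only whoever sends the response waits on the future. How many records share a sync call depends on timing, so the batch count varies from run to run.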