Date: Tue, 12 Jan 2010 09:41:44 -0800
Subject: Re: commit semantics
From: Jean-Daniel Cryans
To: hbase-dev@hadoop.apache.org

With regard to 1 HLog per region server, this is from the Bigtable paper.
Their main concern is the number of open files: if you have 1000 region
servers * 500 regions, then you may have 500 000 HLogs to manage. Also, you
can have more than one file per HLog, so let's say you have on average 5
log files per HLog; that's 2 500 000 files on HDFS.

J-D

On Tue, Jan 12, 2010 at 12:24 AM, Dhruba Borthakur wrote:
> Hi Ryan,
>
> thanks for your response.
>
>>Right now each regionserver has 1 log, so if 2 puts on different
>>tables hit the same RS, they hit the same HLog.
>
> I understand. My point was that the application could insert the same record
> into two different tables on two different HBase instances on two different
> pieces of hardware.
>
> On a related note, can somebody explain what the tradeoff is if each region
> has its own hlog? are you worried about the number of files in HDFS? or
> maybe the number of sync-threads in the region server? Can multiple hlog
> files provide faster region splits?
>
>
>> I've thought about this issue quite a bit, and I think the sync every
>> 1 rows combined with optional no-sync and low time sync() is the way
>> to go. If you want to discuss this more in person, maybe we can meet
>> up for brews or something.
>>
>
> The group-commit thing I can understand. HDFS does a very similar thing. But
> can you explain your alternative "sync every 1 rows combined with optional
> no-sync and low time sync"? For those applications that have the natural
> characteristics of updating only one row per logical operation, how can they
> be sure that their data has reached some-sort-of-stable-storage unless they
> sync after every row update?
>
> thanks,
> dhruba
>
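
For anyone who wants to play with the file-count estimate at the top of this
message, here is a rough sketch of the arithmetic. It compares one HLog per
region server against one HLog per region. The function name and its
parameters are made up for illustration and are not HBase APIs; the inputs
in the usage lines are the figures from the message (1000 region servers,
500 regions per server, 5 log files per HLog).

```python
def hlog_file_counts(region_servers, regions_per_server, files_per_hlog):
    """Estimate the number of log files on HDFS under two layouts.

    Hypothetical helper for illustration only (not part of HBase).
    Returns a tuple: (files with one HLog per region server,
                      files with one HLog per region).
    """
    # One HLog per region server: the region count does not matter.
    per_server = region_servers * files_per_hlog
    # One HLog per region: every region on every server has its own HLog.
    per_region = region_servers * regions_per_server * files_per_hlog
    return per_server, per_region


if __name__ == "__main__":
    per_server, per_region = hlog_file_counts(1000, 500, 5)
    print(f"one HLog per region server: {per_server} log files")
    print(f"one HLog per region:        {per_region} log files")
```

The per-region layout multiplies the file count by the number of regions
per server, which is the open-files concern described above.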