Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-dev@hadoop.apache.org
MIME-Version: 1.0
Sender: jdcryans@gmail.com
In-Reply-To: <4aa34eb71001120024v535a84c9kf174c3e38c71c29d@mail.gmail.com>
References: <c93bd2771001111546r60d7f7ddlb7b8acafe0d3bb6@mail.gmail.com>
	 <c93bd2771001111947s1d878aaew9b300979082f1df1@mail.gmail.com>
	 <c93bd2771001112012m4a842070sb1e586b366a48fdc@mail.gmail.com>
	 <4aa34eb71001112225o28f83f6u1eeb7057ed805cc9@mail.gmail.com>
	 <78568af11001112253y5e0a8c64yf2b91ba52f06be19@mail.gmail.com>
	 <4aa34eb71001120024v535a84c9kf174c3e38c71c29d@mail.gmail.com>
Date: Tue, 12 Jan 2010 09:41:44 -0800
Message-ID: <31a243e71001120941n693ec76fmc372676db3247255@mail.gmail.com>
Subject: Re: commit semantics
From: Jean-Daniel Cryans <jdcryans@apache.org>
To: hbase-dev@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

With respect to 1 HLog per region server, this is from the Bigtable
paper. Their main concern is the number of open files: with one HLog
per region, if you have 1000 region servers * 500 regions each, then
you may have 500 000 HLogs to manage. Also, you can have more than one
file per HLog, so if you have on average 5 log files per HLog, that's
2 500 000 files on HDFS.
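
To make the arithmetic easy to redo with other cluster sizes, here is a
rough sketch in Python. The function name and parameters are made up for
illustration and are not part of HBase, and it assumes the same average
number of files per HLog in both layouts, which may not hold in practice.

```python
def hlog_file_counts(region_servers, regions_per_server, files_per_hlog):
    """Return (files with one HLog per server, files with one HLog per region).

    Hypothetical helper for back-of-the-envelope sizing only.
    """
    # One HLog per region server: log count scales with servers only.
    per_server_hlogs = region_servers
    # One HLog per region: log count scales with the total number of regions.
    per_region_hlogs = region_servers * regions_per_server
    # Assumption: each HLog is made of the same average number of files.
    return (per_server_hlogs * files_per_hlog,
            per_region_hlogs * files_per_hlog)


if __name__ == "__main__":
    per_server, per_region = hlog_file_counts(1000, 500, 5)
    print("files on HDFS, one HLog per region server:", per_server)
    print("files on HDFS, one HLog per region:", per_region)
```

The gap between the two numbers is the number of regions per server, so
it widens as servers host more regions.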

J-D

On Tue, Jan 12, 2010 at 12:24 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> Hi Ryan,
>
> thanks for your response.
>
>>Right now each regionserver has 1 log, so if 2 puts on different
>>tables hit the same RS, they hit the same HLog.
>
> I understand. My point was that the application could insert the same record
> into two different tables on two different HBase instances on two different
> pieces of hardware.
>
> On a related note, can somebody explain what the tradeoff is if each region
> has its own HLog? Are you worried about the number of files in HDFS? Or
> maybe the number of sync threads in the region server? Can multiple HLog
> files provide faster region splits?
>
>
>> I've thought about this issue quite a bit, and I think the sync every
>> 1 rows combined with optional no-sync and low time sync() is the way
>> to go. If you want to discuss this more in person, maybe we can meet
>> up for brews or something.
>>
>
> The group-commit thing I can understand. HDFS does a very similar thing. But
> can you explain your alternative "sync every 1 rows combined with optional
> no-sync and low time sync"? For those applications that have the natural
> characteristics of updating only one row per logical operation, how can they
> be sure that their data has reached some-sort-of-stable-storage unless they
> sync after every row update?
>
> thanks,
> dhruba
>