Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-dev@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <7c962aed1001120958i7ae4d10uaec745b44977597f@mail.gmail.com>
References: <c93bd2771001111546r60d7f7ddlb7b8acafe0d3bb6@mail.gmail.com>
	 <c93bd2771001111947s1d878aaew9b300979082f1df1@mail.gmail.com>
	 <c93bd2771001112012m4a842070sb1e586b366a48fdc@mail.gmail.com>
	 <4aa34eb71001112225o28f83f6u1eeb7057ed805cc9@mail.gmail.com>
	 <7c962aed1001120958i7ae4d10uaec745b44977597f@mail.gmail.com>
Date: Tue, 12 Jan 2010 10:14:07 -0800
Message-ID: <4aa34eb71001121014w7e47baa0vb83e5c58f74f1a7d@mail.gmail.com>
Subject: Re: commit semantics
From: Dhruba Borthakur <dhruba@gmail.com>
To: hbase-dev@hadoop.apache.org
Cc: kannan@facebook.com, Dhruba Borthakur <dhruba@facebook.com>
Content-Type: multipart/alternative; boundary=000e0cd48b66a76a30047cfb9e86

--000e0cd48b66a76a30047cfb9e86
Content-Type: text/plain; charset=ISO-8859-1

Hi stack,

What I meant was: what if the application itself inserted the same record
into two HBase instances? Of course, the onus is then on the application to
keep the two in sync and to recover from any inconsistencies between them.
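
To make the idea concrete, here is a rough sketch of the application-side
dual write. It is only an illustration under stated assumptions: Store and
MemStore are hypothetical stand-ins for two HBase instances, not the HBase
client API, and the reconciliation step is left to the application.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class DualWriteSketch {
    /** Hypothetical stand-in for one HBase instance (not the HBase client API). */
    public interface Store {
        void put(String row, String value);
    }

    /** In-memory store, present only so the sketch can run on its own. */
    public static final class MemStore implements Store {
        public final Map<String, String> rows = new ConcurrentHashMap<>();

        @Override
        public void put(String row, String value) {
            rows.put(row, value);
        }
    }

    /** Handles for one logical insert that was sent to two stores. */
    public static final class DualWrite {
        /** Completes as soon as either write finishes (low-latency ack).
         *  If the write that finishes first failed, this completes exceptionally. */
        public final CompletableFuture<Object> first;
        /** Completes once both writes finish; fails if either write failed. */
        public final CompletableFuture<Void> both;

        DualWrite(CompletableFuture<Object> first, CompletableFuture<Void> both) {
            this.first = first;
            this.both = both;
        }
    }

    /** Issues the same put to both stores in parallel. */
    public static DualWrite write(Store a, Store b, String row, String value) {
        CompletableFuture<Void> fa = CompletableFuture.runAsync(() -> a.put(row, value));
        CompletableFuture<Void> fb = CompletableFuture.runAsync(() -> b.put(row, value));
        return new DualWrite(CompletableFuture.anyOf(fa, fb),
                             CompletableFuture.allOf(fa, fb));
    }
}
```

The caller would return to the user once first completes and watch both in
the background; if both fails, the two copies may have diverged and the
application has to repair them itself.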

thanks,
dhruba

On Tue, Jan 12, 2010 at 9:58 AM, stack <stack@duboce.net> wrote:

> On Mon, Jan 11, 2010 at 10:25 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
>
> > if we want the best of both worlds.. latency as well as data integrity,
> > how about inserting the same record into two completely separate HBase
> > tables in parallel... the operation can complete as soon as the record is
> > inserted into the first HBase table (thus giving low latencies)
>
>
> Return after insert into the first table?  Then internally hbase is meant
> to take care of the insert into the second table?  What if the latter fails
> for some reason other than regionserver crash?
>
> The two writes would have to be done as hdfs does, in series, if the two
> tables are to remain in sync, with the addition of a roll back of the
> transaction if insert does not go through to both tables since we don't
> have something like the hdfs background thread ensuring replica counts.
>
>
> > but data integrity will not be compromised because it is unlikely that
> > two region servers will fail exactly at the same time (assuming that
> > there is a way to ensure that these two tables are not handled by the
> > same region server).
> >
>
> How do you suggest the application deal with reading from these two tables?
> If they are guaranteed in-sync, then it could pick either.  If the two can
> wander, then the application needs to read from both and reconcile them
> somehow?
>
> Just trying to understand what you are suggesting Dhruba,
> St.Ack
>
>
>
> >
> > thanks,
> > dhruba
> >
> >
> > On Mon, Jan 11, 2010 at 8:12 PM, Joydeep Sarma <jssarma@apache.org> wrote:
> >
> > > ok - hadn't thought about it that way - but yeah with a default of 1 -
> > > the semantics seem correct.
> > >
> > > under high load - some batching would automatically happen at this
> > > setting (or so one would think - not sure if hdfs appends are blocked
> > > on pending syncs (in which case the batching wouldn't quite happen i
> > > think) - cc'ing Dhruba).
> > >
> > > if the performance with setting of 1 doesn't work out - we may need an
> > > option to delay acks until actual syncs .. (most likely we would be
> > > able to compromise on latency to get higher throughput - but wouldn't
> > > be willing to compromise on data integrity)
> > >
> > > > Hey Joydeep,
> > > >
> > > > This is actually intended this way but the name of the variable is
> > > > misleading. The sync is done only if forceSync or we have enough
> > > > entries to sync (default is 1). If someone wants to sync only 100
> > > > entries for example, they would play with that configuration.
> > > >
> > > > Hope that helps,
> > > >
> > > > J-D
> > > >
> > > >
> > > > On Mon, Jan 11, 2010 at 3:46 PM, Joydeep Sarma <jssarma@apache.org> wrote:
> > > >>
> > > >> Hey HBase-devs,
> > > >>
> > > >> we have been going through hbase code to come up to speed.
> > > >>
> > > >> One of the questions was regarding the commit semantics. Thumbing
> > > >> through the RegionServer code that's appending to the wal:
> > > >>
> > > >> syncWal -> HLog.sync -> addToSyncQueue -> syncDone.await()
> > > >>
> > > >> and the log writer thread calls:
> > > >>
> > > >> hflush(), syncDone.signalAll()
> > > >>
> > > >> however hflush doesn't necessarily call a sync on the underlying log
> > > >> file:
> > > >>
> > > >>       if (this.forceSync ||
> > > >>           this.unflushedEntries.get() >= this.flushlogentries) {
> > > >>         ... sync() ...
> > > >>       }
> > > >>
> > > >> so it seems that if forceSync is not true, the syncWal can unblock
> > > >> before a sync is called (and forceSync seems to be only true for
> > > >> metaregion()).
> > > >>
> > > >> are we missing something - or is there a bug here (the signalAll
> > > >> should be conditional on hflush having actually flushed something).
> > > >>
> > > >> thanks,
> > > >>
> > > >> Joydeep
> > > >
> > >
> >
> >
> >
> > --
> > Connect to me at http://www.facebook.com/dhruba
> >
>


-- 
Connect to me at http://www.facebook.com/dhruba

--000e0cd48b66a76a30047cfb9e86--