From: Dhruba Borthakur
To: hbase-dev@hadoop.apache.org
Cc: kannan@facebook.com, Dhruba Borthakur
Subject: Re: commit semantics
Date: Tue, 12 Jan 2010 10:14:07 -0800
Message-ID: <4aa34eb71001121014w7e47baa0vb83e5c58f74f1a7d@mail.gmail.com>
In-Reply-To: <7c962aed1001120958i7ae4d10uaec745b44977597f@mail.gmail.com>
References: <4aa34eb71001112225o28f83f6u1eeb7057ed805cc9@mail.gmail.com>
 <7c962aed1001120958i7ae4d10uaec745b44977597f@mail.gmail.com>
Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: hbase-dev@hadoop.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=000e0cd48b66a76a30047cfb9e86

--000e0cd48b66a76a30047cfb9e86
Content-Type: text/plain; charset=ISO-8859-1

Hi stack,

I was meaning "what if the application inserted the same record into two
Hbase instances"? Of course, now the onus is on the appl to keep both of
them in sync and recover from any inconsistencies between them.

thanks,
dhruba

On Tue, Jan 12, 2010 at 9:58 AM, stack wrote:

> On Mon, Jan 11, 2010 at 10:25 PM, Dhruba Borthakur wrote:
>
> > if we want the best of both worlds.. latency as well as data integrity,
> > how about inserting the same record into two completely separate HBase
> > tables in parallel... the operation can complete as soon as the record
> > is inserted into the first HBase table (thus giving low latencies)
>
> Return after insert into the first table?  Then internally hbase is meant
> to take care of the insert into the second table?  What if the latter
> fails for some reason other than regionserver crash?
>
> The two writes would have to be done as hdfs does, in series, if the two
> tables are to remain in sync, with the addition of a roll back of the
> transaction if insert does not go through to both tables since we don't
> have something like the hdfs background thread ensuring replica counts.
>
> > but data integrity will not be compromised because it is unlikely that
> > two region servers will fail exactly at the same time (assuming that
> > there is a way to ensure that these two tables are not handled by the
> > same region server).
>
> How do you suggest the application deal with reading from these two tables?
> If they are guaranteed in-sync, then it could pick either.  If the two can
> wander, then the application needs to read from both and make
> reconciliation somehow?
>
> Just trying to understand what you are suggesting Dhruba,
> St.Ack
>
> > thanks,
> > dhruba
> >
> > On Mon, Jan 11, 2010 at 8:12 PM, Joydeep Sarma wrote:
> >
> > > ok - hadn't thought about it that way - but yeah with a default of 1 -
> > > the semantics seem correct.
> > >
> > > under high load - some batching would automatically happen at this
> > > setting (or so one would think - not sure if hdfs appends are blocked
> > > on pending syncs (in which case the batching wouldn't quite happen i
> > > think) - cc'ing Dhruba).
> > >
> > > if the performance with setting of 1 doesn't work out - we may need an
> > > option to delay acks until actual syncs .. (most likely we would be
> > > able to compromise on latency to get higher throughput - but wouldn't
> > > be willing to compromise on data integrity)
> > >
> > > > Hey Joydeep,
> > > >
> > > > This is actually intended this way but the name of the variable is
> > > > misleading. The sync is done only if forceSync or we have enough
> > > > entries to sync (default is 1). If someone wants to sync only 100
> > > > entries for example, they would play with that configuration.
> > > >
> > > > Hope that helps,
> > > >
> > > > J-D
> > > >
> > > > On Mon, Jan 11, 2010 at 3:46 PM, Joydeep Sarma wrote:
> > > >>
> > > >> Hey HBase-devs,
> > > >>
> > > >> we have been going through hbase code to come up to speed.
> > > >>
> > > >> One of the questions was regarding the commit semantics. Thumbing
> > > >> through the RegionServer code that's appending to the wal:
> > > >>
> > > >> syncWal -> HLog.sync -> addToSyncQueue ->syncDone.await()
> > > >>
> > > >> and the log writer thread calls:
> > > >>
> > > >> hflush(), syncDone.signalAll()
> > > >>
> > > >> however hflush doesn't necessarily call a sync on the underlying log
> > > >> file:
> > > >>
> > > >> if (this.forceSync ||
> > > >>     this.unflushedEntries.get() >= this.flushlogentries) { ... sync() ... }
> > > >>
> > > >> so it seems that if forceSync is not true, the syncWal can unblock
> > > >> before a sync is called (and forcesync seems to be only true for
> > > >> metaregion()).
> > > >>
> > > >> are we missing something - or is there a bug here (the signalAll
> > > >> should be conditional on hflush having actually flushed something).
> > > >>
> > > >> thanks,
> > > >>
> > > >> Joydeep
> >
> > --
> > Connect to me at http://www.facebook.com/dhruba

--
Connect to me at http://www.facebook.com/dhruba

--000e0cd48b66a76a30047cfb9e86--
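
For readers following the thread, here is a rough sketch of the behaviour
Joydeep asks about: the writer syncs only when forceSync is set or the
number of unflushed entries reaches the threshold, and waiters are woken
only after a sync has really happened. This is not the HBase HLog code.
The class name GroupCommitSketch and the methods append, isDurable and
awaitSync are made up for illustration; only forceSync, unflushedEntries,
the flush threshold, hflush and syncDone come from the thread, and the
real sync() call is replaced by a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a threshold-based group commit; not the HBase HLog. */
class GroupCommitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncDone = lock.newCondition();
    private final AtomicInteger unflushedEntries = new AtomicInteger();
    private final int flushLogEntries;   // sync threshold (the thread says the default is 1)
    private final boolean forceSync;
    private long appended = 0;           // last sequence number handed out, guarded by lock
    private long synced = 0;             // highest sequence number covered by a sync, guarded by lock
    private int syncCalls = 0;           // stand-in for calls to the real sync()

    GroupCommitSketch(int flushLogEntries, boolean forceSync) {
        this.flushLogEntries = flushLogEntries;
        this.forceSync = forceSync;
    }

    /** Appends one entry and returns its sequence number. */
    long append() {
        lock.lock();
        try {
            unflushedEntries.incrementAndGet();
            return ++appended;
        } finally {
            lock.unlock();
        }
    }

    /**
     * Writer-thread step. Syncs only if forced or the threshold is reached,
     * and signals waiters only in that case. Returns true if a sync happened.
     */
    boolean hflush() {
        lock.lock();
        try {
            if (forceSync || unflushedEntries.get() >= flushLogEntries) {
                syncCalls++;              // a real log would sync the underlying file here
                unflushedEntries.set(0);
                synced = appended;
                syncDone.signalAll();     // conditional on an actual sync
                return true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** True once a sync has covered the given sequence number. */
    boolean isDurable(long seq) {
        lock.lock();
        try {
            return seq <= synced;
        } finally {
            lock.unlock();
        }
    }

    /** Blocks the caller until a sync has covered the given sequence number. */
    void awaitSync(long seq) throws InterruptedException {
        lock.lock();
        try {
            while (synced < seq) {
                syncDone.await();
            }
        } finally {
            lock.unlock();
        }
    }

    int syncCalls() {
        return syncCalls;
    }
}
```

With a threshold of 1 every hflush() after an append performs a sync, which
matches J-D's point that the default keeps the semantics correct. With a
larger threshold, awaitSync keeps callers blocked until the batch is
actually synced, which is roughly the "delay acks until actual syncs"
option Joydeep mentions.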