Subject: Re: hbase vs bigtable
From: Ryan Rawson <ryanobjc@gmail.com>
To: dev@hbase.apache.org
Date: Sat, 28 Aug 2010 16:27:10 -0700

One performance problem right now is our inability to push I/O down into
the kernel. This is where async APIs help. A full read in HBase might
require reading 10+ files before ever returning a single row. Doing those
reads in parallel would be nice, but spawning 10+ threads isn't really a
good idea. Right now Hadoop scales by adding processes; we just don't have
that option.
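To make that concrete, here is a rough sketch (not HBase code; the file
list, offsets and pool sizing are made up) of the thread-per-read
workaround we are stuck with against the blocking FSDataInputStream API
today. Every outstanding positioned read ties up a whole thread, which is
exactly what an async read path would let us avoid:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelHFileRead {

  /** Read a block-sized chunk from each store file in parallel, one thread
   *  per outstanding read. Works, but 10 files means 10 blocked threads. */
  public static List<byte[]> readAll(final FileSystem fs, List<Path> hfiles,
      final long offset, final int len) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(hfiles.size());
    try {
      List<Future<byte[]>> futures = new ArrayList<Future<byte[]>>();
      for (final Path p : hfiles) {
        futures.add(pool.submit(new Callable<byte[]>() {
          public byte[] call() throws Exception {
            FSDataInputStream in = fs.open(p);
            try {
              byte[] buf = new byte[len];
              in.readFully(offset, buf, 0, len);  // positioned read, blocks this thread
              return buf;
            } finally {
              in.close();
            }
          }
        }));
      }
      List<byte[]> results = new ArrayList<byte[]>();
      for (Future<byte[]> f : futures) {
        results.add(f.get());  // gather; the slowest datanode stalls us here
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}

An async DFSClient read path could issue all of those reads from a single
selector thread and call us back as data arrives, instead of us sizing
thread pools per request.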
On Saturday, August 28, 2010, Todd Lipcon wrote:
> Agreed, I think we'll get more bang for our buck by finishing up
> (reviving) patches like HDFS-941 or HDFS-347. Unfortunately performance
> doesn't seem to be the highest priority among our customers, so it's
> tough to find much time to work on these things until we really get
> stability up to par.
>
> -Todd
>
> On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth wrote:
>
>> I don't think async is a magic bullet for its own sake; we've all seen
>> those papers that show good performance from blocking implementations.
>> In particular, I don't think async is worth a whole lot on the client
>> side of a service, which HBase is to HDFS.
>>
>> What about an HDFS call, localize(Path), which attempts to replicate
>> the blocks for a file to the local datanode (if any) in a background
>> thread? If RegionServers called that function for their files every so
>> often, you'd eliminate a lot of bandwidth constraints, although the
>> latency of establishing a local socket for every read is still there.
>>
>> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon wrote:
>> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson wrote:
>> >
>> >> One thought I had was: if we have the Writable code, surely just
>> >> putting a different transport around it wouldn't be THAT bad, right
>> >> :-)
>> >>
>> >> Of course Writables are really tied to that DataInputStream or
>> >> whatever, so we'd have to work on that. Benoit said something about
>> >> Writables needing to do blocking reads and that causing issues, but
>> >> there was a netty3 thing specifically designed to handle that by
>> >> throwing and retrying the op later when there was more data.
>> >>
>> > The data transfer protocol actually doesn't do anything with
>> > Writables - it's all hand-coded bytes going over the transport.
>> >
>> > I have some code floating around somewhere for translating between
>> > blocking IO and Netty - not sure where, though :)
>> >
>> > -Todd
>> >
>> >> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon wrote:
>> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson wrote:
>> >> >
>> >> >> A production server should be CPU bound, with memory caching etc.
>> >> >> Our prod systems do see a reasonable load, and jstack always
>> >> >> shows some kind of wait, generally...
>> >> >>
>> >> >> But we need more IO pushdown into HDFS. For example, if we are
>> >> >> loading regions, why not do N at the same time? That figure N is
>> >> >> probably more dependent on how many disks/node you have than
>> >> >> anything else, really.
>> >> >>
>> >> >> For simple reads (eg: hfile) would it really be that hard to
>> >> >> retrofit some kind of async Netty-based API on top of the
>> >> >> existing DFSClient logic?
>> >> >>
>> >> > Would probably be a duplication rather than a retrofit, but it's
>> >> > probably doable -- the protocol is pretty simple for reads, and
>> >> > failure/retry is much less complicated compared to writes (though
>> >> > still pretty complicated).
>> >> >
>> >> >> -ryan
>> >> >>
>> >> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon wrote:
>> >> >> > Depending on the workload, parallelism doesn't seem to matter
>> >> >> > much. On my 8-core Nehalem test cluster with 12 disks each, I'm
>> >> >> > always network bound far before I'm CPU bound for most
>> >> >> > benchmarks, i.e. jstacks show threads mostly waiting for IO to
>> >> >> > happen, not blocked on locks.
>> >> >> >
>> >> >> > Is that not the case for your production boxes?
>> >> >> >
>> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
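For what it's worth, Jay's localize() idea would be easy to drive from the
RegionServer side if HDFS ever grew such a call. A rough sketch of what
that might look like; localize(Path) is purely hypothetical (no such HDFS
API exists today), and the interface name and refresh interval are just
placeholders:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.Path;

public class StoreFileLocalizer {

  /** Hypothetical HDFS client hook: ask HDFS to re-replicate a file's
   *  blocks onto the local datanode in the background. */
  public interface LocalizingFileSystem {
    void localize(Path file);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Periodically nudge HDFS to pull this region server's store files
   *  local, so most reads end up hitting the local datanode. */
  public void start(final LocalizingFileSystem fs,
      final List<Path> storeFiles) {
    scheduler.scheduleWithFixedDelay(new Runnable() {
      public void run() {
        for (Path p : storeFiles) {
          fs.localize(p);  // fire-and-forget; copying happens datanode-side
        }
      }
    }, 0, 30, TimeUnit.MINUTES);
  }

  public void stop() {
    scheduler.shutdown();
  }
}

That still leaves the per-read socket setup cost Jay mentions, but it would
turn most remote reads into local ones without touching the read path
itself.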