Subject: Re: hbase vs bigtable
From: Ryan Rawson <ryanobjc@gmail.com>
To: dev@hbase.apache.org
Date: Sat, 28 Aug 2010 16:27:10 -0700

One performance problem right now is our inability to push I/O down into
the kernel. This is where async APIs help. A full read in HBase might
require reading 10+ files before ever returning a single row. Doing those
reads in parallel would be nice, but spawning 10+ threads isn't really a
good idea. Right now Hadoop scales by adding processes; we just don't have
that option.
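To make that concrete, here is a rough sketch (not HBase code; the file
list, offsets and pool sizing are made up) of the thread-per-read
workaround we are stuck with against the blocking FSDataInputStream API
today. Every outstanding positioned read ties up a whole thread, which is
exactly what an async read path would let us avoid:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelHFileRead {

  /** Read a block-sized chunk from each store file in parallel, one thread
   *  per outstanding read. Works, but 10 files means 10 blocked threads. */
  public static List<byte[]> readAll(final FileSystem fs, List<Path> hfiles,
      final long offset, final int len) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(hfiles.size());
    try {
      List<Future<byte[]>> futures = new ArrayList<Future<byte[]>>();
      for (final Path p : hfiles) {
        futures.add(pool.submit(new Callable<byte[]>() {
          public byte[] call() throws Exception {
            FSDataInputStream in = fs.open(p);
            try {
              byte[] buf = new byte[len];
              in.readFully(offset, buf, 0, len);  // positioned read, blocks this thread
              return buf;
            } finally {
              in.close();
            }
          }
        }));
      }
      List<byte[]> results = new ArrayList<byte[]>();
      for (Future<byte[]> f : futures) {
        results.add(f.get());  // gather; the slowest datanode stalls us here
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}

An async DFSClient read path could issue all of those reads from a single
selector thread and call us back as data arrives, instead of us sizing
thread pools per request.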
On Saturday, August 28, 2010, Todd Lipcon wrote:
> Agreed, I think we'll get more bang for our buck by finishing up
> (reviving) patches like HDFS-941 or HDFS-347. Unfortunately performance
> doesn't seem to be the highest priority among our customers, so it's
> tough to find much time to work on these things until we really get
> stability up to par.
>
> -Todd
>
> On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth wrote:
>
>> I don't think async is a magic bullet for its own sake; we've all seen
>> those papers that show good performance from blocking implementations.
>> In particular, I don't think async is worth a whole lot on the client
>> side of a service, which HBase is to HDFS.
>>
>> What about an HDFS call, localize(Path), which attempts to replicate
>> the blocks for a file to the local datanode (if any) in a background
>> thread? If RegionServers called that function for their files every so
>> often, you'd eliminate a lot of bandwidth constraints, although the
>> latency of establishing a local socket for every read is still there.
>>
>> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon wrote:
>> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson wrote:
>> >
>> >> One thought I had was: if we have the Writable code, surely just
>> >> putting a different transport around it wouldn't be THAT bad, right
>> >> :-)
>> >>
>> >> Of course Writables are really tied to that DataInputStream or
>> >> whatever, so we'd have to work on that. Benoit said something about
>> >> Writables needing to do blocking reads and that causing issues, but
>> >> there was a netty3 thing specifically designed to handle that by
>> >> throwing and retrying the op later when there was more data.
>> >>
>> > The data transfer protocol actually doesn't do anything with
>> > Writables - it's all hand-coded bytes going over the transport.
>> >
>> > I have some code floating around somewhere for translating between
>> > blocking IO and Netty - not sure where, though :)
>> >
>> > -Todd
>> >
>> >> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon wrote:
>> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson wrote:
>> >> >
>> >> >> A production server should be CPU bound, with memory caching etc.
>> >> >> Our prod systems do see a reasonable load, and jstack always
>> >> >> shows some kind of wait, generally...
>> >> >>
>> >> >> But we need more IO pushdown into HDFS. For example, if we are
>> >> >> loading regions, why not do N at the same time? That figure N is
>> >> >> probably more dependent on how many disks/node you have than
>> >> >> anything else, really.
>> >> >>
>> >> >> For simple reads (eg: hfile) would it really be that hard to
>> >> >> retrofit some kind of async Netty-based API on top of the
>> >> >> existing DFSClient logic?
>> >> >>
>> >> > Would probably be a duplication rather than a retrofit, but it's
>> >> > probably doable -- the protocol is pretty simple for reads, and
>> >> > failure/retry is much less complicated compared to writes (though
>> >> > still pretty complicated).
>> >> >
>> >> >> -ryan
>> >> >>
>> >> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon wrote:
>> >> >> > Depending on the workload, parallelism doesn't seem to matter
>> >> >> > much. On my 8-core Nehalem test cluster with 12 disks each, I'm
>> >> >> > always network bound far before I'm CPU bound for most
>> >> >> > benchmarks, i.e. jstacks show threads mostly waiting for IO to
>> >> >> > happen, not blocked on locks.
>> >> >> >
>> >> >> > Is that not the case for your production boxes?
>> >> >> >
>> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
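For what it's worth, Jay's localize() idea would be easy to drive from the
RegionServer side if HDFS ever grew such a call. A rough sketch of what
that might look like; localize(Path) is purely hypothetical (no such HDFS
API exists today), and the interface name and refresh interval are just
placeholders:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.Path;

public class StoreFileLocalizer {

  /** Hypothetical HDFS client hook: ask HDFS to re-replicate a file's
   *  blocks onto the local datanode in the background. */
  public interface LocalizingFileSystem {
    void localize(Path file);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Periodically nudge HDFS to pull this region server's store files
   *  local, so most reads end up hitting the local datanode. */
  public void start(final LocalizingFileSystem fs,
      final List<Path> storeFiles) {
    scheduler.scheduleWithFixedDelay(new Runnable() {
      public void run() {
        for (Path p : storeFiles) {
          fs.localize(p);  // fire-and-forget; copying happens datanode-side
        }
      }
    }, 0, 30, TimeUnit.MINUTES);
  }

  public void stop() {
    scheduler.shutdown();
  }
}

That still leaves the per-read socket setup cost Jay mentions, but it would
turn most remote reads into local ones without touching the read path
itself.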