hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Hbase performance with HDFS
Date Mon, 11 Jul 2011 16:47:57 GMT
Also, on MapR, you get another level of group commit above the row level.
That takes the writes even further from the byte-by-byte level.

On Mon, Jul 11, 2011 at 9:20 AM, Andrew Purtell <apurtell@apache.org> wrote:

> > Despite HDFS having support for append, it is still expensive to
> > update it on every byte, and this is where the WAL flushing policies
> > come in.
>
> Right, but a minor correction here. HBase doesn't flush the WAL per byte.
> We do a "group commit" of all changes to a row, to the extent the user has
> grouped changes to the row into a Put. So at the least this is first a write
> of all the bytes of an edit, or it could be more than one edit if we can
> group them, and _then_ a sync.
>
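The group commit Andrew describes can be sketched as follows. These `Put` and `Wal` classes are hypothetical stand-ins, not HBase's real client API: the point is only that all of a row's cell edits are batched into one object, written to the WAL as a single record, and only then synced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Put: collects every cell edit for one row.
class Put {
    final String row;
    final List<String> cells = new ArrayList<>();

    Put(String row) { this.row = row; }

    Put add(String family, String qualifier, String value) {
        cells.add(family + ":" + qualifier + "=" + value);
        return this;
    }
}

// Hypothetical WAL: the cost is per grouped edit, not per byte or per cell.
class Wal {
    int appends = 0;
    int syncs = 0;

    void commit(Put put) {
        appends++;  // write all the bytes of the grouped edit
        syncs++;    // and _then_ a single sync
    }
}

class GroupCommitSketch {
    public static void main(String[] args) {
        Wal wal = new Wal();
        Put put = new Put("row-1")
                .add("cf", "a", "1")
                .add("cf", "b", "2")
                .add("cf", "c", "3");
        wal.commit(put);
        // prints: 3 cells, 1 append, 1 sync
        System.out.println(put.cells.size() + " cells, "
                + wal.appends + " append, " + wal.syncs + " sync");
    }
}
```

Three cells, but only one WAL append and one sync, which is why grouping changes to a row into a single Put matters for write latency.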
>
> Also most who run HBase run a HDFS patched with HDFS-895, so multiple syncs
> can be in flight. This does not reduce the added latency of a sync for the
> current writer but it does significantly reduce the expense of the sync with
> respect to other parallel writers.
>
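The effect of HDFS-895 can be illustrated with a small sketch. This is a hypothetical, single-threaded simplification of what is really a concurrent pipeline: edits get sequence numbers, a sync makes everything appended so far durable, and a writer whose edits are already covered by another writer's sync skips the physical sync entirely.

```java
// Hypothetical WAL with coalesced syncs (HDFS-895-style, simplified).
class PipelinedSyncSketch {
    private long nextSeq = 0;      // sequence number handed to each append
    private long syncedUpTo = -1;  // highest sequence known durable
    int physicalSyncs = 0;         // syncs actually issued to the filesystem

    long append(String edit) {
        return nextSeq++;
    }

    // Make everything up to `seq` durable; a real sync is only issued when
    // `seq` is not already covered by a previous sync.
    void syncTo(long seq) {
        if (seq <= syncedUpTo) {
            return;                 // another writer's sync already covered us
        }
        physicalSyncs++;
        syncedUpTo = nextSeq - 1;   // a sync covers all appended edits
    }

    public static void main(String[] args) {
        PipelinedSyncSketch wal = new PipelinedSyncSketch();
        long a = wal.append("edit-a");
        long b = wal.append("edit-b");
        wal.syncTo(b);  // one physical sync covers both edits
        wal.syncTo(a);  // already durable: no extra sync
        System.out.println("physical syncs: " + wal.physicalSyncs);  // prints: physical syncs: 1
    }
}
```

The current writer still pays the latency of its own sync, as Andrew notes, but parallel writers whose edits ride along on someone else's sync pay much less.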
>
> Best regards,
>
>
>        - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> ----- Original Message -----
> > From: Arvind Jayaprakash <work@anomalizer.net>
> > To: user@hbase.apache.org; Andrew Purtell <apurtell@apache.org>
> > Cc:
> > Sent: Monday, July 11, 2011 6:34 AM
> > Subject: Re: Hbase performance with HDFS
> >
> > On Jul 07, Andrew Purtell wrote:
> >>>  Since HDFS is mostly write once how are updates/deletes handled?
> >>
> >> Not mostly, only write once.
> >>
> >> Deletes are just another write, but one that writes tombstones
> >> "covering" data with older timestamps.
> >>
> >> When serving queries, HBase searches store files back in time until it
> >> finds data at the coordinates requested or a tombstone.
> >>
> >> The process of compaction not only merge sorts a bunch of accumulated
> >> store files (from flushes) into fewer store files (or one) for read
> >> efficiency, it also performs housekeeping, dropping data "covered" by
> >> the delete tombstones. Incidentally this is also how TTLs are
> >> supported: expired values are dropped as well.
> >
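The tombstone and compaction behavior described above can be sketched like this. These are simplified, hypothetical structures, not HBase's store-file format: each "store file" is just a map, kept newest-first, with a null value standing in for a delete tombstone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TombstoneSketch {
    static final String TOMBSTONE = null;

    // Store files in newest-first order, matching "search back in time".
    final List<Map<String, String>> storeFiles = new ArrayList<>();

    void flush(Map<String, String> memstore) {
        storeFiles.add(0, new HashMap<>(memstore));  // newest at the front
    }

    // Search store files back in time until we hit data or a tombstone.
    String get(String cell) {
        for (Map<String, String> file : storeFiles) {
            if (file.containsKey(cell)) {
                return file.get(cell);  // null here means "deleted"
            }
        }
        return null;  // never written at all
    }

    // Merge all store files into one, newest wins, then drop data
    // "covered" by tombstones along with the tombstones themselves.
    void compact() {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> file : storeFiles) {
            for (Map.Entry<String, String> e : file.entrySet()) {
                if (!merged.containsKey(e.getKey())) {
                    merged.put(e.getKey(), e.getValue());  // newest wins
                }
            }
        }
        merged.values().removeIf(v -> v == TOMBSTONE);  // housekeeping
        storeFiles.clear();
        storeFiles.add(merged);
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        Map<String, String> memstore = new HashMap<>();
        memstore.put("row1/cf:a", "v1");
        store.flush(memstore);                 // first flush: a = v1
        memstore.clear();
        memstore.put("row1/cf:a", TOMBSTONE);  // delete covers the older v1
        store.flush(memstore);
        System.out.println(store.get("row1/cf:a"));  // prints: null
        store.compact();
        // after compaction both the covered value and the tombstone are gone
        System.out.println(store.storeFiles.get(0).containsKey("row1/cf:a"));  // prints: false
    }
}
```

Note the `containsKey` check in `get`: it is what distinguishes "deleted here by a tombstone" from "never written, keep searching older files".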
> > Just wanted to talk about WAL. My understanding is that updates are
> > journalled onto HDFS by sequentially recording them as they happen per
> > region. This is where the need for HDFS append comes in, something that
> > I don't recollect seeing in the GFS paper.
> >
> > Despite HDFS having support for append, it is still expensive to
> > update it on every byte, and this is where the WAL flushing policies
> > come in.
> >
>
