hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Storing lots of raw log data in HBase
Date Tue, 16 Feb 2010 00:52:00 GMT
Most log data tends to be time-oriented, so the 'natural' schema is
to use the timestamp as the row key. That concentrates all inserts on
a single region, and therefore on a single node. The fix is to change
the key to something other than a monotonically increasing value.
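As one hedged sketch of "something other than a monotonically increasing value": prefix the key with a small salt derived from some stable attribute of the event (its source host, say), so writes spread across a pre-split table instead of hammering the last region. The class and constant names below (SaltedKeys, NUM_BUCKETS) are illustrative, not part of HBase's API.

```java
// Illustrative only: building a salted, time-oriented row key so that
// inserts distribute across NUM_BUCKETS pre-split regions rather than
// concentrating on the region holding the newest timestamps.
public class SaltedKeys {
    // Assumption: the table was pre-split into this many key ranges,
    // one per salt prefix ("00|", "01|", ... "15|").
    static final int NUM_BUCKETS = 16;

    // Produces a key like "07|1266281520000|host42". Events from the
    // same source share a salt, so they remain contiguous and scannable;
    // different sources land in different buckets.
    static String rowKey(long timestampMillis, String source) {
        int salt = Math.floorMod(source.hashCode(), NUM_BUCKETS);
        return String.format("%02d|%013d|%s", salt, timestampMillis, source);
    }

    public static void main(String[] args) {
        System.out.println(rowKey(1266281520000L, "host42"));
    }
}
```

The trade-off: a time-range scan now requires NUM_BUCKETS parallel scans (one per salt prefix) instead of one, which is the usual price for write scalability with time-series keys.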

If you insert into just one region, throughput is gated by the
performance of a single node, which limits intake/insert scalability.

As for that slide, I am the originator of it, and the reasons above
are why I made the suggestion quoted below.

On Mon, Feb 15, 2010 at 4:45 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Hello,
>
> I've seen the following in a few HBase presentations now:
>
> * What to store in HBase?
> * Maybe not your raw log data...
> * ...but the results of processing it with Hadoop
>
> e.g. slides 26 & 27: http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install
>
>
> Is there anything wrong with storing raw log data directly in HBase, in real time,
> even when that means having to insert a few hundred rows/second?
>
> Is the above advice purely because of the data volume associated with storing lots of raw
> logs, or is there some other reason?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
