hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: Storing lots of raw log data in HBase
Date Tue, 16 Feb 2010 06:17:45 GMT
For our "LogSearch" product, we generate a UUID row key for every log row
when we ingest it in a Mapper. The keys are evenly distributed, so the
load spreads evenly across the cluster!
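A minimal sketch of why random keys spread the load, written in Python rather than the Java a real Mapper would use. It assumes a hypothetical table pre-split into four regions at the hex boundaries `4`, `8` and `c`, and models region lookup as a binary search over the sorted split points, since each HBase region serves a contiguous, sorted range of row keys. `make_row_key`, `region_for`, `distribution` and `SPLIT_POINTS` are illustrative names, not part of any HBase API.

```python
import bisect
import uuid
from collections import Counter

# Hypothetical split points for a table pre-split into four regions.
# Each HBase region serves a contiguous, sorted range of row keys; these
# boundaries cut the lowercase-hex keyspace into quarters.
SPLIT_POINTS = [b"4", b"8", b"c"]


def make_row_key() -> bytes:
    """Return a random (version 4) UUID as a 32-character hex row key."""
    return uuid.uuid4().hex.encode("ascii")


def region_for(row_key: bytes, split_points=SPLIT_POINTS) -> int:
    """Index of the region whose [start, end) key range contains row_key."""
    return bisect.bisect_right(split_points, row_key)


def distribution(keys, split_points=SPLIT_POINTS) -> Counter:
    """Count how many keys fall into each region."""
    return Counter(region_for(k, split_points) for k in keys)


if __name__ == "__main__":
    # Simulate ingesting 10,000 log rows, one fresh UUID key per row.
    keys = [make_row_key() for _ in range(10_000)]
    print(sorted(distribution(keys).items()))
```

Each of the four regions should receive roughly a quarter of the rows. In a real job the key would be generated per record inside the Mapper and written through the HBase client, which is omitted here. The trade-off is that random keys give up the time ordering a timestamp key provides, so rows can no longer be range-scanned by time using the row key alone.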



On Mon, Feb 15, 2010 at 4:52 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> Most log data is time-oriented, so the 'natural' schema uses the
> timestamp as the row key. That concentrates all inserts on a single
> region, and therefore on a single node. This is fixable by choosing a
> key that is not a monotonically increasing value.
>
> If you insert into only one region, you are gated by the performance
> of a single node, which limits intake/insert scalability.
>
> As for that slide, I wrote it, and the reasons above are why I gave
> the advice quoted below.
>
> On Mon, Feb 15, 2010 at 4:45 PM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
>> Hello,
>>
>> I've seen the following in a few HBase presentations now:
>>
>> * What to store in HBase?
>> * Maybe not your raw log data...
>> * ...but the results of processing it with Hadoop
>>
>> e.g. slides 26 & 27: http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install
>>
>>
>> Is there anything wrong with storing raw log data directly in HBase, and
>> doing so in real time, even when that means inserting a few hundred
>> rows/second?
>>
>> Is the above advice purely due to the data volume of storing lots of
>> raw logs, or is there some other reason?
>>
>> Thanks,
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>
>
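Ryan's point about timestamp row keys can be checked with a small simulation. This is a sketch in Python under stated assumptions: a hypothetical table whose split points were created while older logs were being loaded (so every boundary is in the past), row keys that are zero-padded epoch milliseconds, and region lookup modeled as a binary search over sorted split points. `timestamp_key`, `region_for`, `simulate_ingest`, `BASE_MS` and `SPLIT_POINTS` are illustrative names, not HBase APIs.

```python
import bisect
from collections import Counter


def timestamp_key(ts_millis: int) -> bytes:
    """Row key = epoch millis, zero-padded so byte order equals time order."""
    return f"{ts_millis:013d}".encode("ascii")


def region_for(row_key: bytes, split_points: list) -> int:
    """Index of the region whose sorted [start, end) range holds row_key."""
    return bisect.bisect_right(split_points, row_key)


# Hypothetical, arbitrary base time and split points: the table was split
# while earlier (older) log data was loaded, so every boundary is in the past.
BASE_MS = 1_266_300_000_000
SPLIT_POINTS = [timestamp_key(BASE_MS - d) for d in (3_000_000, 2_000_000, 1_000_000)]


def simulate_ingest(rows_per_second: int, seconds: int) -> Counter:
    """Count how many newly written rows land in each region."""
    counts = Counter()
    for i in range(rows_per_second * seconds):
        # Monotonically increasing timestamps, like live log lines.
        ts = BASE_MS + i * 1000 // rows_per_second
        counts[region_for(timestamp_key(ts), SPLIT_POINTS)] += 1
    return counts


if __name__ == "__main__":
    print(simulate_ingest(rows_per_second=300, seconds=10))  # Counter({3: 3000})
```

Every new row sorts after the last split point, so all 3,000 writes at a few hundred rows/second go to region 3 while the other regions sit idle. Even if that region later splits, the newest keys still sort after every boundary, so writes keep landing on one region at a time.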



-- 
http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
