hbase-user mailing list archives

From Alex Baranov <alex.barano...@gmail.com>
Subject Re: Using HBase for logging
Date Thu, 03 Jun 2010 12:41:04 GMT
Hi,

If you're not going to access this logged data directly from HBase, but will
instead use it to produce statistics etc. (perhaps via MR jobs), then I
wouldn't recommend using either date/time or a timestamp as the row key. A
time-ordered key makes HBase write all incoming logs to a single RegionServer
at any given moment, so only one box takes the hot load while the others just
wait their turn. I'd suggest using some random value for the key, like a UUID.
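
For illustration, here's a minimal sketch of what a write with a random row
key could look like (just my rough example against the 0.20-era Java client;
the "logs" table, "event" family, and column names are made up):

    import java.io.IOException;
    import java.util.UUID;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomKeyLogWriter {
        public static void main(String[] args) throws IOException {
            // Hypothetical table "logs" with a single family "event".
            HTable logs = new HTable(new HBaseConfiguration(), "logs");

            // A random UUID key spreads writes over all regions, instead of
            // always hitting the region that owns "now" in a time-ordered
            // keyspace.
            byte[] rowKey = Bytes.toBytes(UUID.randomUUID().toString());

            Put put = new Put(rowKey);
            put.add(Bytes.toBytes("event"), Bytes.toBytes("ts"),
                    Bytes.toBytes(System.currentTimeMillis()));
            put.add(Bytes.toBytes("event"), Bytes.toBytes("placement_id"),
                    Bytes.toBytes("1234"));
            logs.put(put);
            logs.flushCommits();
        }
    }

The time still lives in a column (or in the cell timestamp), so your MR jobs
can read it; the trade-off is that you can no longer scan by time range
directly on the row key.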

Alex Baranau

http://sematext.com
http://en.wordpress.com/tag/hadoop-ecosystem-digest/
http://search-hadoop.com - Search Hadoop, HDFS, MapReduce, HBase, and other
related projects.

On Tue, May 25, 2010 at 2:32 AM, Viktors Rotanovs <
viktors.rotanovs@gmail.com> wrote:

> I'm using HBase for similar stats, some things I've learned:
>  - date/time as key is good because that way it's very easy to get
> last N results (for a chart, for example), and it's much more scalable
> than timestamps
>  - several column families on one date/time are useful
>   - and different tables for different levels of aggregation (hour,
> day, week, month, year)
>  - you can increment long values when you need to know the total (see the
> sketch after this list):
>
> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[], byte[], byte[], long)
>  - MR jobs are a good and scalable way of processing this type of data
>  - data size is unlimited, so it's fine to write to multiple tables
>  - optimize for reads you're going to make, not for writes.
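>
> A quick sketch of that call, just to make it concrete (the table, family,
> and row layout here are made up, and this is how I understand the 0.20.x
> client API -- double-check against the javadoc above):
>
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     HTable hourly = new HTable(new HBaseConfiguration(), "stats_hourly");
>
>     // Atomically adds 1 to the long stored at (row, family, qualifier)
>     // and returns the new value; the row here is a hypothetical hour bucket.
>     long total = hourly.incrementColumnValue(
>         Bytes.toBytes("2010-05-25-13"),   // row key: one bucket per hour
>         Bytes.toBytes("counters"),        // column family
>         Bytes.toBytes("impressions"),     // qualifier
>         1L);                              // amount to increment by
>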
> To import some of our logs, I'm using a Java program which is called
> via logrotate every 10 minutes (but be careful with that one: if the
> HBase client freezes, as happened to me after the 0.20.4 upgrade,
> memory can fill up very quickly).
>
> There's also a Python project for analytical data:
> http://github.com/zohmg/zohmg
>
> Hope that helps,
> -- Viktors
>
> On Tue, May 25, 2010 at 12:44 AM, Alex Thurlow <alex@blastro.com> wrote:
> > Hi list,
> >    With HBase's great write speed, I was thinking it would be a good idea
> > to switch an app that currently logs to a database over to logging to
> > HBase.  I couldn't really find anyone else who's using it that way, though.
> > Are there reasons I shouldn't?  If I should, how should I structure my data?
> >
> > It's basically going to be data for an ad server, so the relevant stuff
> > would be the timestamp, the id of the ad placement, and the id of the
> > creative that showed.  Some other data would be stored, but I wouldn't
> > need to search on it.
> >
> > I would want to make reports out of that data by date, date/placement id,
> > date/creative id, and date/placement id/creative id.
> >
> > Should I just log with the timestamp as the key and then pull the whole
> > range and filter when I need the data, or should I log everything three
> > times so I can pull by whichever key I need?
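> >
> > (Concretely, by "pull the whole range and filter" I mean something roughly
> > like the following -- the table, family, and qualifier names are just
> > placeholders:)
> >
> >     import org.apache.hadoop.hbase.HBaseConfiguration;
> >     import org.apache.hadoop.hbase.client.HTable;
> >     import org.apache.hadoop.hbase.client.Result;
> >     import org.apache.hadoop.hbase.client.ResultScanner;
> >     import org.apache.hadoop.hbase.client.Scan;
> >     import org.apache.hadoop.hbase.util.Bytes;
> >
> >     HTable adLog = new HTable(new HBaseConfiguration(), "ad_log");
> >
> >     // Rows keyed by a sortable timestamp string, e.g. "20100524-183000...".
> >     // Scan one day by using the next day's prefix as the exclusive stop row.
> >     Scan scan = new Scan(Bytes.toBytes("20100524"), Bytes.toBytes("20100525"));
> >     ResultScanner scanner = adLog.getScanner(scan);
> >     for (Result r : scanner) {
> >         byte[] placement = r.getValue(Bytes.toBytes("ad"),
> >                                       Bytes.toBytes("placement_id"));
> >         // ...filter/aggregate by placement id and creative id client-side...
> >     }
> >     scanner.close();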
> >
> > I'm fairly new to HBase, although I've used Cassandra some, so I have an
> > idea of how this kind of thing works.  I just can't quite get my head
> > around the right way to use it for this purpose.
> >
> > Thanks,
> >    -Alex
> >
> >
>
>
>
> --
> http://rotanovs.com - personal blog | http://www.hitgeist.com -
> fastest growing websites
>
