From: Otis Gospodnetic
To: user@hbase.apache.org
Date: Wed, 2 Jun 2010 22:52:02 -0700 (PDT)
Subject: Re: Using HBase for logging

Hi Viktors,

I noticed you mentioned the following two things:

> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour, date, week, month, year)

Could you please explain:
- why are multiple CFs on one date/time good (better than 1)?
- why store different levels of aggregation in separate tables instead of just 1 table?
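For illustration only, here is a minimal self-contained Java sketch of what the table-per-aggregation-level suggestion could look like. It assumes (none of these names come from the thread) one table per level, UTC row keys of the form <time bucket>/<placement id>, and an in-memory TreeMap standing in for each HBase table. Each merge call plays the role that HTable.incrementColumnValue(row, family, qualifier, amount) would play against a real table, and the week level is left out to keep it short:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

public class AggregationTables {
    // Hypothetical aggregation levels; in HBase each would be its own table.
    // (A WEEK level is omitted to keep the sketch short.)
    enum Level {
        HOUR("yyyyMMddHH"), DAY("yyyyMMdd"), MONTH("yyyyMM"), YEAR("yyyy");

        final DateTimeFormatter fmt;

        Level(String pattern) {
            // Zero-padded UTC time buckets sort lexicographically in time order,
            // so the newest rows are always at one end of each table.
            this.fmt = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    // One sorted map per level stands in for one table: row key -> counter.
    private final Map<Level, TreeMap<String, Long>> tables = new EnumMap<>(Level.class);

    AggregationTables() {
        for (Level level : Level.values()) {
            tables.put(level, new TreeMap<>());
        }
    }

    // Row key: time bucket for the level, then the placement id.
    static String rowKey(Level level, Instant ts, String placementId) {
        return level.fmt.format(ts) + "/" + placementId;
    }

    // Write path: one impression becomes one increment per aggregation level
    // (against real tables, one incrementColumnValue call per level).
    void recordImpression(Instant ts, String placementId) {
        for (Level level : Level.values()) {
            tables.get(level).merge(rowKey(level, ts, placementId), 1L, Long::sum);
        }
    }

    // Read path: a report at any level is a single-row lookup.
    long count(Level level, String rowKey) {
        return tables.get(level).getOrDefault(rowKey, 0L);
    }

    public static void main(String[] args) {
        AggregationTables t = new AggregationTables();
        t.recordImpression(Instant.parse("2010-05-24T19:05:00Z"), "p42");
        t.recordImpression(Instant.parse("2010-05-24T19:40:00Z"), "p42");
        t.recordImpression(Instant.parse("2010-05-25T08:00:00Z"), "p42");
        System.out.println(t.count(Level.HOUR, "2010052419/p42")); // 2
        System.out.println(t.count(Level.DAY, "20100524/p42"));    // 2
        System.out.println(t.count(Level.MONTH, "201005/p42"));    // 3
    }
}
```

The trade-off this illustrates: every logged event costs one increment per level, but a monthly report becomes a single-row read instead of a sum over hundreds of hourly rows. The same fan-out could also be done in a single table by prefixing each key with its level; the sketch only shows the write-once-per-level pattern, not why separate tables would be preferable.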
Thanks
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/


----- Original Message ----
> From: Viktors Rotanovs
> To: user@hbase.apache.org
> Sent: Mon, May 24, 2010 7:32:26 PM
> Subject: Re: Using HBase for logging
>
> I'm using HBase for similar stats; some things I've learned:
> - date/time as key is good because that way it's very easy to get the last N results (for a chart, for example), and it's much more scalable than timestamps
> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour, date, week, month, year)
> - you can increment long values when you need to know the total:
>   http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[], byte[], byte[], long)
> - MR jobs are a good and scalable way of processing this type of data
> - data size is unlimited, so it's fine to write to multiple tables
> - optimize for the reads you're going to make, not for writes.
>
> To import some of our logs, I'm using a Java program which is called via logrotate every 10 minutes (but be careful with that one: if the HBase client freezes, as happened to me after the 0.20.4 upgrade, memory can fill up very quickly).
>
> There's also a Python project for analytical data:
> http://github.com/zohmg/zohmg
>
> Hope that helps,
> --
> Viktors
>
> On Tue, May 25, 2010 at 12:44 AM, Alex Thurlow <alex@blastro.com> wrote:
>> Hi list,
>>
>> With HBase's great write speed, I was thinking it would be a good thing to switch an app that logs to a database to logging to HBase. I couldn't really find anyone else who's using it that way, though. Are there reasons I shouldn't?
>> If I should, how should I structure my data?
>>
>> It's basically going to be data for an ad server, so the relevant stuff would be the timestamp, the id of the ad placement, and the id of the creative that showed. Some other data would be stored, but I wouldn't need to search on it.
>>
>> I want to make reports out of that data by date, date/placement id, date/creative id, and date/placement id/creative id.
>>
>> Should I just log with the timestamp as the key and then pull the whole range and filter when I need the data, or should I log everything three times so I can pull by whichever key I need?
>>
>> I'm fairly new to HBase, although I've used Cassandra some, so I have an idea of how this kind of works. I just can't quite get my head around the right way to use it for this purpose.
>>
>> Thanks,
>>
>> -Alex
>
> --
> http://rotanovs.com - personal blog | http://www.hitgeist.com - fastest growing websites
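Alex's two options (timestamp-only key versus writing each event under several keys) can be compared with a small self-contained Java sketch. It assumes, purely for illustration, that the class and method names and the date|id|id key format are hypothetical, and that each TreeMap stands in for an HBase table whose rows are kept sorted by key, so a key prefix selects a contiguous range of rows (roughly what a Scan bounded by a start and stop row does). Because a prefix scan on date|placement|creative already answers the date, date/placement and date/placement/creative reports, only the date/creative report needs a second copy of each event:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ImpressionLog {
    // Each TreeMap stands in for one table: row key -> impression count.
    // Rows are sorted by key, so a key prefix selects a contiguous range.
    final TreeMap<String, Long> datePlacementCreative = new TreeMap<>();
    final TreeMap<String, Long> dateCreativePlacement = new TreeMap<>();

    // Write path: each impression is counted once per key layout (denormalized).
    void record(String date, String placementId, String creativeId) {
        datePlacementCreative.merge(date + "|" + placementId + "|" + creativeId, 1L, Long::sum);
        dateCreativePlacement.merge(date + "|" + creativeId + "|" + placementId, 1L, Long::sum);
    }

    // Read path: sum every row whose key starts with the prefix.
    // Keys are ASCII, so prefix + '\uffff' is an exclusive upper bound.
    static long sumPrefix(TreeMap<String, Long> table, String prefix) {
        SortedMap<String, Long> range = table.subMap(prefix, prefix + Character.MAX_VALUE);
        long total = 0;
        for (long v : range.values()) {
            total += v;
        }
        return total;
    }

    // The trailing "|" keeps placement "p1" from also matching "p12".
    long byDate(String date) {
        return sumPrefix(datePlacementCreative, date + "|");
    }

    long byDatePlacement(String date, String placementId) {
        return sumPrefix(datePlacementCreative, date + "|" + placementId + "|");
    }

    long byDateCreative(String date, String creativeId) {
        return sumPrefix(dateCreativePlacement, date + "|" + creativeId + "|");
    }

    long byDatePlacementCreative(String date, String placementId, String creativeId) {
        return datePlacementCreative.getOrDefault(date + "|" + placementId + "|" + creativeId, 0L);
    }

    public static void main(String[] args) {
        ImpressionLog log = new ImpressionLog();
        log.record("20100524", "p1", "c7");
        log.record("20100524", "p1", "c8");
        log.record("20100524", "p2", "c7");
        log.record("20100525", "p1", "c7");
        System.out.println(log.byDate("20100524"));                              // 3
        System.out.println(log.byDatePlacement("20100524", "p1"));               // 2
        System.out.println(log.byDateCreative("20100524", "c7"));                // 2
        System.out.println(log.byDatePlacementCreative("20100524", "p1", "c7")); // 1
    }
}
```

With a timestamp-only key, each of these reports would instead read every row in the date range and filter on the client; denormalizing trades extra writes for reads that touch only the rows they need, in line with the "optimize for reads" advice above.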