hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: storing logs in hbase
Date Sun, 05 Feb 2012 19:07:00 GMT

... but it depends on what you want to do.  If you want full-text
searching, then yes, you probably want to look at Lucene.  If you want
activity analysis, summaries are probably better.
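To make the summary idea concrete: it is just pre-aggregation. A nightly job rolls raw log rows up into per-host, per-hour counts keyed so that a report becomes a short range scan of precomputed rows instead of a 40-minute full scan. A minimal sketch of the aggregation step (plain Python for illustration; the `web01`/`web02` hosts and the host:yyyyMMddHH key layout are assumptions, and in HBase the counts would typically be written to a summary table with counter increments, e.g. incrementColumnValue(), rather than held in memory):

```python
from collections import Counter
from time import gmtime, strftime

def summary_key(host, epoch_secs):
    """Build a summary row key like 'web01:2012020518' (host + yyyyMMddHH).
    Keying by host-then-hour keeps one host's hours contiguous on disk,
    so a per-host report is a short range scan, not a full table scan."""
    return "%s:%s" % (host, strftime("%Y%m%d%H", gmtime(epoch_secs)))

def summarize(records):
    """Roll raw (host, timestamp) log records up into per-hour hit counts.
    In HBase each count would be flushed to a summary table via counter
    increments; here we just accumulate in a Counter for illustration."""
    counts = Counter()
    for host, epoch_secs in records:
        counts[summary_key(host, epoch_secs)] += 1
    return counts

# Example: three hits on web01 within one hour, one hit on web02.
hits = [("web01", 1328468220), ("web01", 1328468280),
        ("web01", 1328468340), ("web02", 1328468400)]
print(summarize(hits))
```

The same shape works for any rollup dimension (status code, URL prefix, etc.); the design choice that matters is putting the coarse grouping key first in the row key so related rows sort together.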
On 2/5/12 1:54 PM, "Doug Meil" <doug.meil@explorysmedical.com> wrote:

>
>Hi there-
>
>You probably want to check out these chapters of the HBase ref guide:
>
>http://hbase.apache.org/book.html#datamodel
>http://hbase.apache.org/book.html#schema
>http://hbase.apache.org/book.html#mapreduce
>
>... and with respect to the "40 minutes per report", a common pattern is
>to create summary table/files as appropriate.
>
>
>
>
>On 2/5/12 3:37 AM, "mete" <efkarr@gmail.com> wrote:
>
>>Hello,
>>
>>I am thinking about using HBase for storing web log data. I like the idea
>>of having HDFS underneath, so I won't have to worry much about failure
>>cases and I can benefit from all the cool HBase features.
>>
>>The thing I could not figure out is how to effectively store and query the
>>data. I am planning to split each kind of log record into 10-20 columns
>>and then use MR jobs to query the table with full scans. (I guess I could
>>use Hive or Pig for this as well, but I am not familiar with those yet.)
>>I find this approach simple and easy to implement, but on the other hand
>>it is an offline process, and it could take a lot of time to get a single
>>report. And of course a business user would be very disappointed to see
>>that he/she has to wait another 40 minutes to get the results of the
>>query.
>>
>>So what I am trying to achieve is to keep this query time as small as
>>possible. For this I can sacrifice write speed as well; I don't really
>>have to integrate new logs on the fly, and a job that runs overnight is
>>also fine.
>>
>>So for this kind of situation, do you find HBase useful?
>>
>>I read about star-schema design for more effective queries, but that makes
>>the developer's job a lot harder, because I would need to design different
>>schemas for different log types, and adding a new log type would require
>>some time to gather requirements, develop, etc.
>>
>>I thought about creating a very simple HBase schema, just a key and the
>>content for each record, and then indexing the content with Lucene. But
>>then it sounded like I did not need HBase in the first place, because I am
>>not really benefiting from it except for storage. Also, I could not be
>>sure how big my Lucene indexes would get, or whether I could cope with big
>>data on Lucene. What do you think about Lucene indexes on HBase?
>>
>>I read about how Rackspace does things; as far as I understood, they
>>generate Lucene indexes while parsing the logs in Hadoop, and then merge
>>each index into a system that serves the previous indexes.
>>(http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data)
>>
>>Does anyone use a similar approach or have any ideas about this?
>>
>>Do you think any of these are suitable? Or if not, should I try a
>>different way?
>>
>>Thanks in advance,
>>Mete
>
>
>


