hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: storing logs in hbase
Date Sun, 05 Feb 2012 19:07:00 GMT

... but it depends on what you want to do.  If you want full-text
searching, then yes, you probably want to look at Lucene.  If you want
activity analysis, summaries are probably better.
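The summary-table pattern recommended above can be sketched in plain Java: a nightly job pre-aggregates raw log lines into hourly counts so a report reads a few summary rows instead of scanning every raw record. The class name, input format (`"<millis> <url>"`), and key layout here are my assumptions for illustration, not from the thread.

```java
import java.util.*;

// Minimal sketch of the summary idea: pre-aggregate raw log lines into
// hourly counts, keyed so an HBase summary table could range-scan by time.
public class SummarySketch {

    // Round a timestamp in milliseconds down to the start of its hour.
    static long hourBucket(long tsMillis) {
        return tsMillis - (tsMillis % 3_600_000L);
    }

    // Count hits per (hour, url), keyed "<hourBucket>|<url>".
    static Map<String, Long> summarize(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 2);
            long bucket = hourBucket(Long.parseLong(parts[0]));
            counts.merge(bucket + "|" + parts[1], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = List.of("1000000 /a", "2000000 /a", "4000000 /b");
        System.out.println(summarize(logs)); // {0|/a=2, 3600000|/b=1}
    }
}
```

A report against the summary table then reads a handful of rows instead of re-scanning the raw logs, which is what turns a 40-minute job into an interactive lookup.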

On 2/5/12 1:54 PM, "Doug Meil" <doug.meil@explorysmedical.com> wrote:

>Hi there-
>You probably want to check out these chapters of the HBase ref guide:
>... and with respect to the "40 minutes per report", a common pattern is
>to create summary table/files as appropriate.
>On 2/5/12 3:37 AM, "mete" <efkarr@gmail.com> wrote:
>>I am thinking about using HBase for storing web log data. I like the idea
>>of having HDFS underneath so that I won't have to worry about failure cases,
>>and I can benefit from all the cool HBase features.
>>The thing I could not figure out is how to effectively store and query the
>>data. I am planning to split each kind of log record into 10-20 columns and
>>then use MR jobs to query the table with full scans.
>>(I guess I could use Hive or Pig for this as well, but I am not familiar
>>with those yet.)
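For the "10-20 columns plus full scans" layout, much of the query cost comes down to the row key. A minimal sketch of one common composite-key design, where the key layout and field order are my assumptions rather than anything from the thread:

```java
// Sketch of a composite row key "<logType>|<reversedTs>|<host>": it keeps
// each log type contiguous (so a prefix Scan touches only that type) and
// sorts rows newest-first within a type.
public class LogRowKey {

    static String rowKey(String logType, long tsMillis, String host) {
        // Subtract from Long.MAX_VALUE so newer timestamps sort first;
        // zero-pad to 19 digits so lexicographic order matches numeric.
        long reversed = Long.MAX_VALUE - tsMillis;
        return String.format("%s|%019d|%s", logType, reversed, host);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("apache", 1328468220000L, "web01"));
        System.out.println(rowKey("apache", 1328468280000L, "web01"));
    }
}
```

With this shape, a report over one log type and time window becomes a bounded prefix scan instead of a full-table scan.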
>>I find this approach simple and easy to implement, but on the other hand
>>it is an offline process; it could take a lot of time to get a
>>single report. And of course a business user would be very disappointed to
>>see that he/she has to wait another 40 minutes to get the results of a report.
>>So what I am trying to achieve is to keep this query time as small as
>>possible. For this I can sacrifice write speed as well; I don't really
>>have to integrate new logs on the fly, so a job that runs overnight is fine.
>>So for this kind of situation, do you find HBase useful?
>>I read about star-schema design for more effective queries, but
>>this makes the developer's job a lot harder, because I would need to design
>>different schemas for different log types; adding a new log type would
>>require some time to gather requirements, develop, etc.
>>I also thought about creating a very simple HBase schema, just a key and
>>the content for each record, and then indexing the content with Lucene, but
>>this sounded like I did not need HBase in the first place, because I am not
>>really benefiting from it except for storage. Also, I could not be sure
>>how big my Lucene indexes would get, and whether Lucene could cope with data
>>at this scale. What do you think about Lucene indexes on HBase?
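To make the key-plus-content idea concrete without pulling in Lucene itself, here is a toy inverted index in plain Java showing the shape of the approach: each term maps to the set of row keys whose stored content contains it, so a search returns HBase keys to fetch. All names here are illustrative.

```java
import java.util.*;

// Toy inverted index (NOT Lucene): the log body lives in HBase under a
// row key, and the index maps each term to the row keys containing it.
public class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Tokenize the stored content and record the row key for each term.
    public void add(String rowKey, String content) {
        for (String term : content.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(rowKey);
            }
        }
    }

    // Return the row keys whose content contained the term.
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        idx.add("row1", "GET /index.html 200");
        idx.add("row2", "GET /login 500");
        System.out.println(idx.search("get")); // [row1, row2]
    }
}
```

Lucene does this with compressed on-disk segments, scoring, and analyzers; the sizing worry in the question is about how large those segment files grow relative to the raw logs.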
>>I read about how Rackspace does things; as far as I understood, they
>>generate Lucene indexes while parsing the logs in Hadoop, and then
>>merge each new index into a system that is serving the previous ones.
>>Does anyone use a similar approach, or have any ideas about this?
>>Do you think any of these approaches are suitable? If not, should I try
>>something else?
>>Thanks in advance.
