hadoop-common-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: indexing log files for adhoc queries - suggestions?
Date Fri, 02 Oct 2009 22:28:51 GMT
Hive is a SQL-like abstraction over MapReduce. It lets you run
SQL-like queries over your data without having to write the MR job
yourself. Behind the scenes, however, it still converts each query
into MapReduce jobs.
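
To make that concrete, here is a rough sketch in plain Python of the
map, shuffle and reduce steps that a query like
"SELECT user, COUNT(*) ... GROUP BY user" roughly boils down to. This
is not Hive's actual query planner or API. The log records and the
function names (map_phase, shuffle, reduce_phase) are made up purely
for illustration.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, unix_timestamp, message).
logs = [
    ("alice", 1254436000, "login"),
    ("bob", 1254436100, "login"),
    ("alice", 1254436200, "search"),
]


def map_phase(records):
    # Emit (user_id, 1) for every record, like a mapper for COUNT(*).
    for user_id, _timestamp, _message in records:
        yield user_id, 1


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Sum the counts for each user.
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle(map_phase(logs)))
print(counts)
```

With Hive you would only write the query; the stages above are the
part you no longer have to code by hand.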

HBase might be what you are looking for. You can put your logs into
HBase, query them directly, and still run MR jobs over them.
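
One way this could work for "user plus time range" lookups is a
composite row key. The sketch below is plain Python, not the HBase
client API. It assumes a row key of the form user id, a separator,
then a zero-padded timestamp, and imitates a table kept sorted by row
key with a scan whose start is inclusive and whose stop is exclusive.
SortedTable, row_key and logs_for are hypothetical names, and the "|"
separator assumes user ids never contain that character.

```python
import bisect


def row_key(user_id, timestamp):
    # Zero-pad the timestamp so string order matches numeric order.
    return f"{user_id}|{timestamp:010d}"


class SortedTable:
    """Toy stand-in for a table sorted by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start_key, stop_key):
        # Return rows with start_key <= key < stop_key, in key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def logs_for(table, user_id, start_ts, end_ts):
    # All log lines for one user with start_ts <= timestamp < end_ts.
    rows = table.scan(row_key(user_id, start_ts), row_key(user_id, end_ts))
    return [value for _key, value in rows]


table = SortedTable()
table.put(row_key("alice", 1254436000), "login")
table.put(row_key("bob", 1254436100), "login")
table.put(row_key("alice", 1254436200), "search")
table.put(row_key("alice", 1254439000), "logout")

result = logs_for(table, "alice", 1254436000, 1254437000)
print(result)
```

Because each user's rows sit next to each other in timestamp order, a
lookup only touches the slice between the two keys instead of reading
the whole data set.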

On 10/1/09, Mayuran Yogarajah <mayuran.yogarajah@casalemedia.com> wrote:
> ishwar ramani wrote:
>> Hi,
>>
>> I have a setup where logs are periodically bundled up and dumped into
>> Hadoop DFS as a large sequence file.
>>
>> It works fine for all my map reduce jobs.
>>
>> Now I need to handle ad hoc queries for pulling out logs based on user
>> and time range.
>>
>> I really don't need a full indexer (like Lucene) for this purpose.
>>
>> My first thought is to run a periodic mapreduce to generate a large
>> text file sorted by user id.
>>
>> The text file will have (sequence file name, offset) to retrieve the logs
>> ....
>>
>>
>> I am guessing many of you ran into similar requirements... Any
>> suggestions on doing this better?
>>
>> ishwar
>>
> Have you looked into Hive? It's perfect for ad hoc queries.
>
> M
>


-- 


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
