hive-user mailing list archives

From Wojciech Langiewicz <wlangiew...@gmail.com>
Subject Re: Hive for large statistics tables?
Date Tue, 27 Sep 2011 13:33:53 GMT
Hello,
I'm using Hive to query data like yours. In my case I have about 300-500 GB
of data per day, so much more than yours. We use Flume to load data into
Hive; data is rolled over every day (this can be changed).

Hive queries, whether ad hoc or scheduled, usually take at least 10-20 s
and can run for hours, so Hive won't speed up your processing. Hive shows
its power once you have much more data than several GB per month.

I think that in your case Hive is not a good solution; you'll be better
off using more powerful MySQL servers.

On 27.09.2011 11:14, Benjamin Fonze wrote:
> Dear All,
>
> I'm new to this list, and I hope I'm sending this to the right place.
>
> I'm currently using MySQL to store a large amount of visitor statistics.
> (Visits, clicks, etc....)
>
> Basically, each visit is logged in a text file, and every 15 minutes a job
> consolidates it into MySQL, into tables that look like this:
>
> COUNTRY | DATE | USER_AGENT | REFERRER | SEARCH | ... | NUM_HITS
>
> This generates millions of rows and several GB of data a month. Queries on
> these tables typically take a few seconds. (Yes, there are indexes,
> etc.)
>
> I was thinking of moving all that data to a NoSQL DB like Hive, but I want
> to make sure it is suited to my purpose. Can you confirm that Hive is a good
> fit for such statistical data? More importantly, can you confirm that ad hoc
> queries on that data will be much faster than in MySQL?
>
> Thanks in advance!
>
> Benjamin.
>
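
The 15-minute consolidation step described in the quoted message (raw visit lines grouped into rows carrying a NUM_HITS count) can be sketched as follows. This is a minimal Python illustration, not Benjamin's actual job: the tab-separated log layout, the five dimension columns, and the function names `consolidate` and `to_rows` are hypothetical assumptions.

```python
from collections import Counter
from typing import Iterable

# Hypothetical log layout (assumption): one visit per line, tab-separated,
# COUNTRY, DATE, USER_AGENT, REFERRER, SEARCH. The "..." columns are omitted.
DIMENSIONS = ("country", "date", "user_agent", "referrer", "search")


def consolidate(lines: Iterable[str]) -> Counter:
    """Group raw visit lines by their dimension values and count hits."""
    hits: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(DIMENSIONS):
            continue  # skip malformed lines instead of failing the whole batch
        hits[tuple(fields)] += 1
    return hits


def to_rows(hits: Counter) -> list:
    """Flatten to (COUNTRY, DATE, USER_AGENT, REFERRER, SEARCH, NUM_HITS) rows."""
    return [key + (count,) for key, count in sorted(hits.items())]


if __name__ == "__main__":
    sample = [
        "BE\t2011-09-27\tFirefox\tgoogle.com\thive\n",
        "BE\t2011-09-27\tFirefox\tgoogle.com\thive\n",
        "FR\t2011-09-27\tChrome\t-\t-\n",
    ]
    for row in to_rows(consolidate(sample)):
        print(row)
```

In a real pipeline each batch's rows would then be merged into the MySQL table by adding the new counts to the existing NUM_HITS values; that loading step is not shown here.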

