hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From wd>
Subject Re: Best practices for storing data on Hive
Date Sun, 04 Sep 2011 08:01:52 GMT
Hive support more than one partitions, have your tried? Maybe you can
create to partitions named as date and user.

Hive 0.7 also support index, maybe you can have a try.

On Sat, Sep 3, 2011 at 1:18 AM, Mark Grover <> wrote:
> Hello folks,
> I am fairly new to Hive and am wondering if you could share some of the best practices
for storing/querying data with Hive.
> Here is an example of the problem I am trying to solve.
> The traffic to our website is logged in files that contain information about clicks from
various users.
> Simplified, the log file looks like:
> t_1, ip_1, userid_1
> t_2, ip_2, userid_2
> t_3, ip_3, userid_3
> ...
> where t_i represents time of the click, ip_i represents ip address where the click originated
from, and userid_i represents the user ID of the user.
> Since the clicks are logged on an ongoing basis, partitioning our Hive table by day seemed
like the obvious choice. Every night we upload the data from the previous day into a new partition.
> However, we would also want the capability to find all log lines corresponding to a particular
user. With our present partitioning scheme, all day partitions are searched for that user
ID but this takes a long time. I am looking for ideas/suggestions/thoughts/comments on how
to reduce this time.
> As a solution, I am thinking that perhaps we could have 2 independent tables, one which
stores data partitioned by day and the other partitioned by userId. With the second table
partitioned by userId, I will have to find some way of maintaining the partitions since Hive
doesn't support appending of files. Also, this seems suboptimal, since we are doubling that
the amount of data that we store. What do you folks think of this idea?
> Do you have any other suggestions on how we can approach this problem?
> What have other people in similar situations done? Please share.
> Thank you in advance!
> Mark

View raw message