hive-user mailing list archives

From Renato Marroquín Mogrovejo <>
Subject Re: Hive Usage
Date Mon, 19 Jul 2010 19:28:00 GMT
Hi Ashish,

I mean whether there are modeling best practices for getting better
performance (related to buckets, partitions, and tables). For example, maybe
creating partitions based not just on time frames but also on partition
size; or, as in the Hive paper, the list partitioning that the compiler uses
to know where to look for the data; or, I don't know, those kinds of
modeling-related things.
Or is it just a matter of choosing between the well-known Kimball and Inmon
approaches?
Thanks in advance.

Renato M.

2010/7/16 Ashish Thusoo <>

>  Hi Renato,
> Can you expand more on what exactly you mean by modelling?
> On the append side, Hive does not really support appends, though you can
> create a new partition within the table for every run, and that could be
> used as a workaround for appends.
> Ashish
>  ------------------------------
> *From:* Renato Marroquín Mogrovejo []
> *Sent:* Thursday, July 15, 2010 2:53 PM
> *To:*
> *Subject:* Hive Usage
> Hi there, I would like to know if anyone has done some kind of
> modelling on Hive and is willing to share some experiences.
> Today is my first day with Hive, and I have several questions regarding
> modelling: whether I would have to do a special kind of modelling or a
> regular DW one.
> Another thing I wanted to know is whether Hive already has the append
> option enabled, because I know there is a Hadoop branch with the append
> option enabled, and a Cloudera release has it too (I think it is CDH3).
> Any kind of suggestions or opinions are highly appreciated.
> Renato M.
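
For readers of the archive, the partition-per-run workaround that Ashish
describes above could look roughly like the following. This is only a sketch
under assumptions that are not in the thread: the table names (page_views,
page_views_staging), the partition column (run_id), and the helper function
are all hypothetical, and the target table is assumed to have been created
with PARTITIONED BY (run_id STRING). The helper just builds the HiveQL
string; how it gets submitted to Hive is left open.

```python
import re

# Conservative patterns so that names can be embedded in the statement safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_RUN_ID = re.compile(r"^[A-Za-z0-9_\-]+$")


def per_run_insert(table: str, staging_table: str, run_id: str) -> str:
    """Build a HiveQL statement that loads one run into its own partition.

    All names are hypothetical. Each run targets a fresh partition
    (run_id=...), so data from earlier runs is left untouched, which
    emulates an append at the table level.
    """
    for name in (table, staging_table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _RUN_ID.match(run_id):
        raise ValueError(f"unsafe run id: {run_id!r}")
    return (
        f"INSERT OVERWRITE TABLE {table} PARTITION (run_id='{run_id}') "
        f"SELECT * FROM {staging_table}"
    )


if __name__ == "__main__":
    # One statement per run; the run date is used as the partition value.
    print(per_run_insert("page_views", "page_views_staging", "2010-07-19"))
```

Because every run writes to a new partition value, queries over the whole
table see the union of all runs, while a query that filters on run_id only
needs to read the matching partition.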
