hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Dimensional Data Model on Hive
Date Thu, 10 May 2012 13:53:51 GMT
On Thu, May 10, 2012 at 9:26 AM, Kuldeep Chitrakar
<kuldeep.chitrakar@synechron.com> wrote:
> Hi,
>
> I have a data warehouse implementation for clickstream data analysis on
> an RDBMS. It's a star schema (dimensions and facts).
>
> Now, if I want to move to Hive, do I need to create the same data model,
> with dimensions and facts, and join them?
>
> Or should I create one big de-normalized table that contains all the
> textual attributes from all dimensions? If so, how do we handle SCD type 2
> dimensions in Hive?
>
> It's a very basic question, but I am just confused about this.
>
> Thanks,
> Kuldeep

While Hive is sometimes referred to as a data warehouse, you usually
want to avoid data warehouse concepts like the star schema. There are a
number of reasons for this:
1) No unique constraints
2) Limited index capabilities
3) Map-side joins are optimal when one of the tables is small enough
to fit in memory
4) Most join types, while they generalize to MapReduce, behave very
differently than joins in single-node databases
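
To make point 3 concrete, here is a minimal Python sketch of the idea
behind a map-side join: the small table is loaded into an in-memory
lookup and the large table is streamed past it, so no shuffle is
needed. This is an illustration of the general technique, not Hive's
actual implementation; the table names, columns, and rows are made up.

```python
# Hypothetical sample data; names and values are illustrative only.
dim_page = [  # small dimension table: (page_id, page_name)
    (1, "home"),
    (2, "checkout"),
]
fact_clicks = [  # large fact table, streamed row by row: (user_id, page_id)
    ("u1", 1),
    ("u2", 2),
    ("u3", 1),
    ("u4", 9),  # no matching dimension row
]


def map_side_join(fact_rows, dim_rows):
    """Inner-join streamed fact rows against a small in-memory dimension."""
    lookup = dict(dim_rows)  # page_id -> page_name, built once up front
    for user_id, page_id in fact_rows:
        page_name = lookup.get(page_id)
        if page_name is not None:  # inner join: drop unmatched fact rows
            yield (user_id, page_id, page_name)


joined = list(map_side_join(fact_clicks, dim_page))
```

The approach only works while the lookup fits in memory, which is why
it stops being attractive once both sides of the join are large.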

In most situations I advise going the "NoSQL route" and de-normalizing
almost everything. Optimize for scanning.
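
As a rough sketch of what de-normalizing could look like for the SCD
type 2 case raised in the question, one option is to resolve the
dimension version at load time and store its attributes directly on
each fact row, so later queries are plain scans. This is only one
possible reading of the advice, written in Python for illustration;
the schema, column names, and dates below are assumptions.

```python
from datetime import date

# Hypothetical SCD type 2 dimension: one row per version of a user.
# Columns: (user_id, segment, valid_from, valid_to); valid_to is exclusive.
dim_user = [
    ("u1", "free", date(2012, 1, 1), date(2012, 4, 1)),
    ("u1", "paid", date(2012, 4, 1), date(9999, 12, 31)),
]

# Hypothetical click events: (user_id, event_date, url).
clicks = [
    ("u1", date(2012, 3, 15), "/home"),
    ("u1", date(2012, 5, 10), "/cart"),
]


def denormalize(click_rows, user_versions):
    """Attach the segment that was valid on the event date to each click."""
    wide = []
    for user_id, day, url in click_rows:
        segment = None  # stays None if no version covers the event date
        for uid, seg, start, end in user_versions:
            if uid == user_id and start <= day < end:
                segment = seg
                break
        wide.append((user_id, day, url, segment))
    return wide


wide_clicks = denormalize(clicks, dim_user)
```

Each wide row carries the attribute value as of the event, so the
history is preserved in the fact data itself at the cost of repeating
the attribute on every row.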
