# kylin-dev mailing list archives

From Abhishek Sinha <abhis...@infoworks.io>
Subject Re: Fact Table Distinct columns and Row Key
Date Tue, 17 Mar 2015 09:14:46 GMT
```
Thanks. Good one :)

On Tue, Mar 17, 2015 at 11:52 AM, hongbin ma <mahongbin@apache.org> wrote:

> it is quite a neat explanation of RowKey:)
>
> On Mon, Mar 16, 2015 at 11:15 PM, Shi, Shaofeng <shaoshi@ebay.com> wrote:
>
> > Piece of my knowledge on Kylin:
> >
> > On 3/17/15, 1:38 PM, "Abhishek Sinha" <abhisheksinha1911@gmail.com> wrote:
> >
> > >Hi,
> > >
> > >Can anyone explain the two steps in the cube build process?
> > >
> > >1. Why do we need to extract the distinct columns from Fact Table or
> > >calculate the HIVE table cardinality?
> >
> > Kylin builds a dictionary for each column, so it needs to fetch the
> > distinct values of each column; using dictionaries greatly reduces the
> > storage size.
> > The cardinality is used to optimize the order of columns in the row key,
> > and so to determine the roadmap of cube building, which helps 1) reduce
> > the cube build time and 2) reduce the cube scan range, improving query
> > performance.
> >
> > >
> > >2. What is the use of RowKey? How is it calculated? How does it help in
> > >calculating HTable Region splits?
> >
> > RowKey is the key in Kylin's storage (HBase); it is composed of the
> > dimensions' values (encoded in bytes). Assume your table has dimension
> > columns A, B, C with cardinalities n1, n2, n3; in the base cuboid
> > there will be up to n1*n2*n3 rows, and each row's key is A+B+C (the
> > concatenation of the encoded bytes). When a user sends a query like
> > "select ... from fact where A=XX and B=YY and C=ZZ group by A, B, C",
> > Kylin will use encode(XX) + encode(YY) + encode(ZZ) as the key to query
> > HBase and get the pre-aggregated result.
> > >
> > >
> > >Is there any documentation available on these? Or any research
> > >paper/book referred during the project?
> > Check the docs here, especially the "Design Cube in Kylin.pdf" :
> > https://github.com/KylinOLAP/Kylin/tree/master/docs
> >
> > >
> >
> >
>

--
Abhishek Sinha
Mobile: +919035191078
infoworks.io

```
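
The dictionary and RowKey idea described in the thread can be illustrated with a small sketch. This is not Kylin's code or its actual byte layout: the helper names (`build_dictionary`, `encode`, `row_key`), the sample fact rows, and the use of a plain Python dict in place of an HBase table are all assumptions made for illustration. It shows one way distinct column values could be mapped to compact IDs, concatenated into a key, and used to look up a pre-aggregated value.

```python
def build_dictionary(values):
    """Map each distinct value of a column to a small integer ID."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode(dictionary, value):
    """Encode a value's ID as a fixed-width big-endian byte string.

    The width depends on the column's cardinality, which is why the
    distinct values must be known before keys can be built.
    """
    width = max(1, ((len(dictionary) - 1).bit_length() + 7) // 8)
    return dictionary[value].to_bytes(width, "big")


# Hypothetical fact table: dimensions A, B, C and one numeric measure.
fact_rows = [
    ("US", "phone", "2015", 10),
    ("US", "phone", "2015", 5),
    ("US", "laptop", "2014", 7),
    ("CN", "phone", "2015", 3),
]

# One dictionary per dimension column, built from its distinct values.
columns = list(zip(*fact_rows))
dicts = [build_dictionary(col) for col in columns[:3]]


def row_key(a, b, c):
    """Concatenate the encoded dimension values: encode(A)+encode(B)+encode(C)."""
    return b"".join(encode(d, v) for d, v in zip(dicts, (a, b, c)))


# Stand-in for the base cuboid: one pre-aggregated sum per distinct (A, B, C).
base_cuboid = {}
for a, b, c, measure in fact_rows:
    key = row_key(a, b, c)
    base_cuboid[key] = base_cuboid.get(key, 0) + measure

# A query with A='US', B='phone', C='2015' becomes a single key lookup.
result = base_cuboid[row_key("US", "phone", "2015")]
print(row_key("US", "phone", "2015"), result)
```

In this toy data each dimension has cardinality 2, so the base cuboid could hold up to 2*2*2 = 8 rows, but only the 3 combinations that occur in the fact rows are stored.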