kylin-dev mailing list archives

From Li Yang <liy...@apache.org>
Subject Re: Choosing between Kylin and Lens
Date Mon, 02 Mar 2015 09:58:53 GMT
Answering from the Kylin perspective. :-)

Same here: there is no performance benchmark at the moment.

> Do you have some guidelines/recommendations on how to choose the right
> solution?

Kylin's advantage is the pre-calculation of joins and aggregations. If your
queries are at a high aggregation level, or have many joins, Kylin will have
an edge. In addition, Kylin is part of the Hadoop family and has an ANSI SQL
interface, which differentiates it from some other solutions.

>     When using Hive as storage, it seems Kylin might perform better since
> data is pre-aggregated and cached.

Kylin uses HBase as the storage for the cube. The Hive table is the input.
Data is read from Hive, built into a cube with MapReduce, and stored in HBase.
Users write queries against the original Hive table, and Kylin answers them
from the cube without accessing Hive at runtime.
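
To make the flow concrete, here is a rough Python sketch of the idea. It is
a toy illustration only, not Kylin code: the rows, the names build_cube and
query_sum, and the in-memory dict standing in for HBase are all made up for
the example. It pre-aggregates SUM(sales) for every combination of
dimensions, then answers a query from the pre-aggregated results alone.

```python
from collections import defaultdict
from itertools import combinations

# Toy fact rows standing in for a Hive table (made-up data).
ROWS = [
    {"country": "US", "year": 2014, "sales": 10},
    {"country": "US", "year": 2015, "sales": 20},
    {"country": "CN", "year": 2015, "sales": 5},
]
DIMENSIONS = ("country", "year")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    # A plain dict stands in for the HBase storage here (assumption).
    return dict(cube)


def query_sum(cube, group_by, values):
    """Answer SUM for one group from the cube; the source rows are not read."""
    return cube[(tuple(group_by), tuple(values))]


CUBE = build_cube(ROWS, DIMENSIONS, "sales")
print(query_sum(CUBE, ["country"], ["US"]))  # 30
print(query_sum(CUBE, [], []))               # 35 (grand total)
```

In this sketch the group_by columns must be listed in the same order as
DIMENSIONS. The build cost is paid once up front, which is why queries at a
high aggregation level become simple lookups.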

> How does Kylin handle sparse tables and avoid empty cells in cache?

Data is encoded using a dictionary and then stored in the cube. So every
value in the cube is a code point of minimal length, including empties.
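
As a rough illustration of dictionary encoding in general (a simplified
Python sketch, not Kylin's actual dictionary implementation; the function
names and sample values are made up), each distinct value, including the
empty one, is mapped to a small integer code, and every code is stored in
the fewest whole bytes that can hold the largest code.

```python
def build_dictionary(values):
    """Map each distinct value (None included) to an integer code."""
    # None sorts first so the empty value gets code 0 (arbitrary choice).
    distinct = sorted(set(values), key=lambda v: (v is not None, str(v)))
    return {value: code for code, value in enumerate(distinct)}


def code_width_bytes(dictionary):
    """Smallest number of whole bytes that can hold the largest code."""
    max_code = max(len(dictionary) - 1, 1)
    return (max_code.bit_length() + 7) // 8


def encode(values, dictionary):
    """Replace every value by its fixed-width code."""
    width = code_width_bytes(dictionary)
    return b"".join(dictionary[v].to_bytes(width, "big") for v in values)


COLUMN = ["beijing", None, "shanghai", "beijing", None]
DICTIONARY = build_dictionary(COLUMN)
ENCODED = encode(COLUMN, DICTIONARY)
print(DICTIONARY)     # {None: 0, 'beijing': 1, 'shanghai': 2}
print(list(ENCODED))  # [1, 0, 2, 1, 0]
```

With three distinct values each cell takes one byte, and an empty cell costs
no more than any other value.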


Cheers
Yang


On Fri, Feb 27, 2015 at 3:15 PM, amareshwarisr . <amareshwari@gmail.com>
wrote:

> Hello Long Zhou,
>
> Thanks for reaching out. I'm developer at Lens and trying to answer your
> questions with respect to Lens.
>
> On Thu, Feb 26, 2015 at 9:09 PM, Long Zhou <longzhouwk@gmail.com> wrote:
>
> > [delivery to user@kylin failed, resend to dev@kylin]
> >
> > Hi Kylin and Lens communities,
> >
> >     I am working on a big data analysis project and consider using Kylin
> > or Lens. Do you have some guidelines/recommendations on how to choose the
> > right solution? We are particularly interested in the performance
> > characteristics of these two solutions on terabytes of sparse data.
> >
>
> We don't have guidelines/recommendations/performance characteristics
> documented anywhere as of now. But user documentation should help you with
> some details of the system. Lens itself does not have any overhead with
> respect to query execution, it would be given to underlying engine and the
> performance numbers published in underlying systems should be sufficient.
>
>
> >     I just started learning the two projects. It seems Kylin is more like
> > MOLAP while Lens is more like ROLAP, is that correct? Does the differences
> > between MOLAP and ROLAP apply here?
> >
>
> I agree that Lens is a ROLAP-like system. We can say Lens can become
> HOLAP (http://en.wikipedia.org/wiki/ROLAP,
> http://en.wikipedia.org/wiki/HOLAP,
> http://www.1keydata.com/datawarehousing/molap-rolap.html). And as said in
> ROLAP, performance of Lens depends on underlying execution engines and if
> the data is not aggregated, it would pick detailed tables for answering.
> But if aggregated data is available through an ETL process, it would make
> use of it.
>
> >     When using Hive as storage, it seems Kylin might perform better since
> > data is pre-aggregated and cached. How does Kylin handle sparse tables and
> > avoid empty cells in cache? Does Lens have cache on top of Hive?
> >
>
> No, Lens does not have any cache on top of Hive.
>
>
> >     Lens supports columnar data warehouses like Redshift. How much
> > performance could we gain by loading data to Redshift? Where can I find
> > performance benchmark data for the two projects?
> >
>
> It would be same as how fast Redshift can answer queries. Lens comes with
> JDBCDriver for reaching systems which can understand jdbc. At inmobi, we
> are using it with Columnar dataware house - InfoBright (
> https://www.infobright.com/) in production, it should work with Redshift as
> well, but it is not yet tested with RedShift.
>
> Thanks
> Amareshwari
>
