kylin-user mailing list archives

From Roberto Tardío <roberto.tar...@stratebi.com>
Subject Re: Issues with Kylin with EMR and S3
Date Thu, 09 Nov 2017 09:38:40 GMT
Hi,

The EMR version is 5.7 and the Kylin version is 2.1. We changed
kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but *we did not
change kylin.storage.hbase.cluster-fs to the same S3 bucket*. Could the
problem be that we did not point *kylin.storage.hbase.cluster-fs* at
S3?

We have also tried the latest version of Kylin (2.2). In that case, when
the build job starts, the first step gets stuck with no errors or
warnings in the log files. Maybe we are doing something wrong. Tomorrow
we are going to try setting *kylin.storage.hbase.cluster-fs* to S3.
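
As a rough sketch, the combination we plan to test would look something
like this in kylin.properties. The bucket name is a placeholder, and the
exact value form that kylin.storage.hbase.cluster-fs expects (bucket
root or a path inside it) is an assumption that still needs to be
checked:

```properties
# Working directory on S3 (already changed in our setup)
kylin.env.hdfs-working-dir=s3://your-bucket/kylin
# Not changed yet; assumed value pointing at the same bucket
kylin.storage.hbase.cluster-fs=s3://your-bucket
```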

Other details about our architecture are:

  * Kylin 2.1 (also tried with 2.2) on a separate EC2 machine, with
    the Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
  * EMR 5.7 cluster (1 master and 4 core nodes)
      o HBase on S3
      o Hive warehouse on S3 and metastore configured on MySQL on the
        EC2 machine (the same one where Kylin runs)
      o HDFS
      o S3 with EMRFS
      o ZooKeeper

I will give you feedback on tomorrow's new tests.

Many thanks ShaoFeng!


On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
> Hi Roberto,
>
> What's your EMR version? I know that in the 4.x versions, EMR's Hive
> has a problem with "insert overwrite" over S3, which is exactly what
> Kylin needs in the "redistribute flat hive table" step. You can also
> skip the "redistribute" step by setting
> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties.
> (On EMR 5.7, there is no such issue.)
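>
> A minimal sketch of that setting in kylin.properties, showing only the
> property named above (nothing else is assumed to change):
>
> ```properties
> # Skip the "redistribute flat hive table" build step
> kylin.source.hive.redistribute-flat-table=false
> ```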
>
> The second option is to set "kylin.env.hdfs-working-dir" to local HDFS
> and "kylin.storage.hbase.cluster-fs" to an S3 bucket (with HBase data
> also on S3). Kylin will build the cube on HDFS, then output the HFiles
> to S3, and finally load them into HBase on S3. This gives better build
> performance and also keeps the cube data in S3 for high availability
> and durability. But if you stop EMR, the intermediate cuboid files will
> be lost, which means the segments can no longer be merged.
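>
> As a sketch, this second option could look like the following in
> kylin.properties. The HDFS path /kylin is the one mentioned in the
> original message below; the bucket name is a placeholder, and the
> exact value format for the cluster-fs property is an assumption:
>
> ```properties
> # Build on the cluster's local (ephemeral) HDFS
> kylin.env.hdfs-working-dir=/kylin
> # HFiles are written to S3, where HBase also keeps its data
> kylin.storage.hbase.cluster-fs=s3://your-bucket
> ```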
>
> The third option is to use a newer version such as EMR 5.7, with S3 as
> the working dir (and HBase also on S3).
>
> For all of these scenarios, please use Kylin v2.2, which includes the
> fix for KYLIN-2788.
>
>
>
>
> 2017-11-09 3:45 GMT+08:00 Roberto Tardío <roberto.tardio@stratebi.com>:
>
>     Hi,
>
>     We have deployed Kylin on an EC2 machine using an EMR cluster.
>     After adding the "hbase.zookeeper.quorum" property to
>     kylin_job_conf.xml, we successfully built the sample cube.
>     However, Kylin data is stored on the HDFS path /kylin. Because
>     HDFS is ephemeral storage on EMR and is erased when you terminate
>     the cluster (e.g. to save costs, to change the instance
>     types, ...), we have to store the data on S3.
>
>     To this end we changed the 'kylin.env.hdfs-working-dir' property
>     to S3, e.g. s3://your-bucket/kylin. But after this change, when we
>     try to build the sample cube, the build job starts but gets stuck
>     in step 2, "Redistribute Flat Hive Table". We have checked that
>     this step never starts, and the Kylin logs do not show any error
>     or warning.
>
>     Do you have any idea how to solve this and make Kylin work with
>     S3?
>
>     So far the only solution we have found is to copy the HDFS folder
>     to S3 before terminating the EMR cluster and to copy it back from
>     S3 to HDFS when the cluster is started again. However, this is
>     only half a solution, since the HDFS storage of EMR is ephemeral
>     and we do not have as much space available there as in S3. Which
>     data does Kylin store under the kylin path? Are the HBase tables
>     stored in this folder?
>
>     We would appreciate your help,
>
>     Roberto
>
>     -- 
>
>     *Roberto Tardío Olmos*
>
>     /Senior Big Data & Business Intelligence Consultant/
>     Avenida de Brasil, 17, Planta 16.28020 Madrid
>     Fijo: 91.788.34.10
>
>
>
>
> -- 
> Best regards,
>
> Shaofeng Shi 史少锋
>

-- 

*Roberto Tardío Olmos*

/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16.28020 Madrid
Fijo: 91.788.34.10
