kylin-user mailing list archives

From Roberto Tardío <roberto.tar...@stratebi.com>
Subject Issues with Kylin with EMR and S3
Date Wed, 08 Nov 2017 19:45:22 GMT
Hi,

We have deployed Kylin on an EC2 machine using an EMR cluster. After adding 
the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we 
successfully built the sample cube. However, Kylin data is stored in the 
HDFS path /kylin. Because HDFS is ephemeral storage on EMR and is erased 
when the cluster is terminated (e.g. to save costs or to change the 
instance type), we have to store the data on S3.

With this aim we changed the 'kylin.env.hdfs-working-dir' property to an 
S3 path, such as s3://your-bucket/kylin. After this change, when we try to 
build the sample cube, the build job starts but gets stuck at step 2, 
"Redistribute Flat Hive Table". We have checked that this step never 
starts, and the Kylin logs do not show any errors or warnings.
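
For reference, the change amounts to a single line. This is only a sketch: 
it assumes the property is set in conf/kylin.properties, and the bucket 
name is a placeholder.

```properties
# Assumed location: $KYLIN_HOME/conf/kylin.properties
# "your-bucket" is a placeholder, not a real bucket name
kylin.env.hdfs-working-dir=s3://your-bucket/kylin
```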

Do you have any idea how to solve this and make Kylin work with S3?

So far the only solution we have found is to copy the HDFS folder to S3 
before terminating the EMR cluster and to copy it back from S3 to HDFS 
when the cluster is started again. However, this is only a partial 
solution, since the HDFS storage of EMR is ephemeral and we do not have as 
much space available there as in S3. Which data does Kylin store under the 
/kylin path? Are the HBase tables stored in this folder?

We would appreciate your help,

Roberto

-- 

*Roberto Tardío Olmos*

/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16. 28020 Madrid
Fijo: 91.788.34.10
