kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Issues with Kylin with EMR and S3
Date Fri, 10 Nov 2017 12:42:57 GMT
Hi Roberto,

Today I tested Kylin 2.2 with EMR 5.7 (HBase on S3), with Kylin's
working dir set to S3. Basically, it works well, with no blocking issues. I
found only two minor issues, which I recorded here:

https://issues.apache.org/jira/browse/KYLIN-3028
https://issues.apache.org/jira/browse/KYLIN-3032
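
For reference, the tested setup (Kylin's working dir on S3) might look roughly
like the following kylin.properties fragment. This is only a sketch: the bucket
name is a placeholder borrowed from the quoted thread below, not the bucket
actually used in the test.

```properties
# kylin.properties (sketch; "your-bucket" is a placeholder)
# Point Kylin's working directory at S3 instead of cluster-local HDFS.
kylin.env.hdfs-working-dir=s3://your-bucket/kylin
# Per the quoted reply below, kylin.storage.hbase.cluster-fs does not
# need to be set to the same bucket again.
```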

I didn't see any job get blocked. My cluster is very small: 1 master and 1 core
node (m3.xlarge). Kylin is installed on the master node and runs as root. Please
double-check your environment. If you have any new findings, please share them
with the community. Thanks.


[image: inline image 1]

2017-11-09 21:56 GMT+08:00 ShaoFeng Shi <shaofengshi@apache.org>:

> Thanks Roberto;
>
> I will also try that tomorrow or this weekend. I had planned to draft a
> document for EMR; it's time to do that now.
>
> 2017-11-09 19:54 GMT+08:00 Roberto Tardío <roberto.tardio@stratebi.com>:
>
>> Hi,
>>
>> With Kylin 2.1, the YARN RM shows that one job for Step 1 finished
>> successfully, but there is no job when Step 2 gets stuck. When we use HDFS
>> as the working dir, this step works fine and launches a Tez job on the YARN
>> RM that finishes successfully (as does the whole sample cube build process).
>>
>> With Kylin 2.2, the YARN RM does not show any MR job when Step 1 gets stuck.
>>
>> However, we are going to run the test again; maybe when changing the Kylin
>> version from 2.1 to 2.2 we forgot to clean up some metadata, coprocessors, etc.
>>
>> On 09/11/2017 at 11:10, ShaoFeng Shi wrote:
>>
>> Hi Roberto,
>>
>> No need to set kylin.storage.hbase.cluster-fs to the same bucket again.
>>
>> For the stuck job, did you check YARN RM to see whether there is any
>> indicator?
>>
>>
>> 2017-11-09 17:38 GMT+08:00 Roberto Tardío <roberto.tardio@stratebi.com>:
>>
>>> Hi,
>>>
>>> EMR version is 5.7 and Kylin version is 2.1. We have changed
>>> kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but *we have not
>>> changed kylin.storage.hbase.cluster-fs to the same S3 bucket*. Could
>>> the problem be that we did not set this kylin.storage.hbase.cluster-fs
>>> parameter to S3?
>>>
>>> We have also tried the latest version of Kylin (2.2). In this case,
>>> when the build job starts, the first step gets stuck with no errors or
>>> warnings in the log files. Maybe we are doing something wrong. Tomorrow we
>>> are going to try setting kylin.storage.hbase.cluster-fs to S3.
>>>
>>> Other details about our architecture are:
>>>
>>>    - Kylin 2.1 (also tried with 2.2) on a separate EC2 machine, with
>>>    Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
>>>    - EMR 5.7 cluster (1 master and 4 cores)
>>>    - HBase on S3
>>>       - Hive warehouse on S3 and metastore configured on MySQL in the
>>>       EC2 machine (the same one where Kylin runs)
>>>       - HDFS
>>>       - S3 with EMRFS
>>>       - Zookeeper.
>>>
>>> I will give you feedback on tomorrow's new tests.
>>>
>>> Many thanks ShaoFeng!
>>>
>>> On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
>>>
>>> Hi Roberto,
>>>
>>> What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
>>> problem with "insert overwrite" over S3, which is exactly what Kylin needs
>>> in the "redistribute flat hive table" step. One option is to skip the
>>> "redistribute" step by setting
>>> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties. (On
>>> EMR 5.7, there is no such issue.)
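>>>
>>> A minimal sketch of that override, assuming it is added to the
>>> kylin.properties file on the Kylin node:
>>>
>>> ```properties
>>> # Skip the "redistribute flat hive table" step (relevant on EMR 4.x)
>>> kylin.source.hive.redistribute-flat-table=false
>>> ```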
>>>
>>> The second option is to set "kylin.env.hdfs-working-dir" to local HDFS
>>> and "kylin.storage.hbase.cluster-fs" to an S3 bucket (with HBase data also
>>> on S3). Kylin will build the cube on HDFS, then output the HFiles to S3, and
>>> finally load them into HBase on S3. This gives better build performance and
>>> also keeps the cube data in S3 for high availability and durability. But if
>>> you stop EMR, the intermediate cuboid files will be lost, which means
>>> segments can no longer be merged.
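>>>
>>> As a rough sketch, the second option could look like this in
>>> kylin.properties. The HDFS path and the bucket name are placeholders, and
>>> the exact URI form expected for the S3 value is an assumption:
>>>
>>> ```properties
>>> # Build on cluster-local HDFS (ephemeral on EMR)
>>> kylin.env.hdfs-working-dir=/kylin
>>> # Write HFiles to S3, where HBase keeps its data
>>> kylin.storage.hbase.cluster-fs=s3://your-bucket
>>> ```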
>>>
>>> The third option is to use a newer version like EMR 5.7 and use S3 as the
>>> working dir (with HBase also on S3).
>>>
>>> For all the scenarios, please use Kylin v2.2, which includes the fix for
>>> KYLIN-2788.
>>>
>>>
>>>
>>>
>>>
>>> 2017-11-09 3:45 GMT+08:00 Roberto Tardío <roberto.tardio@stratebi.com>:
>>>
>>>> Hi,
>>>>
>>>> We have deployed Kylin on an EC2 machine using an EMR cluster. After
>>>> adding the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we
>>>> successfully built the sample cube. However, Kylin data is stored in the
>>>> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and is erased
>>>> when you terminate the cluster (e.g. to save costs or to change the
>>>> instance types), we have to store the data on S3.
>>>>
>>>> To this end, we changed the 'kylin.env.hdfs-working-dir' property to S3,
>>>> e.g. s3://your-bucket/kylin. But after this change, if we try to build the
>>>> sample cube, the build job starts but gets stuck in step 2, "Redistribute
>>>> Flat Hive Table". We have checked that this step never starts, and the
>>>> Kylin logs do not show any errors or warnings.
>>>>
>>>> Do you have any idea how to solve this and make Kylin work with S3?
>>>>
>>>> So far, the only solution we have found is to copy the HDFS folder to S3
>>>> before terminating the EMR cluster and copy it back from S3 to HDFS when
>>>> the cluster is started again. However, this is only half a solution, since
>>>> the HDFS storage of EMR is ephemeral and we do not have as much space
>>>> available as in S3. Which data does Kylin store in the kylin path? Are
>>>> HBase tables stored in this folder?
>>>>
>>>> We would appreciate your help,
>>>>
>>>> Roberto
>>>> --
>>>>
>>>> *Roberto Tardío Olmos*
>>>> *Senior Big Data & Business Intelligence Consultant*
>>>> Avenida de Brasil, 17, Planta 16. 28020 Madrid
>>>> Landline: 91.788.34.10
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋


-- 
Best regards,

Shaofeng Shi 史少锋
