kylin-user mailing list archives

From Moisés Català <moises.cat...@lacupulamusic.com>
Subject Re: Kylin with S3, cubes tables get in transition when new cluster booted
Date Thu, 02 Aug 2018 08:35:13 GMT
Thanks ShaoFeng Shi,

As recommended in the installation tutorial, I use HDFS for intermediate data storage, so
before shutting down the cluster I back up hdfs:///user/kylin to S3 with s3-dist-cp.

I have two buckets, and I don’t modify the S3 HBase root directly. These are my buckets:

Configuration bucket: s3://xxxx-config/metadata/kylin, where I store the contents of hdfs:///user/kylin
HBase rootdir: s3://xxxx-hbase/storage

When I shut down the cluster I execute these commands in a shutdown script:

#!/bin/bash
#stop kylin
$KYLIN_HOME/bin/kylin.sh stop

# To shut down an Amazon EMR cluster without losing data that hasn't been written to Amazon S3,
# the MemStore cache needs to flush to Amazon S3 to write new store files.
# To do this, you can run a shell script provided on the EMR cluster.

bash /usr/lib/hbase/bin/disable_all_tables.sh
 

# Before you shut down/restart the cluster, you must back up the "/kylin" data on HDFS to S3
# with S3DistCp, or you may lose data and be unable to recover the cluster later.

s3-dist-cp --src=hdfs:///kylin --dest=s3://da-config/metadata/kylin


s3-dist-cp runs as a Hadoop job, so its writes go through EMRFS and are tracked by the consistent view.
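
Incidentally, to check before shutdown that the EMRFS metadata and the buckets agree, I could run emrfs diff (a sketch using my bucket names from above):

# Lists entries present only in the DynamoDB metadata or only in S3;
# an empty diff means the consistent view is in sync.
emrfs diff s3://xxxx-config/metadata/kylin
emrfs diff s3://xxxx-hbase/storage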

So, should I add these commands to my shutdown script?

emrfs delete s3://xxxx-config/metadata/kylin
emrfs import s3://xxxx-config/metadata/kylin
emrfs sync s3://xxxx-config/metadata/kylin

emrfs delete s3://xxxx-hbase/storage
emrfs import s3://xxxx-hbase/storage
emrfs sync s3://xxxx-hbase/storage
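
(My understanding is that emrfs delete only removes the metadata entries from the DynamoDB table, not the S3 objects themselves, and emrfs import/sync then rebuild the entries from what is actually in S3.)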

Should I do anything with the HBase root directory in S3?

When I start a brand-new cluster, apart from doing:

hadoop fs -mkdir /kylin 
s3-dist-cp --src=s3://xxxx-config/metadata/kylin  --dest=hdfs:///kylin 

do I have to do any other action?
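
For reference, the complete startup script I have in mind would look something like this (just a sketch; the enable_all step is my assumption, since EMR provides disable_all_tables.sh but, as far as I know, no enable counterpart):

#!/bin/bash
# Restore Kylin's intermediate data from the S3 backup.
hadoop fs -mkdir -p /kylin
s3-dist-cp --src=s3://xxxx-config/metadata/kylin --dest=hdfs:///kylin

# Re-enable the HBase tables that disable_all_tables.sh disabled at
# shutdown; the trailing "y" answers enable_all's confirmation prompt.
hbase shell <<'EOF'
enable_all '.*'
y
EOF

# Start Kylin.
$KYLIN_HOME/bin/kylin.sh start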

Thank you very much for your help, 

A final question: is it worth using S3 as HBase storage in a production environment, or would
it be safer to use just HDFS? My plan is to use Hive + Kylin as an EDW.
 



Moisés Català
Senior Data Engineer
La Cupula Music - Sonosuite
T: +34 93 250 38 05
www.lacupulamusic.com


> On 1 Aug 2018, at 3:10, ShaoFeng Shi <shaofengshi@apache.org> wrote:
> 
> Hi,
> 
> Sometimes the EMRFS becomes inconsistent with S3; EMRFS uses a DynamoDB table to cache the object entries and their status. If you or your applications update S3 directly (not via EMRFS), then the entries in EMRFS become inconsistent.
> 
> You can refer to this post: https://stackoverflow.com/questions/39823283/emrfs-file-sync-with-s3-not-working
> 
> In my experience, I did this one or two times:
> 
> emrfs delete s3://path
> emrfs import s3://path
> emrfs sync s3://path
> 
> The key point is: when using EMRFS, all updates to the bucket should go through EMRFS, not directly through S3. Hope this can help.
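> 
> For example (the path is just a placeholder), a delete issued through the Hadoop FileSystem layer on EMR goes through EMRFS, while the plain AWS CLI bypasses it:
> 
> # Goes through EMRFS, so the DynamoDB metadata stays in sync:
> hadoop fs -rm -r s3://some-bucket/some/path
> # Bypasses EMRFS and leaves stale metadata entries behind:
> # aws s3 rm --recursive s3://some-bucket/some/path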
> 
> 2018-07-30 23:26 GMT+08:00 Moisés Català <moises.catala@lacupulamusic.com>:
> Thanks for the tips Roberto,
> 
> You’re right: when I deploy EMR and install Kylin, everything works like a charm; I can even build the sample cube.
> 
> I have added the config you suggested about using EMRFS in emrfs-site and launched a brand-new cluster.
> I also deployed Kylin and built the cube. Finally, I shut down Kylin and disabled all HBase tables.
> 
> Unfortunately, when I launch a new cluster, the HBase master node can’t boot; this is what appears in the log:
> 
> 2018-07-30 15:00:31,103 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:02:49,220 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,324 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,325 FATAL [ip-172-31-85-0:16000.activeMasterManager] master.HMaster: Failed to become active master
> com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: 1 items inconsistent (no s3 object for associated metadata item). First object: /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:749)
> 	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:519)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy30.listStatus(Unknown Source)
> 	at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.listStatus(S3NativeFileSystem2.java:206)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1532)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1558)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1603)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1597)
> 	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:347)
> 	at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1737)
> 	at org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:377)
> 	at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:358)
> 	at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:339)
> 	at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:59)
> 	at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:526)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
> 	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:725)
> 	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:198)
> 	at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1907)
> 	at java.lang.Thread.run(Thread.java:748)
> 2018-07-30 15:09:01,326 FATAL [ip-172-31-85-0:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
> 
> I have attached the full log to the email.
> 
> What am I missing???
> 
> Thanks in advance
> 
> 
>> On 30 Jul 2018, at 9:02, <roberto.tardio@stratebi.com> wrote:
>> 
>> Hi Moisés,
>>  
>> If I have understood right, you have been able to deploy Kylin on EMR successfully. However, you lose the metadata when you terminate the cluster. Is that right?
>>  
>> Have you tried to restore the Kylin metadata backup after cluster re-creation? Moreover, do you enable all the HBase tables after cluster re-creation?
>>  
>> We successfully deployed Kylin on EMR using S3 as the storage for HBase and Hive, but our configuration differs in two points:
>> ·         We use EMRFS (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html) with this emrfs-site classification:
>> 
>>   {
>>     "Classification": "emrfs-site",
>>     "Properties": {
>>       "fs.s3.consistent.retryPeriodSeconds": "10",
>>       "fs.s3.consistent": "true",
>>       "fs.s3.consistent.retryCount": "5",
>>       "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
>>     },
>>     "Configurations": []
>>   }
>> 
>> ·         We deployed Kylin on an EC2 machine separate from the cluster.
>>  
>> I hope this helps you.
>>  
>> Roberto Tardío
>>  
>> From: Moisés Català [mailto:moises.catala@lacupulamusic.com]
>> Sent: Saturday, 28 July 2018 16:17
>> To: user@kylin.apache.org
>> Subject: Kylin with S3, cubes tables get in transition when new cluster booted
>>  
>> Hi all,
>>  
>> I’ve carefully followed the instructions provided in http://kylin.apache.org/docs23/install/kylin_aws_emr.html
>>  
>> My idea is to use S3 as the storage for HBase. I have configured the cluster following the instructions, but the tables that contain the cube definitions stay "in transition" when a new cluster is deployed, and the Kylin metadata seems outdated...
>>  
>> These are the steps I follow to create the cluster
>>  
>> Cluster creation command:
>>  
>> aws emr create-cluster \
>> --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=Hbase Name=HCatalog Name=Tez \
>> --tags 'hive=' 'spark=' 'zeppelin=' \
>> --ec2-attributes 'file://../config/ec2-attributes.json' \
>> --release-label emr-5.16.0 \
>> --log-uri 's3n://sns-da-logs/' \
>> --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
>> --configurations 'file://../config/hive-hbase-s3.json' \
>> --auto-scaling-role EMR_AutoScaling_DefaultRole \
>> --ebs-root-volume-size 10 \
>> --service-role EMR_DefaultRole \
>> --enable-debugging \
>> --name 'hbase-hive-datawarehouse' \
>> --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
>> --region us-east-1
>>  
>>  
>> My configuration hive-hbase-s3.json:
>>  
>> [
>>   {
>>     "Classification": "hive-site",
>>     "Configurations": [],
>>     "Properties": {
>>       "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
>>       "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
>>       "javax.jdo.option.ConnectionPassword": "xxxxx",
>>       "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
>>       "javax.jdo.option.ConnectionUserName": "xxxx"
>>     }
>>   },
>>   {
>>     "Classification": "hbase",
>>     "Configurations": [],
>>     "Properties": {
>>       "hbase.emr.storageMode": "s3"
>>     }
>>   },
>>   {
>>     "Classification": "hbase-site",
>>     "Configurations": [],
>>     "Properties": {
>>       "hbase.rpc.timeout": "3600000",
>>       "hbase.rootdir": "s3://xxxxxx-hbase/"
>>     }
>>   },
>>   {
>>       "Classification": "core-site",
>>       "Properties": {
>>         "io.file.buffer.size": "65536"
>>       }
>>   },
>>   {
>>       "Classification": "mapred-site",
>>       "Properties": {
>>         "mapred.map.tasks.speculative.execution": "false",
>>         "mapred.reduce.tasks.speculative.execution": "false",
>>         "mapreduce.map.speculative": "false",
>>         "mapreduce.reduce.speculative": "false"
>>       }
>>   } 
>> ]
>>  
>> When I shut down the cluster I perform these commands:
>>  
>> ../kylin_home/bin/kylin.sh stop
>>  
>>  
>> # Before you shut down/restart the cluster, you must back up the "/kylin" data on HDFS to S3 with S3DistCp:
>> 
>> aws s3 rm --recursive s3://xxxxxx-config/metadata/kylin/
>> s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
>>  
>> bash /usr/lib/hbase/bin/disable_all_tables.sh
>>  
>>  
>> Please, could you be so kind as to point out what I am missing?
>>  
>>  
>> Thanks in advance
> 
> 
> 
> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋
> 

