kylin-user mailing list archives

From "Xiaoxiang Yu" <hit_la...@126.com>
Subject Re: Issue when recreating EMR cluster with HBase data on S3
Date Thu, 27 Jun 2019 10:50:31 GMT
Hi Andras,
   In fact, we currently have no way to back up or restore the streaming metadata related to
replica sets, assignments, etc.
   I think this metadata is volatile; for example, the hostname of each worker may differ between
the two clusters. But if you find that backup/restore would be really useful for streaming metadata,
please submit a JIRA.


-----------------
Best wishes to you!
From: Xiaoxiang Yu

At 2019-06-27 17:54:08, "Andras Nagy" <andras.istvan.nagy@gmail.com> wrote:

OK, this worked, so I could proceed one step. I disabled all HBase tables, manually altered
them so the coprocessor locations point to the new HDFS cluster, and re-enabled them. After
this, there are no errors in the RegionServers' logs, and Kylin starts up, so this seems fine.
(Interestingly, DeployCoprocessorCLI did assemble the correct HDFS URL but could not alter
the table definitions, so after running it the table definitions had not changed. This is on
HBase version 1.4.9.)
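
In case it helps others, the manual change was roughly the following HBase shell session. The
table name is a placeholder, and since the exact jar path and priority differ per deployment, I
copied them from the existing attribute and only changed the namenode host:

  hbase shell
  hbase(main)> describe 'KYLIN_EXAMPLE_TABLE'   # note the current coprocessor$1 value first
  hbase(main)> disable 'KYLIN_EXAMPLE_TABLE'
  hbase(main)> # drop the attribute that still points at the old namenode
  hbase(main)> alter 'KYLIN_EXAMPLE_TABLE', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
  hbase(main)> # re-add it, identical except for the new cluster's namenode host
  hbase(main)> alter 'KYLIN_EXAMPLE_TABLE', METHOD => 'table_att', 'coprocessor' => 'hdfs://<new-namenode>:8020/<same-path-as-before>/kylin-coprocessor-<version>.jar|org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService|<same-priority-as-before>|'
  hbase(main)> enable 'KYLIN_EXAMPLE_TABLE'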


However, when I try to query the existing cubes, I get a failure with a NullPointerException
at org.apache.kylin.stream.coordinator.assign.AssignmentsCache.getReplicaSetsByCube(AssignmentsCache.java:61).
From a quick look, it seems these cube assignments come from Zookeeper, and I'm missing them.
Since I'm now running on a completely new EMR cluster (with a new Zookeeper), I wonder if there
is some persistent state in Zookeeper that should also be backed up and restored.


(This deployment used an hdfs-working-dir on HDFS, so before terminating the old cluster I backed
up the hdfs-working-dir and restored it in the new cluster; but I backed up nothing from Zookeeper.)
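
(A quick way to check what Kylin keeps in Zookeeper, in case that is relevant here; the base
path below is an assumption, as it is controlled by kylin.env.zookeeper-base-path and defaults
to /kylin, so adjust it to your configuration:

  # from a node that has the ZooKeeper CLI available
  zkCli.sh -server <zk-quorum>
  # list Kylin's znodes; the streaming coordinator's replica-set/assignment
  # state should live somewhere under this base path
  ls /kylin
)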


Thanks in advance for any pointers about this.


On Thu, Jun 27, 2019 at 10:30 AM Andras Nagy <andras.istvan.nagy@gmail.com> wrote:

Checked the table definition in HBase, and that's what explicitly references the coprocessor
location on the old cluster. I'll update that and let you know.



On Thu, Jun 27, 2019 at 10:26 AM Andras Nagy <andras.istvan.nagy@gmail.com> wrote:

Actually, as I noticed, it's not the coprocessor itself that's failing, but HBase when trying
to load the coprocessor from HDFS (from a reference somewhere that still points to the old
HDFS namenode).



On Thu, Jun 27, 2019 at 10:19 AM Andras Nagy <andras.istvan.nagy@gmail.com> wrote:

Hi ShaoFeng,


After disabling the "KYLIN_*" tables (but not 'kylin_metadata'), the RegionServers could indeed
start up, and the coprocessor refresh succeeded.


But after re-enabling those tables, the issue returns: the RegionServers again fail while trying
to connect to the old master node. What I noticed now from the stack trace is that the coprocessor
is actually trying to connect to the old HDFS namenode on port 8020 (and not to the HBase master).


Best regards,
Andras




On Thu, Jun 27, 2019 at 4:21 AM ShaoFeng Shi <shaofengshi@apache.org> wrote:

I see. Can you try this: disable all "KYLIN_*" tables in the HBase console, and then see whether
the RegionServers can start.


If they can start, then run the above command to refresh the coprocessor.
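
(For reference, the refresh step is the DeployCoprocessorCLI from the coprocessor how-to; roughly
the following, where KYLIN_HOME and the coprocessor jar version are whatever your installation uses:

  hbase shell
  hbase(main)> disable_all 'KYLIN_.*'    # confirm with 'y' when prompted
  hbase(main)> exit
  # redeploy/refresh the Kylin coprocessor on all Kylin HBase tables
  $KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI $KYLIN_HOME/lib/kylin-coprocessor-*.jar all
  hbase shell
  hbase(main)> enable_all 'KYLIN_.*'     # confirm with 'y' when prompted
)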


Best regards,


Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org


Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org









Andras Nagy <andras.istvan.nagy@gmail.com> wrote on Wed, Jun 26, 2019, at 10:57 PM:

Hi ShaoFeng,

Yes, but that fails as well. It fails because the RegionServers are not running (they themselves
fail when starting up).
Best regards,
Andras


On Wed, Jun 26, 2019 at 4:42 PM ShaoFeng Shi <shaofengshi@apache.org> wrote:

Hi Andras,


Did you try this? https://kylin.apache.org/docs/howto/howto_update_coprocessor.html


Best regards,


Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org


Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org









Andras Nagy <andras.istvan.nagy@gmail.com> wrote on Wed, Jun 26, 2019, at 10:05 PM:

Greetings,



I'm testing a setup where HBase is running on AWS EMR and HBase data is stored on S3. It has
been working fine so far, but when I terminate the EMR cluster and recreate it with the same S3
location for HBase, HBase won't start up properly. Before shutting down, I did execute the
disable_all_tables.sh script to flush HBase state to S3.
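
(The shutdown step was the one from the EMR HBase-on-S3 documentation; the script path below is
an assumption based on the usual EMR layout and may differ by release version:

  # run on the EMR master node (or as an EMR step) before terminating the cluster
  bash /usr/lib/hbase/bin/disable_all_tables.sh
)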

Actually, the issue is that the RegionServers don't start up. Maybe I'm missing something in the
EMR setup rather than in the Kylin setup, but the exceptions I get in the RegionServer's log point
at Kylin's CubeVisitService coprocessor, which is still trying to connect to the old HBase master
on the old EMR cluster's master node and fails with: "coprocessor.CoprocessorHost:
The coprocessor org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService
threw java.net.NoRouteToHostException: No Route to Host from  ip-172-35-5-11/172.35.5.11 to
ip-172-35-7-125.us-west-2.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException:
No route to host; "


(Here, ip-172-35-7-125 was the old cluster's master node.)

Does anyone have any idea what I'm doing wrong here?
The HBase master node's address seems to be cached somewhere, and when HBase starts up on the
new cluster with the same S3 location for the HFiles, this old address is still used.
Is there anything specific I have missed to get this scenario to work properly?
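
(As the replies above show, the stale address turned out to be the old HDFS namenode referenced
in the table descriptors' coprocessor attribute rather than anything cached by the HBase master.
A quick way to check this, with the table name as a placeholder:

  hbase shell
  hbase(main)> describe 'KYLIN_EXAMPLE_TABLE'
  # in the output, TABLE_ATTRIBUTES contains a coprocessor$1 entry with a full
  # hdfs://<old-namenode>:8020/... path to the Kylin coprocessor jar
)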

This is the full stacktrace:

2019-06-26 12:33:53,352 ERROR [RS_OPEN_REGION-ip-172-35-5-11:16020-1] coprocessor.CoprocessorHost:
The coprocessor org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService
threw java.net.NoRouteToHostException: No Route to Host from  ip-172-35-5-11/172.35.5.11 to
ip-172-35-7-125.us-west-2.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException:
No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
java.net.NoRouteToHostException: No Route to Host from  ip-172-35-5-11/172.35.5.11 to ip-172-35-7-125.us-west-2.compute.internal:8020
failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For
more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at org.apache.hadoop.ipc.Client.call(Client.java:1345)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy36.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy37.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1649)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1440)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1452)
at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1466)
at org.apache.hadoop.hbase.util.CoprocessorClassLoader.getClassLoader(CoprocessorClassLoader.java:264)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:214)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:188)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:376)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:238)
at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:802)
at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:710)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:6716)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7020)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6992)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6948)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6899)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:364)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:131)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 43 more



Many thanks,
Andras