ambari-user mailing list archives

From Eirik Thorsnes <eirik.thors...@uni.no>
Subject Ambari metrics collector dies in 2.1.3-snapshot
Date Thu, 03 Dec 2015 18:16:01 GMT
Hi,

I'm testing Ambari 2.1.3-snapshot (from Dec 1st, a830cc0) on the HDP 2.3.0
stack. In this setup the Ambari Metrics Collector dies after a few minutes
with the log pasted below (note the "FATAL" error, which comes after many
of the exceptions shown at the top of the paste).

Possibly related to the error pasted below: on startup, the collector
fails to load the native libraries. From the log:

2015-12-03 18:40:44,296  WARN [main] NativeCodeLoader:62 - Unable to
load native-hadoop library for your platform... using builtin-java
classes where applicable

even though the libraries exist in the java.library.path reported a few
lines further down in the log:

2015-12-03 18:40:44,396  INFO [main] ZooKeeper:100 - Client
environment:java.library.path=/usr/lib/ams-hbase/lib/hadoop-native -Xmx3072m

I also tried replacing the path above with a symlink to the
hadoop-client/lib/native directory (which has different contents), but
this did not help.
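
One thing that might be worth checking is the java.library.path value itself: as logged, it ends in " -Xmx3072m", and if that text really is part of the property value, the entry would not name an existing directory. A minimal sketch of such a check follows. The helper name check_library_path is hypothetical, and libhadoop.so is assumed to be the file name of the native-hadoop library on this platform.

```python
import os


def check_library_path(value, lib_name="libhadoop.so"):
    """Inspect each entry of a java.library.path style string.

    Returns a list of (entry, is_dir, has_lib) tuples, where is_dir says
    whether the entry is an existing directory and has_lib says whether
    that directory contains lib_name (an assumed file name).
    """
    results = []
    # Entries are separated by the platform path separator (":" on Linux).
    for entry in value.split(os.pathsep):
        is_dir = os.path.isdir(entry)
        has_lib = is_dir and os.path.isfile(os.path.join(entry, lib_name))
        results.append((entry, is_dir, has_lib))
    return results


if __name__ == "__main__":
    # Value copied verbatim from the ZooKeeper "Client environment" log line.
    logged = "/usr/lib/ams-hbase/lib/hadoop-native -Xmx3072m"
    for entry, is_dir, has_lib in check_library_path(logged):
        print(f"{entry!r}: directory={is_dir}, contains libhadoop.so={has_lib}")
```

If the entry is reported as not being a directory, the trailing JVM option may have been fused into the path by whatever builds the command line, which would be a separate problem from the exception in the paste.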

=========== paste ===============

Thu Dec 03 18:26:25 CET 2015,
RpcRetryingCaller{globalStartTime=1449163034289, pause=100, retries=35},
java.io.IOException: java.io.IOException:
java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
        at
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:78)
        at
org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService.callMethod(ServerCachingProtos.java:3200)
        at
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
        at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
        at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
        at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at
org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError:
org/iq80/snappy/CorruptionException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:72)
        ... 10 more
Caused by: java.lang.ClassNotFoundException:
org.iq80.snappy.CorruptionException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more

        at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
        at
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
        at
org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(CoprocessorRpcChannel.java:56)
        at
org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService$Stub.addServerCache(ServerCachingProtos.java:3270)
        at
org.apache.phoenix.cache.ServerCacheClient$1$1.call(ServerCacheClient.java:204)
        at
org.apache.phoenix.cache.ServerCacheClient$1$1.call(ServerCacheClient.java:189)
        at org.apache.hadoop.hbase.client.HTable$16.call(HTable.java:1741)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException:
java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
        at
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:78)
        at
org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService.callMethod(ServerCachingProtos.java:3200)
        at
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
        at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
        at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
        at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at
org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError:
org/iq80/snappy/CorruptionException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:72)
        ... 10 more
Caused by: java.lang.ClassNotFoundException:
org.iq80.snappy.CorruptionException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more

        at
sun.reflect.GeneratedConstructorAccessor43.newInstance(Unknown Source)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:322)
        at
org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1619)
        at
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:92)
        at
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:89)
        at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
        ... 10 more
Caused by:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: java.lang.NoClassDefFoundError:
org/iq80/snappy/CorruptionException
        at
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:78)
        at
org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService.callMethod(ServerCachingProtos.java:3200)
        at
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
        at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
        at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
        at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at
org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError:
org/iq80/snappy/CorruptionException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:72)
        ... 10 more
Caused by: java.lang.ClassNotFoundException:
org.iq80.snappy.CorruptionException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more

        at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1206)
        at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
        at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
        at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:32675)
        at
org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1615)
        ... 13 more
2015-12-03 18:26:25,220  INFO
[hconnection-0x33bc72d1-shared--pool2-t265] RpcRetryingCaller:132 - Call
exception, tries=16, retries=35, started=188985 ms ago, cancelled=false,
msg=row
'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode'
on table 'METRIC_RECORD' at
region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95.,
hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:25,539  INFO
[hconnection-0x33bc72d1-shared--pool2-t155] RpcRetryingCaller:132 - Call
exception, tries=27, retries=35, started=409895 ms ago, cancelled=false,
msg=row
'' on table 'METRIC_RECORD' at
region=METRIC_RECORD,,1432015934895.0f0a9816ffb93fe65176292b6ad378d1.,
hostname=compute-10-1.local,61320,1449162924698, seqNum=24131209
2015-12-03 18:26:25,597  INFO
[hconnection-0x33bc72d1-shared--pool2-t153] RpcRetryingCaller:132 - Call
exception, tries=27, retries=35, started=409953 ms ago, cancelled=false,
msg=row
'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode'
on table 'METRIC_RECORD' at
region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95.,
hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:25,680  INFO
[hconnection-0x33bc72d1-shared--pool2-t215] RpcRetryingCaller:132 - Call
exception, tries=22, retries=35, started=309625 ms ago, cancelled=false,
msg=row
'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode'
on table 'METRIC_RECORD' at
region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95.,
hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:26,085  INFO
[hconnection-0x33bc72d1-shared--pool2-t228] RpcRetryingCaller:132 - Call
exception, tries=29, retries=35, started=450123 ms ago, cancelled=false,
msg=row
'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode'
on table 'METRIC_RECORD' at
region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95.,
hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:26,276 FATAL [pool-1-thread-1]
TimelineMetricStoreWatcher:79 - Error getting metrics from
TimelineMetricStore. Shutting down by TimelineMetricStoreWatcher.
2015-12-03 18:26:26,279  INFO [pool-1-thread-1] ExitUtil:124 - Exiting
with status -1
2015-12-03 18:26:26,281  INFO [Thread-3]
ConnectionManager$HConnectionImplementation:2068 - Closing master
protocol: MasterService
2015-12-03 18:26:26,426  INFO
[hconnection-0x33bc72d1-shared--pool2-t227] RpcRetryingCaller:132 - Call
exception, tries=29, retries=35, started=450464 ms ago, cancelled=false,
msg=row '' on table 'METRIC_RECORD' at
region=METRIC_RECORD,,1432015934895.0f0a9816ffb93fe65176292b6ad378d1.,
hostname=compute-10-1.local,61320,1449162924698, seqNum=24131209
2015-12-03 18:26:26,442  INFO [Thread-1] log:67 - Stopped
HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:6188
2015-12-03 18:26:26,451  WARN [1705435578@qtp-1802896480-9]
GenericExceptionHandler:98 - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.sql.SQLException: Sub plan [0]
execution interrupted.
        at
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices.getTimelineMetrics(TimelineWebServices.java:387)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
...
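
The innermost cause in the paste above is a ClassNotFoundException for org.iq80.snappy.CorruptionException, raised by the Java class loader on the region server side, so it may help to check whether any jar visible to that process actually contains the class. A minimal sketch of such a scan follows. The function name find_class_in_jars is hypothetical, and the directory to scan is taken from the command line because the log does not say where the relevant jars live.

```python
import os
import sys
import zipfile


def find_class_in_jars(root, class_name):
    """Return a sorted list of jar files under root that contain class_name.

    class_name is a dotted Java class name, for example
    "org.iq80.snappy.CorruptionException".
    """
    # Inside a jar, a class is stored as a path ending in ".class".
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives instead of aborting.
                continue
    return sorted(hits)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: find_class.py <directory-with-jars>")
    matches = find_class_in_jars(sys.argv[1], "org.iq80.snappy.CorruptionException")
    for jar_path in matches:
        print(jar_path)
    if not matches:
        print("no jar under", sys.argv[1], "contains the class")
```

An empty result for the directories on the AMS HBase classpath would be consistent with the exception above, although it would not by itself explain why the jar is missing.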



-- 
Eirik Thorsnes

