incubator-chukwa-user mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: Chunks for events getting jumbled up in Collector
Date Mon, 04 Jul 2011 06:08:15 GMT
Yes, they are the same issue.  The corresponding chunk should have a
timestamp in front of INFO, but the hostname is being sent instead.
Two metrics got merged into one log entry during flush, so the parser
throws an exception for the malformed entry.  The exception is caught
by the framework, which then attempts to write the malformed entry to
HadoopMetricsInError.  HadoopMetricsInError is not a defined column
family, which is why you are seeing this exception in the collector
log.
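
If you want those malformed chunks to be captured instead of rejected, one
workaround is to add the missing column family to the table that
HadoopMetricsProcessor writes to.  A minimal sketch using the HBaseAdmin API,
where the table name "Hadoop" is only a placeholder -- check the table
annotation on HadoopMetricsProcessor (or your HBase schema) for the real one:

    // Hypothetical sketch: create the "HadoopMetricsInError" column family so
    // the fallback write from HBaseWriter stops failing with
    // NoSuchColumnFamilyException.  The table name is an assumption.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class AddInErrorFamily {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String table = "Hadoop";   // placeholder, verify against your schema
        admin.disableTable(table); // schema changes need the table offline
        admin.addColumn(table, new HColumnDescriptor("HadoopMetricsInError"));
        admin.enableTable(table);
      }
    }

The same change can be made from the hbase shell with disable/alter/enable.
Keep in mind this only stores the bad chunks for inspection; it does not fix
the underlying problem of two metrics records being flushed into one log
entry.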

regards,
Eric

On Sun, Jul 3, 2011 at 10:28 PM, Preetam Patil <pvpatil.iitb@gmail.com> wrote:
> Thanks for the prompt response!
> Is this also the reason I'm seeing lots of errors of this type?
> 2011-07-04 10:52:35,271 WARN btpool0-97 HBaseWriter - org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: chukwanode:49137,
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1223)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1237)
> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
> at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:203)
> at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
> at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:41)
> at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:154)
> From the debugger, the 'put' request is seen as:
> row=clus1-datanode8-HadoopMetrics, families={(family=HadoopMetricsInError,
> keyvalues=(clus1-datanode8-HadoopMetrics/HadoopMetricsInError:capp/1309756958715/Put/vlen=15,
> clus1-datanode8-HadoopMetrics/HadoopMetricsInError:cchunkData/1309756958715/Put/vlen=585,
> clus1-datanode8-HadoopMetrics/HadoopMetricsInError:cchunkException/1309756958715/Put/vlen=1977,
> clus1-datanode8-HadoopMetrics/HadoopMetricsInError:csource/1309756958715/Put/vlen=15,
> clus1-datanode8-HadoopMetrics/HadoopMetricsInError:ctags/1309756958715/Put/vlen=16)}]
> I can see from the HBase schema that the HadoopMetricsInError column family
> is not present in any of the tables. The corresponding chunk is:
> clus1-datanode8:clus1-datanode8: INFO chukwa.metrics.rpc:
> {"rpcAuthorizationSuccesses":9,"port":"58148","rpcAuthorizationFailures":0,"recordName":"metrics","ReceivedBytes":90903,"chukwa_timestamp":1309517476828,"RpcQueueTime_num_ops":174,"RpcProcessingTime_num_ops":174,"rpcAuthenticationFailures":0,"period":60,"RpcQueueTime_avg_time":0,"SentBytes":8177,"NumOpenConnections":0,"callQueueLen":0,"rpcAuthenticationSuccesses":0,"hostName":"clus1-datanode8","RpcProcessingTime_avg_time":0,"contextName":"rpc"}
> Thanks,
> -preetam
> On Mon, Jul 4, 2011 at 10:34 AM, Eric Yang <eric818@gmail.com> wrote:
>>
>> Hi,
>>
>> This is actually not a Chukwa error.  The Hadoop metrics context injects
>> errors like this for jvm metrics when the system is under stress.  This
>> issue should be addressed by Hadoop Metrics V2 in Hadoop v0.23.
>>
>> regards,
>> Eric
>>
>> On Sun, Jul 3, 2011 at 9:40 PM, Preetam Patil <pvpatil.iitb@gmail.com>
>> wrote:
>> > Hi,
>> > I'm noticing strange behavior in the Chukwa collector that is causing the
>> > parsers to fail. When a large number of events arrive at the collector,
>> > the event entries get jumbled up (date missing, two entries interleaved,
>> > two events passed as a single chunk, etc.). As a result,
>> > HadoopMetricsProcessor throws a NullPointerException because the parser
>> > returns a null object for the chunk. I have noticed that
>> > chukwa.metrics.jvm is involved in every case of this error, but the error
>> > does not occur for every chukwa.metrics.jvm event that is received.
>> > Here's one example of the error (in collector.log):
>> > 2011-07-04 09:42:18,617 WARN btpool0-18 HadoopMetricsProcessor - Wrong format in HadoopMetricsProcessor [2011-07-01 16:29:14,860 2011-07-01 16:29:14,862 INFOINFO  chukwa.metrics.jvm: chukwa.metrics.dfs:
>> > {"memHeapCommittedM":24.6875,"recordName":"metrics","chukwa_timestamp":1309517954860,"logFatal":0,"processName":"DataNode","threadsWaiting":8,"threadsBlocked":0,"gcCount":482,"logError":8,"logWarn":9,"period":60,"memNonHeapCommittedM":23.1875,"memNonHeapUsedM":18.441505,"gcTimeMillis":9625,"sessionId":"","logInfo":332,"threadsNew":0,"memHeapUsedM":10.359726,"threadsTerminated":0,"hostName":"clus1-datanode6","threadsTimedWaiting":9,"maxMemoryM":888.9375,"threadsRunnable":8,"contextName":"jvm"}{"recordName":"datanode","chukwa_timestamp":1309517954862,"readBlockOp_num_ops":10,"writeBlockOp_avg_time":0,"block_verification_failures":0,"blockChecksumOp_num_ops":0,"blocks_read":10,"blocks_written":13,"copyBlockOp_avg_time":0,"heartBeats_num_ops":197707,"writes_from_remote_client":12,"replaceBlockOp_num_ops":0,"blockReports_avg_time":0,"blocks_replicated":0,"writeBlockOp_num_ops":13,"heartBeats_avg_time":7,"hostName":"clus1-datanode6","bytes_read":60797,"copyBlockOp_num_ops":0,"contextName":"dfs","blockReports_num_ops":167,"bytes_written":389862107,"replaceBlockOp_avg_time":0,"reads_from_remote_client":8,"period":60,"readBlockOp_avg_time":0,"reads_from_local_client":2,"blocks_verified":1,"writes_from_local_client":1,"sessionId":"","blockChecksumOp_avg_time":0,"blocks_removed":14}
>> > ]
>> > java.lang.NullPointerException
>> > at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.HadoopMetricsProcessor.parse(HadoopMetricsProcessor.java:103)
>> > at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:81)
>> > at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:202)
>> > at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
>> > at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:41)
>> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:154)
>> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:203)
>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> > at org.mortbay.jetty.Server.handle(Server.java:324)
>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
>> > at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
>> > at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
>> > 2011-07-04 09:42:18,618 WARN btpool0-18 HBaseWriter - org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: chukwanode:49137,
>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1223)
>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1237)
>> > at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
>> > at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
>> > at org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
>> > at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:203)
>> > at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
>> > at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:41)
>> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:154)
>> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:203)
>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> > at org.mortbay.jetty.Server.handle(Server.java:324)
>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
>> > at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
>> > at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
>> > Here's the corresponding entry in the sequenceFile written by the
>> > SequenceFileWriter in HDFS:
>> >
>> > ^@^@^F^R^@^@^@I<8a>^A0ó2^FÀ^MHadoopMetrics0
>> >
>> > cluster="Clus1"/clus1-datanode6/clus1-datanode6<8e>^Ep^@^@^@^A^@^@^@^@^@^@^Ep^@^Oclus1-datanode6^@^P
>> >
>> > cluster="Clus1"^@^Oclus1-datanode6^@^MHadoopMetrics^@^@^@^@^@^A^@^@^Eo2011-07-01
>> > 16:29:14,860 2011-07-01 16:29:14,862 INFOINFO  chukwa.metrics.jvm:
>> > chukwa.metrics.dfs:
>> >
>> > {"memHeapCommittedM":24.6875,"recordName":"metrics","chukwa_timestamp":1309517954860,"logFatal":0,"processName":"DataNode","threadsWaiting":8,"threadsBlocked":0,"gcCount":482,"logError":8,"logWarn":9,"period":60,"memNonHeapCommittedM":23.1875,"memNonHeapUsedM":18.441505,"gcTimeMillis":9625,"sessionId":"","logInfo":332,"threadsNew":0,"memHeapUsedM":10.359726,"threadsTerminated":0,"hostName":"clus1-datanode6","threadsTimedWaiting":9,"maxMemoryM":888.9375,"threadsRunnable":8,"contextName":"jvm"}{"recordName":"datanode","chukwa_timestamp":1309517954862,"readBlockOp_num_ops":10,"writeBlockOp_avg_time":0,"block_verification_failures":0,"blockChecksumOp_num_ops":0,"blocks_read":10,"blocks_written":13,"copyBlockOp_avg_time":0,"heartBeats_num_ops":197707,"writes_from_remote_client":12,"replaceBlockOp_num_ops":0,"blockReports_avg_time":0,"blocks_replicated":0,"writeBlockOp_num_ops":13,"heartBeats_avg_time":7,"hostName":"clus1-datanode6","bytes_read":60797,"copyBlockOp_num_ops":0,"contextName":"dfs","blockReports_num_ops":167,"bytes_written":389862107,"replaceBlockOp_avg_time":0,"reads_from_remote_client":8,"period":60,"readBlockOp_avg_time":0,"reads_from_local_client":2,"blocks_verified":1,"writes_from_local_client":1,"sessionId":"","blockChecksumOp_avg_time":0,"blocks_removed":14}
>> > Has anyone else observed this kind of error? To me, it seems to have
>> > something to do with the 'events' LIST creation. Any fix or a pointer to
>> > fixing it is welcome :-)
>> > Thanks,
>> > -preetam
>> >
>
>
