incubator-chukwa-user mailing list archives

From Ying Tang <ivytang0...@gmail.com>
Subject Re: chukwa agent doesn't collect the log suddenly , and after several days ,the agent crashes.
Date Wed, 27 Jul 2011 03:05:06 GMT
The log was not rotating very rapidly.

I can't reproduce the scenario right now. At the moment the chukwa agent log looks OK:

2011-07-27 10:57:38,967 INFO Timer-0 ChukwaAgent - writing checkpoint 1307083
2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - collected 1 chunks for post_745
2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post_745 to http://chukwacollector1.xingcloud.com:9095/ length = 1837
2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://chukwacollector1.xingcloud.com:9095/chukwa; response length 43
2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - post_745 sent 0 chunks, got back 1 acks

The output of "list" over telnet to the agent on port 9093 is:
adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
After several minutes, the list is still:
adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067

Is 10487067 the offset number? The number hasn't changed, and the log
file's size ranges from 0 to 10M. Right now the log file's size is 1150872.
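
To make the comparison concrete, here is a minimal sketch in Python of the check Eric describes in the quoted reply below. It assumes the last two whitespace-separated fields of a "list" entry are the tailed file path and the adaptor's offset; that reading of the output is an assumption, and the function names are hypothetical rather than part of Chukwa.

```python
def parse_adaptor_line(line):
    """Return (path, offset) from one entry of the agent's "list" output.

    Assumption: the last two whitespace-separated fields are the tailed
    file path and the adaptor's current offset.
    """
    fields = line.split()
    return fields[-2], int(fields[-1])


def offset_is_stale(offset, file_size):
    """True when the offset points past the end of the current file."""
    return offset > file_size


entry = (
    "adaptor_2963225a90653a309cf779d4a1d815a3) "
    "org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
    "CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067"
)
path, offset = parse_adaptor_line(entry)

# 1150872 is the current log file size reported above.
print(path, offset, offset_is_stale(offset, 1150872))
```

On a live agent host the file size could come from os.path.getsize(path) instead of a hard-coded number.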

On Wed, Jul 27, 2011 at 12:26 AM, Eric Yang <eric818@gmail.com> wrote:

> CharFileTailingAdaptorUTF8 should handle log rotation gracefully.  Is the
> log rotating rapidly?
>
> Run these commands on the chukwa agent:
> telnet localhost 9093
> list
>
> This should show a list of the files being tailed.  Check the offset number
> of the tailed log file.  The rightmost number should be smaller than the size
> of your log file.  If it is bigger and not changing, it is most likely
> a bug that we haven't seen before.  It might be useful to turn on debug
> logging on the chukwa agent and see if this can be reproduced to nail down
> the root cause.  Thanks
>
> regards,
> Eric
>
>  On Jul 26, 2011, at 6:13 AM, Ying Tang wrote:
>
>  Is it possible that when the log file reaches the maximum size set in the
> log4j config, log4j renames the log file and creates a new file with the
> same name as the log file? At that point the chukwa adaptor doesn't tail the
> log properly, and this causes the chukwa agent to stop collecting the log.
>
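
The rename-and-recreate scenario above can be illustrated with a small Python sketch. This is not Chukwa's implementation; it is a generic tailer, with a hypothetical function name, that assumes a file smaller than the saved offset has been rotated and restarts reading from the beginning.

```python
import os
import tempfile


def read_new_data(path, offset):
    """Read bytes appended since `offset`; restart at 0 if the file shrank.

    A file that is smaller than the saved offset is taken as a sign that
    it was renamed and recreated (rotated) since the last read.
    """
    size = os.path.getsize(path)
    if size < offset:
        offset = 0  # rotated: start over at the top of the new file
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "gamelog")
    with open(log, "wb") as f:
        f.write(b"line one\nline two\n")
    first, offset = read_new_data(log, 0)

    # Simulate a size-based rotation: rename the old file, start a new one.
    os.rename(log, log + ".1")
    with open(log, "wb") as f:
        f.write(b"fresh\n")
    second, offset = read_new_data(log, offset)

print(first, second, offset)
```

A size comparison alone misses a rotation if the new file grows past the old offset before the next check; comparing inode numbers from os.stat would cover that case.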
> On Tue, Jul 26, 2011 at 2:07 PM, Ying Tang <ivytang0812@gmail.com> wrote:
>
>> The log file is a log4j log file, the size is 10M, and MaxBackupIndex is 1.
>>
>>
>>
>> On Tue, Jul 26, 2011 at 1:42 PM, Eric Yang <eric818@gmail.com> wrote:
>>
>>> Can you run "ls -l" to show the size and date of the log files that you
>>> are streaming?
>>>
>>> regards,
>>> Eric
>>>
>>> On Mon, Jul 25, 2011 at 7:36 PM, Ying Tang <ivytang0812@gmail.com>
>>> wrote:
>>> > The chukwa version is 0.4.0 and the adaptor is
>>> > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
>>> >
>>> > On Mon, Jul 25, 2011 at 11:50 PM, Eric Yang <eric818@gmail.com> wrote:
>>> >>
>>> >> Hi Ivy,
>>> >>
>>> >> When data is sent from the agent to the collector, the collector sends an
>>> >> acknowledgment that it received the chunks.  At 00:03:28, 5 chunks were
>>> >> acknowledged, which means communication between the collector and the
>>> >> agent was working at that point in time.  However, there is no activity
>>> >> after 00:04:28.  It looks like the adaptor did not handle the log rotation
>>> >> properly close to midnight.  Which version of Chukwa are you using, and
>>> >> which adaptor?
>>> >>
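
The reasoning above (find the last report with a non-zero ACK count) can be sketched in Python as follows. It assumes each agent log entry is on a single line in the format shown in the original message; the regular expression and function name are hypothetical helpers, not part of Chukwa.

```python
import re

# Capture the timestamp (without milliseconds) and the ACK count.
ACK_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*"
    r"ACK'ed since last report: (\d+)$"
)


def last_nonzero_ack(lines):
    """Return the timestamp of the last report with at least one ACK."""
    last = None
    for line in lines:
        match = ACK_LINE.match(line.strip())
        if match and int(match.group(2)) > 0:
            last = match.group(1)
    return last


agent_log = [
    "2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 5",
    "2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0",
    "2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0",
]
print(last_nonzero_ack(agent_log))
```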
>>> >> regards,
>>> >> Eric
>>> >>
>>> >> On Jul 25, 2011, at 12:40 AM, Ying Tang wrote:
>>> >>
>>> >> > Hi all,
>>> >> >
>>> >> > In my cluster, I have two chukwa agents and one collector.
>>> >> > At one point, both chukwa agents' logs show:
>>> >> > 2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 5
>>> >> > 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>> >> > 2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>> >> > 2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>> >> > 2011-07-18 00:07:29,340 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>> >> >
>>> >> > And the collector log:
>>> >> > 2011-07-17 11:02:32,155 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>> >> > 2011-07-17 11:02:43,074 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>> >> > 2011-07-17 11:03:02,162 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>> >> > 2011-07-17 11:03:32,168 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>> >> > 2011-07-17 11:03:43,085 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>> >> > 2011-07-17 11:04:02,174 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>> >> > 2011-07-17 11:04:32,180 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>> >> > 2011-07-17 11:04:43,096 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>> >> > 2011-07-17 11:05:02,185 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>> >> >
>>> >> > (The collector and the agents are in different time zones.)
>>> >> > The collector didn't collect any logs.
>>> >> >
>>> >> >
>>> >> > What does "http chunks ACK'ed since last report: 0" mean?
>>> >> > From the time the line "http chunks ACK'ed since last report: 0" first
>>> >> > appears until the agent crashes, the chukwa port is still open, but
>>> >> > after several days both agents crashed without exceptions.
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Best regards,
>>> >> >
>>> >> > Ivy Tang


-- 
Best regards,

Ivy Tang
