chukwa-user mailing list archives

From Ying Tang <ivytang0...@gmail.com>
Subject Re: chukwa agent doesn't collect the log suddenly , and after several days ,the agent crashes.
Date Wed, 27 Jul 2011 05:54:44 GMT
And after I restart the Chukwa agent, the last number shown for the adaptor
in the telnet "list" output is still 10487067.
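
The check discussed in this thread (compare the last number in the "list" output with the current size of the tailed file) can be sketched as below. This is only a minimal sketch: parse_adaptor_line and offset_past_end are hypothetical helper names, and it assumes the last two whitespace-separated fields of an adaptor entry are the file path and the offset, as in the listing quoted in this thread.

```python
def parse_adaptor_line(line):
    """Split one adaptor entry from the agent's `list` output into (path, offset).

    Assumption: the last two whitespace-separated fields are the tailed
    file's path and the offset, and the path contains no spaces.
    """
    fields = line.split()
    return fields[-2], int(fields[-1])


def offset_past_end(offset, file_size):
    """Return True when the reported offset is larger than the file size."""
    return offset > file_size


if __name__ == "__main__":
    # Adaptor entry and file size as reported in this thread.
    entry = (
        "adaptor_2963225a90653a309cf779d4a1d815a3) "
        "org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
        "CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067"
    )
    path, offset = parse_adaptor_line(entry)
    current_size = 1150872  # on a live host, os.path.getsize(path) could be used
    if offset_past_end(offset, current_size):
        print(f"{path}: offset {offset} is past the current size {current_size}")
    else:
        print(f"{path}: offset {offset} is within the current size {current_size}")
```

Running the check twice a few minutes apart would also show whether the offset is moving at all.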

On Wed, Jul 27, 2011 at 1:52 PM, Ying Tang <ivytang0812@gmail.com> wrote:

> Does this mean the adaptor is still tailing the previous log, and the
> current log is being missed?
>
>
> On Wed, Jul 27, 2011 at 1:08 PM, Eric Yang <eric818@gmail.com> wrote:
>
>> This looks like a bug. The last number should be in sync with the
>> current file's size, but the UTF8 adaptor is still tailing the previous
>> file (which rotated at 10487067).
>> The file rotated but the adaptor did not pick up the change, which
>> means there is a bug in the handling of file rotation.
>>
>> Please open a jira.  Thanks
>>
>> regards,
>> Eric
>>
>> On Tue, Jul 26, 2011 at 8:05 PM, Ying Tang <ivytang0812@gmail.com> wrote:
>> > The log didn't rotate very rapidly.
>> >
>> > I can't reproduce the scenario now, but the Chukwa agent log looks OK:
>> >
>> > 2011-07-27 10:57:38,967 INFO Timer-0 ChukwaAgent - writing checkpoint 1307083
>> > 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - collected 1 chunks for post_745
>> > 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post_745 to http://chukwacollector1.xingcloud.com:9095/ length = 1837
>> > 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://chukwacollector1.xingcloud.com:9095/chukwa; response length 43
>> > 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - post_745 sent 0 chunks, got back 1 acks
>> >
>> > The output of "list" in the telnet session to the agent (port 9093) is:
>> > adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
>> >
>> > After several minutes, the list is still:
>> > adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
>> >
>> > Is 10487067 the offset? The number didn't change, while the log
>> > file's size ranges from 0 to 10M. The log file's size is now 1150872.
>> >
>> > On Wed, Jul 27, 2011 at 12:26 AM, Eric Yang <eric818@gmail.com> wrote:
>> >>
>> >> CharFileTailingAdaptorUTF8 should handle log rotation gracefully.  Is
>> >> the log rotating rapidly?
>> >> Run these commands on the Chukwa agent:
>> >> telnet localhost 9093
>> >> list
>> >> This should show a list of the files being tailed; check the offset
>> >> number of the tailed log file.  The rightmost number should be smaller
>> >> than the size of your log file.  If it is bigger and not changing, it
>> >> is most likely a bug that we haven't seen before.  It might be useful
>> >> to turn on debug logging on the Chukwa agent and see if this can be
>> >> reproduced, to nail down the root cause.  Thanks
>> >> regards,
>> >> Eric
>> >> On Jul 26, 2011, at 6:13 AM, Ying Tang wrote:
>> >>
>> >> Is it possible that when the log file reaches the size set in the
>> >> log4j config file, log4j renames this log file and creates a new file
>> >> with the same name as the log file, and that at that moment the Chukwa
>> >> adaptor doesn't tail the log properly, so the Chukwa agent can't
>> >> collect the log any more?
>> >>
>> >> On Tue, Jul 26, 2011 at 2:07 PM, Ying Tang <ivytang0812@gmail.com> wrote:
>> >>>
>> >>> The log file is a log4j log file with a maximum size of 10M, and
>> >>> MaxBackupIndex is 1.
>> >>>
>> >>>
>> >>> On Tue, Jul 26, 2011 at 1:42 PM, Eric Yang <eric818@gmail.com> wrote:
>> >>>>
>> >>>> Can you run "ls -l" to show the size and date of the log files that
>> >>>> you are streaming?
>> >>>>
>> >>>> regards,
>> >>>> Eric
>> >>>>
>> >>>> On Mon, Jul 25, 2011 at 7:36 PM, Ying Tang <ivytang0812@gmail.com> wrote:
>> >>>> > The chukwa version is 0.4.0 and the adaptor is
>> >>>> > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
>> >>>> >
>> >>>> > On Mon, Jul 25, 2011 at 11:50 PM, Eric Yang <eric818@gmail.com> wrote:
>> >>>> >>
>> >>>> >> Hi Ivy,
>> >>>> >>
>> >>>> >> When data is sent from an agent to the collector, the collector
>> >>>> >> acknowledges receipt of the chunks.  At 00:03:28, 5 chunks were
>> >>>> >> acknowledged, which means communication between the collector and
>> >>>> >> the agent was working at that point in time.  However, there is no
>> >>>> >> activity after 00:04:28.  It looks like the adaptor did not handle
>> >>>> >> the log rotation properly close to midnight.  Which version of
>> >>>> >> Chukwa are you using, and which adaptor?
>> >>>> >>
>> >>>> >> regards,
>> >>>> >> Eric
>> >>>> >>
>> >>>> >> On Jul 25, 2011, at 12:40 AM, Ying Tang wrote:
>> >>>> >>
>> >>>> >> > Hi all,
>> >>>> >> >
>> >>>> >> > In my cluster, I have two Chukwa agents and one collector.
>> >>>> >> > At one point, both Chukwa agents logged:
>> >>>> >> > 2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 5
>> >>>> >> > 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>> >>>> >> > 2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>> >>>> >> > 2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>> >>>> >> > 2011-07-18 00:07:29,340 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>> >>>> >> >
>> >>>> >> > And the collector
>> >>>> >> > 2011-07-17 11:02:32,155 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>> >>>> >> > 2011-07-17 11:02:43,074 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>> >>>> >> > 2011-07-17 11:03:02,162 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>> >>>> >> > 2011-07-17 11:03:32,168 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>> >>>> >> > 2011-07-17 11:03:43,085 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>> >>>> >> > 2011-07-17 11:04:02,174 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>> >>>> >> > 2011-07-17 11:04:32,180 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>> >>>> >> > 2011-07-17 11:04:43,096 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>> >>>> >> > 2011-07-17 11:05:02,185 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>> >>>> >> >
>> >>>> >> > (The collector and the agents are in different time zones.)
>> >>>> >> > And the collector didn't collect any logs.
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > What does "http chunks ACK'ed since last report: 0" mean?
>> >>>> >> > From the time this message first appeared until the agents
>> >>>> >> > crashed, the Chukwa port was still open, but after several days
>> >>>> >> > both agents crashed without any exceptions.
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > --
>> >>>> >> > Best regards,
>> >>>> >> >
>> >>>> >> > Ivy Tang
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > Best regards,
>> >>>> > Ivy Tang
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best regards,
>> >>> Ivy Tang
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >> Ivy Tang
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Ivy Tang
>> >
>> >
>> >
>>
>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>
>
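
The stall visible in the agent log quoted above (the "ACK'ed since last report" count dropping to 0 and staying there) could be spotted with a small script like the one below. This is a rough sketch, not part of Chukwa: first_stall and the threshold of three consecutive zero reports are assumptions made for illustration, and it expects each log entry on a single unwrapped line in the format shown in this thread.

```python
import re

# Matches agent log lines such as:
# 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
ACK_LINE = re.compile(
    r"^(\S+ \S+) INFO \S+ HttpConnector - "
    r"# http chunks ACK'ed since last report: (\d+)$"
)


def first_stall(lines, min_zero_reports=3):
    """Return the timestamp that starts the first run of at least
    min_zero_reports consecutive zero-ACK reports, or None if there is none."""
    run_start = None
    run_length = 0
    for line in lines:
        match = ACK_LINE.match(line.strip())
        if match is None:
            continue  # not an ACK report line
        timestamp, count = match.group(1), int(match.group(2))
        if count == 0:
            if run_length == 0:
                run_start = timestamp
            run_length += 1
            if run_length >= min_zero_reports:
                return run_start
        else:
            run_length = 0  # traffic resumed, reset the run
    return None


if __name__ == "__main__":
    sample = [
        "2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 5",
        "2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0",
        "2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0",
        "2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0",
    ]
    print(first_stall(sample))
```

A quiet log source would also produce zero reports, so a hit here only says that nothing was acknowledged, not why.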


-- 
Best regards,

Ivy Tang
