manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject FW: [VOTE] Release Apache ManifoldCF 1.7.1, RC0
Date Thu, 18 Sep 2014 11:07:07 GMT
Sent from my Windows Phone
------------------------------
From: Wright, Karl
Sent: 9/18/2014 7:06 AM
To: DaddyWri@gmail.com
Subject: FW: [VOTE] Release Apache ManifoldCF 1.7.1, RC0



 ------------------------------
From: Wright, Karl
Sent: 9/18/2014 7:04 AM
To: dev@manifoldcf.apache.org
Subject: RE: [VOTE] Release Apache ManifoldCF 1.7.1, RC0

  The really notable thing is the long runs of CancelledKeyException entries.
Essentially this means that all sessions dropped at the same time.

Either this is a network hiccup, or you are having periods of extremely
high load.  I would bet on the network hiccup myself.

Karl

 ------------------------------
From: Karl Wright
Sent: 9/18/2014 6:23 AM
To: dev
Subject: Re: [VOTE] Release Apache ManifoldCF 1.7.1, RC0

 Hi Erlend,

The thread dump shows a couple of things:

(1) The production server is having some trouble talking with postgres;
there are a number of threads open that are transmitting to the database,
which is unusual since most everything is locked up.  This is interesting
in light of the zookeeper log below.

(2) Zookeeper transactions are indeed hung.

Looking at the zookeeper log:

(1) You have multiple cases where sessions have timed out and the connection
between client and server appears to have been lost completely.
(2) A tick time of 5000 milliseconds is HOPELESSLY low for some reason on
this system; you are exceeding it regularly.

This is bad.  Disconnection and reconnection will work for transient
difficulties, but when communication drops for a very long time, everyone
gets confused eventually.  While the code is resilient, it's basically in a
situation where locks are being completely lost with no possibility of
recovery.
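
As a rough illustration of why the tick time matters: by default a ZooKeeper
server bounds every client session timeout to between 2 and 20 ticks, so the
tick time sets how long a client can go silent before its session expires.
The sketch below is only that arithmetic. The class and method names are
hypothetical (they are not part of ManifoldCF or ZooKeeper), and it assumes
minSessionTimeout and maxSessionTimeout are left at their defaults.

```java
// Hypothetical helper: session timeout bounds derived from tickTime.
// Assumes the server defaults (min = 2 ticks, max = 20 ticks) are not overridden.
final class ZkSessionBounds {
    private ZkSessionBounds() {}

    // Smallest session timeout the server will grant, in milliseconds.
    static int minSessionTimeoutMs(int tickTimeMs) {
        return 2 * tickTimeMs;
    }

    // Largest session timeout the server will grant, in milliseconds.
    static int maxSessionTimeoutMs(int tickTimeMs) {
        return 20 * tickTimeMs;
    }

    // The timeout a client actually gets: its request clamped into [min, max].
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = minSessionTimeoutMs(tickTimeMs);
        int max = maxSessionTimeoutMs(tickTimeMs);
        return Math.max(min, Math.min(requestedMs, max));
    }
}
```

With a tick time of 5000 ms, a client that asks for a 2000 ms timeout is
raised to 10000 ms, and no request can exceed 100000 ms.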

At this time I cannot even come up with a solution, and no solution may be
possible.  If it were me, I'd try to diagnose the connection between where
zookeeper is running and where MCF is running.
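
One possible way to start on that diagnosis, sketched under assumptions: send
ZooKeeper's "ruok" four-letter command from the MCF host and see whether
"imok" comes back within a timeout. The class and method names here are
hypothetical, and the host and port are whatever your MCF configuration points
at (2181 is only the usual ZooKeeper client port). Newer ZooKeeper releases
restrict four-letter commands through a whitelist setting, so an empty reply
there does not necessarily mean a network fault.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: sends a four-letter command and returns the reply text.
final class ZkProbe {
    private ZkProbe() {}

    static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Fail fast if the TCP connection itself cannot be established.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            // Also bound how long we wait for the reply.
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Signal end of request; the server replies and closes the connection.
            socket.shutdownOutput();

            // Read until the server closes its side.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                reply.write(chunk, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII).trim();
        }
    }
}
```

Calling ZkProbe.fourLetterWord(host, 2181, "ruok", 5000) in a loop and logging
each failure or slow reply with a timestamp would show whether the outages
line up with the session drops in the zookeeper log.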

Karl


On Thu, Sep 18, 2014 at 4:42 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
wrote:

>
> OK, I will try RC1 as well on both servers.
>
> I'm sending new thread dumps and logs for RC0, just to be sure. MCF hangs
> on our prod server once again, but it still runs fine on our test server.
> It seems that it has a problem with one of four jobs. The others completed
> successfully. I'm not sure, but it seems that the problem occurs while
> processing robots.txt files. Yesterday I saw similar errors and no
> activity after these:
> 09-17-2014 15:10:53.167 robots parse https:www.duo.uio.no:443 ERRORS 0 1
> Unknown robots.txt line: '===='
>
> A thread dump (in stdout) and the output from Zookeeper (in stderr, with
> CancelledKeyException entries) can be found here. The log files are big,
> so always look at the end of them.
> http://folk.uio.no/erlendfg/manifoldcf/
>
> Erlend
>
>
> On 17.09.14 15:45, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> The same condition that affects locks will also affect registration of
>> services, it appears.  So I will need to make more changes to address
>> that problem as well.
>>
>> Karl
>>
>> On Wed, Sep 17, 2014 at 9:12 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
>> wrote:
>>
>>
>>> Both servers are running now. Not sure about what caused the problems on
>>> prod. The only thing I did differently was to do a lock clean on prod
>>> prior to startup.
>>>
>>> I'll keep both servers up and running for 24 hours and vote thereafter.
>>>
>>> Erlend
>>>
>>>
>>> On 17.09.14 15:05, Erlend Garåsen wrote:
>>>
>>>  On 17.09.14 14:55, Karl Wright wrote:
>>>>
>>>>  Hi Erlend,
>>>>>
>>>>> Yes, this is shutdown related.  The patch file did not include the fix
>>>>> for this particular problem.  The release candidate, however, does.
>>>>>
>>>>>
>>>> This is not from the patch, but from 1.7.1. I just meant to say that I
>>>> did not have any problems using the patch.
>>>>
>>>> The thread dump is included in my stdout log file, since the output of
>>>> kill -3 was placed there. Please note that it is at THE END
>>>> of that file. I'm in a hurry, so I didn't have time to delete all the
>>>> other irrelevant entries. Sorry about that:
>>>> http://folk.uio.no/erlendfg/manifoldcf/mcf_agent.stdout.log
>>>>
>>>> I'll try to restart everything and get MCF up and running. Runs fine on
>>>> our test server, but not on prod. I'll get back to this.
>>>>
>>>> E
>>>>
>>>>
>>>
>>>
>>
>
