Message-ID: <511244E4.4020102@usit.uio.no>
Date: Wed, 06 Feb 2013 12:56:20 +0100
From: Erlend Garåsen
To: user@manifoldcf.apache.org
Subject: Re: 
max_pred_locks_per_transaction
References: <51025DC7.8000702@usit.uio.no> <510BAACB.5050201@usit.uio.no> <510BAEA9.70300@usit.uio.no> <5110FD4A.6040006@usit.uio.no> <51122706.2040709@usit.uio.no>

It's not difficult to reproduce, because it *will* fail with the job I have defined. I'm almost sure which host is causing these problems - it's related to complex articles. I can even create a new job with only the problematic host defined.

Yes, that's exactly what I'm going to do, I think.

Just a question: I see the following dumped into my logs every second, even though no jobs are running at the moment. Is this related to connection tracking? I'm just afraid that it will be difficult to find the relevant log entries with logging this active.

Erlend

On 06.02.13 12.43, Karl Wright wrote:
> Hi Erlend,
>
> Various crawler features are not where the problem is going to be
> coming from. It will be coming from the basic process of queuing
> documents and processing them, and the interaction with PostgreSQL.
>
> If you are worried that you won't be able to reproduce the issue, my
> suggestion is to set up a job similar to the one you had before that
> hung. As long as you are using the same PostgreSQL server (not
> instance), the conditions will be ripe for the same kind of handle
> leakage as before. 
> If you had tons of jobs running and have no idea
> which one caused the problem, don't worry - the only thing that
> might actually matter is the kind of jobs you are doing: web, rss, etc.,
> along with possibly the overall system load.
>
> Karl
>
> On Wed, Feb 6, 2013 at 4:48 AM, Erlend Garåsen wrote:
>>
>> I have some time this week to investigate this further, after I finally
>> delivered a heavy job about SAML integration. I will look through the
>> source code as well and try to log even more if necessary.
>>
>> Maciej: Can you please send me a detailed description of your similar
>> problems?
>>
>> I previously mentioned that the "exclude from index" functionality in the
>> exclusion tab could be the source of the problem, but that no longer
>> seems to be the case, since I got the same problem for another job
>> without any settings there.
>>
>> Just a guess, but I think the source of the problem is problematic
>> documents, for instance invalid XML documents which the crawler tries to
>> parse.
>>
>> Karl: If necessary, you can try to crawl these documents yourself, but
>> first I need to isolate the problem to one particular host.
>>
>> Thanks for the warning about the lack of backward compatibility for this
>> test; I will only use our test server, where data integrity is not that
>> important. I'm starting a new crawl right away and will report thereafter.
>>
>> Erlend
>>
>>
>> On 05.02.13 14.10, Karl Wright wrote:
>>>
>>> Ok, it is clear from this that most of your threads are waiting to get
>>> a connection, and there are no connections to be found. This is
>>> exactly the problem that Maciej reported, for which I created the
>>> CONNECTORS-638 ticket. There has to be a connection leak
>>> somewhere. Obviously it is not a common situation, or the problem
>>> would arise almost right away; it probably occurs as a result of some
>>> error condition or pathway that is relatively uncommon. 
>>>
>>> The diagnostic code that is now checked into trunk should work as follows:
>>>
>>> (1) First, check out and build trunk. Since there are schema changes
>>> in trunk vs. older versions of ManifoldCF, you cannot "go backwards"
>>> and run an older version on a particular database instance once you've
>>> run trunk. Keep that in mind.
>>>
>>> (2) Add a line to the properties.xml file, as follows:
>>>
>>> <property name="..." value="true"/>
>>>
>>> (3) Start the system up and let it run.
>>>
>>> (4) When it fails, you should start to see dumps in the log like this:
>>>
>>>   Logging.db.warn("Out of db connections, list of outstanding ones follows.");
>>>   for (WrappedConnection c : outstandingConnections)
>>>   {
>>>     Logging.db.warn("Found a possibly leaked db connection", c.getInstantiationException());
>>>   }
>>>
>>> ... which will dump where all the offending connections were
>>> allocated. Hopefully this will point us at what the problem is. If
>>> there seems to be no consistency here, I'll have to explore the
>>> possibility that there are bugs in the connection allocation/free
>>> code, but we'll see.
>>>
>>> Karl
>>>
>> --
>> Erlend Garåsen
>> Center for Information Technology Services
>> University of Oslo
>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
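[Archive editor's sketch] The leak-tracking technique Karl quotes above can be illustrated with a minimal, self-contained pool. This is not ManifoldCF's actual code: the class and method names (LeakTrackingPool, obtain, release) are assumptions modeled on the quoted snippet's WrappedConnection/getInstantiationException pattern. The idea is that each handed-out connection captures an exception at creation time, so when the pool runs dry you can log the stack trace of where every outstanding handle was allocated.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakTrackingPool {

    // Stand-in for a real DB connection wrapper (hypothetical, mirrors
    // the WrappedConnection name in the quoted ManifoldCF log snippet).
    public static final class WrappedConnection {
        private final Exception instantiationException;

        WrappedConnection() {
            // Capture the allocation stack trace without throwing:
            // constructing an Exception records the current call stack.
            this.instantiationException = new Exception("Connection allocated here");
        }

        public Exception getInstantiationException() {
            return instantiationException;
        }
    }

    private final int maxHandles;
    private final List<WrappedConnection> outstanding = new ArrayList<>();

    public LeakTrackingPool(int maxHandles) {
        this.maxHandles = maxHandles;
    }

    public synchronized WrappedConnection obtain() {
        if (outstanding.size() >= maxHandles) {
            // Pool exhausted: dump where every outstanding handle was created,
            // as in the diagnostic output described in the thread.
            System.err.println("Out of db connections, list of outstanding ones follows.");
            for (WrappedConnection c : outstanding)
                c.getInstantiationException().printStackTrace();
            throw new IllegalStateException("No free connections");
        }
        WrappedConnection c = new WrappedConnection();
        outstanding.add(c);
        return c;
    }

    public synchronized void release(WrappedConnection c) {
        outstanding.remove(c);
    }

    public synchronized int outstandingCount() {
        return outstanding.size();
    }

    public static void main(String[] args) {
        LeakTrackingPool pool = new LeakTrackingPool(2);
        WrappedConnection a = pool.obtain();
        pool.obtain();                 // second handle, never released: a "leak"
        pool.release(a);
        System.out.println("outstanding = " + pool.outstandingCount()); // prints "outstanding = 1"
    }
}
```

A leaked handle shows up as an entry that never leaves the outstanding list; its recorded stack trace points at the allocation site, which is exactly what makes an uncommon error path findable.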