Message-ID: <511244E4.4020102@usit.uio.no>
Date: Wed, 06 Feb 2013 12:56:20 +0100
From: Erlend Garåsen
To: user@manifoldcf.apache.org
Subject: Re: 
max_pred_locks_per_transaction
References: <51025DC7.8000702@usit.uio.no> <510BAACB.5050201@usit.uio.no> <510BAEA9.70300@usit.uio.no> <5110FD4A.6040006@usit.uio.no> <51122706.2040709@usit.uio.no>

It's not difficult to reproduce, because it *will* fail with the job I have defined. I'm almost sure which host is causing these problems - it's related to complex articles. I can even create a new job with only the problematic host defined.

Yes, that's exactly what I'm going to do, I think.

Just a question: I see the following dumped into my logs every second, even though no jobs are running at the moment. Is this related to connection tracking? I'm just afraid that it will be difficult to find the relevant log entries with logging this active.

Erlend

On 06.02.13 12.43, Karl Wright wrote:
> Hi Erlend,
>
> Various crawler features are not where the problem is going to be
> coming from. It will be coming from the basic process of queuing
> documents and processing them, and the interaction with PostgreSQL.
>
> If you are worried that you won't be able to reproduce the issue, my
> suggestion is to set up a job similar to the one you had before that
> hung. As long as you are using the same PostgreSQL server (not
> instance), the conditions will be ripe for the same kind of handle
> leakage as before. 
> If you had tons of jobs running and have no idea
> which one caused the problem, don't worry - the only thing that
> might actually matter is the kind of jobs you are doing: web, rss, etc.,
> along with possibly the overall system load.
>
> Karl
>
> On Wed, Feb 6, 2013 at 4:48 AM, Erlend Garåsen wrote:
>>
>> I have some time this week to investigate this further, after I finally
>> delivered a heavy job about SAML integration. I will look through the
>> source code as well and try to log even more if necessary.
>>
>> Maciej: Can you please send me a detailed description of your similar
>> problems?
>>
>> I previously mentioned that the "exclude from index" functionality in the
>> exclusion tab could be the source of the problem, but that no longer
>> seems to be the case, since I got the same problem for another job
>> without any settings there.
>>
>> Just a guess, but I think the source of the problem is problematic
>> documents, for instance invalid XML documents which the crawler tries to
>> parse.
>>
>> Karl: If necessary, you can try to crawl these documents yourself, but
>> first I need to isolate the problem to one particular host.
>>
>> Thanks for the warning about the lack of backward compatibility for this
>> test; I will only use our test server, where data integrity is not that
>> important. I'm starting a new crawl right away and will report thereafter.
>>
>> Erlend
>>
>>
>> On 05.02.13 14.10, Karl Wright wrote:
>>>
>>> Ok, it is clear from this that most of your threads are waiting to get
>>> a connection, and there are no connections to be found. This is
>>> exactly the problem that Maciej reported, for which I created the
>>> CONNECTORS-638 ticket. There has to be a connection leak
>>> somewhere. Obviously it is not a common situation, or the problem
>>> would arise almost right away; it probably occurs as a result of some
>>> error condition or pathway that is relatively uncommon. 
>>>
>>> The diagnostic code that is now checked into trunk should work as follows:
>>>
>>> (1) First, check out and build trunk. Since there are schema changes
>>> in trunk vs. older versions of ManifoldCF, you cannot "go backwards"
>>> and run an older version on a particular database instance once you've
>>> run trunk. Keep that in mind.
>>>
>>> (2) Add a line to the properties.xml file, as follows:
>>>
>>> <property name="..." value="true"/>
>>>
>>> (3) Start the system up and let it run.
>>>
>>> (4) When it fails, you should start to see dumps in the log like this:
>>>
>>>   Logging.db.warn("Out of db connections, list of outstanding ones follows.");
>>>   for (WrappedConnection c : outstandingConnections)
>>>   {
>>>     Logging.db.warn("Found a possibly leaked db connection", c.getInstantiationException());
>>>   }
>>>
>>> ... which will dump where all the offending connections were
>>> allocated. Hopefully this will point us at what the problem is. If
>>> there seems to be no consistency here, I'll have to explore the
>>> possibility that there are bugs in the connection allocation/free
>>> code, but we'll see.
>>>
>>> Karl
>>>
>> --
>> Erlend Garåsen
>> Center for Information Technology Services
>> University of Oslo
>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
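[Archive editor's sketch] The leak-tracking technique Karl quotes above can be illustrated with a minimal, self-contained pool. This is not ManifoldCF's actual code: the class and method names (LeakTrackingPool, obtain, release) are assumptions modeled on the quoted snippet's WrappedConnection/getInstantiationException pattern. The idea is that each handed-out connection captures an exception at creation time, so when the pool runs dry you can log the stack trace of where every outstanding handle was allocated.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakTrackingPool {

    // Stand-in for a real DB connection wrapper (hypothetical, mirrors
    // the WrappedConnection name in the quoted ManifoldCF log snippet).
    public static final class WrappedConnection {
        private final Exception instantiationException;

        WrappedConnection() {
            // Capture the allocation stack trace without throwing:
            // constructing an Exception records the current call stack.
            this.instantiationException = new Exception("Connection allocated here");
        }

        public Exception getInstantiationException() {
            return instantiationException;
        }
    }

    private final int maxHandles;
    private final List<WrappedConnection> outstanding = new ArrayList<>();

    public LeakTrackingPool(int maxHandles) {
        this.maxHandles = maxHandles;
    }

    public synchronized WrappedConnection obtain() {
        if (outstanding.size() >= maxHandles) {
            // Pool exhausted: dump where every outstanding handle was created,
            // as in the diagnostic output described in the thread.
            System.err.println("Out of db connections, list of outstanding ones follows.");
            for (WrappedConnection c : outstanding)
                c.getInstantiationException().printStackTrace();
            throw new IllegalStateException("No free connections");
        }
        WrappedConnection c = new WrappedConnection();
        outstanding.add(c);
        return c;
    }

    public synchronized void release(WrappedConnection c) {
        outstanding.remove(c);
    }

    public synchronized int outstandingCount() {
        return outstanding.size();
    }

    public static void main(String[] args) {
        LeakTrackingPool pool = new LeakTrackingPool(2);
        WrappedConnection a = pool.obtain();
        pool.obtain();                 // second handle, never released: a "leak"
        pool.release(a);
        System.out.println("outstanding = " + pool.outstandingCount()); // prints "outstanding = 1"
    }
}
```

A leaked handle shows up as an entry that never leaves the outstanding list; its recorded stack trace points at the allocation site, which is exactly what makes an uncommon error path findable.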