From: Serge Hendrickx
To: Karl Wright
Cc: "user@manifoldcf.apache.org"
Reply-To: user@manifoldcf.apache.org
Date: Fri, 19 Jul 2013 11:42:05 +0200
Subject: Re: Elasticsearch 0.90.2

Hi Karl,

It was indeed a problem with the exclusion of documents.
While searching for a solution I came across a bug report describing URLs without an extension not being indexed ( https://issues.apache.org/jira/browse/CONNECTORS-707 ).
That was why my implementation wouldn't index the documents.

Thank you for your help!

Serge

On Thu, Jul 18, 2013 at 9:13 PM, Karl Wright wrote:

> Hi Serge,
> There may be two reasons that you aren't getting any documents. The first
> possibility is that the feed itself is unfetchable or prohibited by
> robots.txt. The second possibility is that your ES connection's extensions
> and MIME types exclude the documents.
>
> First, you can try creating a test job that outputs to the null output
> connector and see if you get anything interesting in the simple history
> when the job is run. If not, turn on connector debugging in
> properties.xml. Httpclient debugging is not much use here.
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: Serge Hendrickx
> Sent: 7/18/2013 11:14 AM
> To: user@manifoldcf.apache.org
> Subject: Elasticsearch 0.90.2
>
> Hello,
> I'm trying to run ManifoldCF 1.2 with Elasticsearch 0.90.2 through an RSS
> Repository connection.
> In the "simple history report" from the end-user manual there is an
> "Indexation (Elasticsearch)" Activity.
> ( http://manifoldcf.apache.org/release/trunk/en_US/images/en_US/elasticsearch-history-report.png )
> In my implementation, it skips this stage and goes directly to job stop.
> (job start -> fetch -> job end -> Optimize (Elasticsearch))
> There is no change in my Elasticsearch index after this job has run.
> What could be the cause of this problem?
> Here are the log lines that may be relevant (from the end of fetch through
> to optimize):
>
> DEBUG 2013-07-18 10:26:51,578 (Thread-725) - Connection [id: 2][route: {s}->http://feeds.nieuwsblad.be] can be kept alive for 15000 MILLISECONDS
> DEBUG 2013-07-18 10:26:51,578 (Thread-725) - Connection released: [id: 2][route: {s}->http://feeds.nieuwsblad.be][total kept alive: 1; route allocated: 1 of 2; total allocated: 1 of 1]
> DEBUG 2013-07-18 10:26:51,617 (Worker thread '1') - Connection manager is shutting down
> DEBUG 2013-07-18 10:26:51,619 (Worker thread '1') - Connection 0.0.0.0:57166<->134.58.64.12:443 closed
> DEBUG 2013-07-18 10:26:51,619 (Worker thread '1') - Connection 0.0.0.0:57166<->134.58.64.12:443 closed
> DEBUG 2013-07-18 10:26:51,619 (Worker thread '1') - Connection manager shut down
> DEBUG 2013-07-18 10:27:10,236 (Thread-897) - Connection request: [route: {}->http://localhost:9200][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 1]
> DEBUG 2013-07-18 10:27:10,236 (Thread-897) - Connection leased: [id: 3][route: {}->http://localhost:9200][total kept alive: 0; route allocated: 1 of 2; total allocated: 1 of 1]
> DEBUG 2013-07-18 10:27:10,237 (Thread-897) - Connecting to localhost:9200
> DEBUG 2013-07-18 10:27:10,246 (Thread-897) - CookieSpec selected: best-match
> DEBUG 2013-07-18 10:27:10,246 (Thread-897) - Auth cache not set in the context
> DEBUG 2013-07-18 10:27:10,246 (Thread-897) - Target auth state: UNCHALLENGED
> DEBUG 2013-07-18 10:27:10,247 (Thread-897) - Proxy auth state: UNCHALLENGED
> DEBUG 2013-07-18 10:27:10,247 (Thread-897) - Attempt 1 to execute request
> DEBUG 2013-07-18 10:27:10,247 (Thread-897) - Sending request: GET /index/_optimize HTTP/1.1
>
> Thank you in advance!
> Serge Hendrickx
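
For readers hitting the same symptom, here is a minimal sketch of how an extension allow-list can silently drop URLs that have no extension, which is the behaviour Serge describes for the bug report. This is not ManifoldCF's actual code: the function names, the `accept_no_extension` flag, and the example.com URLs are all hypothetical and only illustrate the failure mode.

```python
from urllib.parse import urlparse
import posixpath


def url_extension(url: str) -> str:
    """Return the lowercase extension of the URL path, or '' if there is none."""
    path = urlparse(url).path              # drops the query string and fragment
    _, ext = posixpath.splitext(path)      # only inspects the last path segment
    return ext.lstrip(".").lower()


def is_indexable(url: str, allowed_extensions: set,
                 accept_no_extension: bool = False) -> bool:
    """Hypothetical filter: accept a URL only if its extension is on the allow-list.

    With accept_no_extension=False, a URL without any extension is rejected,
    so a feed whose article links look like /news/article-12345 yields nothing.
    """
    ext = url_extension(url)
    if ext == "":
        return accept_no_extension
    return ext in allowed_extensions


if __name__ == "__main__":
    allowed = {"html", "htm", "pdf", "txt"}
    for url in ("http://example.com/feed/story.html",
                "http://example.com/news/article-12345"):
        print(url, is_indexable(url, allowed))
```

If a filter of this shape is in play, every document can be fetched successfully and still never reach the output connection, which would be consistent with a history that goes from fetch straight to optimize.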