manifoldcf-user mailing list archives

From Serge Hendrickx <sergehendri...@gmail.com>
Subject Re: Elasticsearch 0.90.2
Date Fri, 19 Jul 2013 09:42:05 GMT
Hi Karl,

It was indeed a problem with the exclusion of documents.
When searching for a solution I came across a bug report describing lack of
indexation of URLs that have no extension.
( https://issues.apache.org/jira/browse/CONNECTORS-707 )
This was why my implementation wouldn't index the documents.
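
For anyone hitting the same symptom, here is a rough sketch of how an extension allow-list can silently drop extensionless URLs. It is an illustration only, not ManifoldCF's actual code: the function names and the allow_empty flag are hypothetical, and the URLs are made up.

```python
from posixpath import splitext
from urllib.parse import urlparse


def url_extension(url: str) -> str:
    """Return the lowercase extension of the URL path, without the dot.

    Returns an empty string when the last path segment has no extension.
    """
    path = urlparse(url).path
    return splitext(path)[1].lstrip(".").lower()


def passes_extension_filter(url: str, allowed: set, allow_empty: bool = False) -> bool:
    """Hypothetical allow-list check, similar in spirit to an output
    connection that only accepts documents with listed extensions."""
    ext = url_extension(url)
    if ext == "":
        # An extensionless URL is rejected unless the filter explicitly
        # permits empty extensions.
        return allow_empty
    return ext in allowed


allowed = {"html", "htm", "txt", "pdf"}
print(passes_extension_filter("http://example.com/story.html", allowed))
print(passes_extension_filter("http://example.com/news/12345", allowed))
print(passes_extension_filter("http://example.com/news/12345", allowed, allow_empty=True))
```

With a strict allow-list, a feed whose article links end in a bare ID never reaches the indexing step, which would match the history going straight from fetch to job end.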

Thank you for your help!

Serge


On Thu, Jul 18, 2013 at 9:13 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Serge,
> There may be two reasons that you aren't getting any documents.  The first
> reason may be that the feed itself is unfetchable or prohibited by
> robots.txt.  The second possibility is that your ES connection's extensions
> and mime types exclude the documents.
>
> First, you can try creating a test job that outputs to the null output
> connector and see if you get anything interesting in the simple history
> when the job is run.  If not, turn on connector debugging in
> properties.xml.  Httpclient debugging is not much use here.
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: Serge Hendrickx
> Sent: 7/18/2013 11:14 AM
> To: user@manifoldcf.apache.org
> Subject: Elasticsearch 0.90.2
>
> Hello,
> I'm trying to run ManifoldCF 1.2 with elasticsearch 0.90.2 through an RSS
> Repository connection.
> In the "simple history report" from the end-user manual there is an
> "Indexation (Elasticsearch)" Activity.
> ( http://manifoldcf.apache.org/release/trunk/en_US/images/en_US/elasticsearch-history-report.png )
> In my implementation, it skips this stage and goes directly to job stop.
> (job start -> fetch -> job end -> Optimize (Elasticsearch))
> There is no change in my Elasticsearch index after this job has run.
> What could be the cause of this problem?
> Here are the log lines that may be relevant (from the end of fetch through
> to optimize):
>
> DEBUG 2013-07-18 10:26:51,578 (Thread-725) - Connection [id: 2][route:
> {s}->http://feeds.nieuwsblad.be] can be kept alive for 15000 MILLISECONDS
> DEBUG 2013-07-18 10:26:51,578 (Thread-725) - Connection released: [id:
> 2][route: {s}->http://feeds.nieuwsblad.be][total kept alive: 1; route
> allocated: 1 of 2; total allocated: 1 of 1]
> DEBUG 2013-07-18 10:26:51,617 (Worker thread '1') - Connection manager is
> shutting down
> DEBUG 2013-07-18 10:26:51,619 (Worker thread '1') - Connection
> 0.0.0.0:57166<->134.58.64.12:443 closed
> DEBUG 2013-07-18 10:26:51,619 (Worker thread '1') - Connection
> 0.0.0.0:57166<->134.58.64.12:443 closed
> DEBUG 2013-07-18 10:26:51,619 (Worker thread '1') - Connection manager
> shut down
> DEBUG 2013-07-18 10:27:10,236 (Thread-897) - Connection request: [route:
> {}->http://localhost:9200][total kept alive: 0; route allocated: 0 of 2;
> total allocated: 0 of 1]
> DEBUG 2013-07-18 10:27:10,236 (Thread-897) - Connection leased: [id:
> 3][route: {}->http://localhost:9200][total kept alive: 0; route
> allocated: 1 of 2; total allocated: 1 of 1]
> DEBUG 2013-07-18 10:27:10,237 (Thread-897) - Connecting to localhost:9200
> DEBUG 2013-07-18 10:27:10,246 (Thread-897) - CookieSpec selected:
> best-match
> DEBUG 2013-07-18 10:27:10,246 (Thread-897) - Auth cache not set in the
> context
> DEBUG 2013-07-18 10:27:10,246 (Thread-897) - Target auth state:
> UNCHALLENGED
> DEBUG 2013-07-18 10:27:10,247 (Thread-897) - Proxy auth state: UNCHALLENGED
> DEBUG 2013-07-18 10:27:10,247 (Thread-897) - Attempt 1 to execute request
> DEBUG 2013-07-18 10:27:10,247 (Thread-897) - Sending request: GET
> /index/_optimize HTTP/1.1
>
> Thank you in advance!
> Serge Hendrickx
>
