nutch-user mailing list archives

From Jean-Michel Tremblay <jm.tre...@gmail.com>
Subject Nutch 1.7 and ElasticSearch: content not sent to ElasticSearch
Date Mon, 09 Sep 2013 22:02:58 GMT
Hi,

I'm using Nutch 1.7 and ElasticSearch (installed on the same CentOS machine). 

I was able to run a quick test with the "bin/nutch crawl …" command against nutch.apache.org
(using domain-urlfilter). The ElasticSearch indexer runs successfully, but every entry in
ElasticSearch has only the "segment", "digest", and "boost" fields in its "_source" object.
I would expect to see "content" as well, right?
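
A minimal sketch of the check described above, assuming the indexed documents have been
fetched as a standard ElasticSearch search response (a dict with "hits" -> "hits" ->
"_source"). The function name missing_fields, the sample response, and the document id are
hypothetical and only illustrate the symptom; "content" is the one field name taken from
this message.

```python
def missing_fields(response, expected=("content",)):
    """Map each hit's _id to the expected fields absent from its _source."""
    report = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        absent = [field for field in expected if field not in source]
        if absent:
            report[hit.get("_id")] = absent
    return report


# Hypothetical response mirroring the symptom: only segment, digest, and boost
# are present in _source. The values are placeholders, not real crawl output.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "http://nutch.apache.org/",
                "_source": {"segment": "seg", "digest": "abc", "boost": "1.0"},
            }
        ]
    }
}

report = missing_fields(sample)
print(report)
```

An empty report would mean every document carries the expected fields; any entry points to
a document that was indexed without them.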

I know some content was parsed, because it shows up when running "bin/nutch readseg -get <seg> <url>".

I see that schema.xml is used as the mapping for Solr. I think ElasticSearch doesn't need any
pre-defined mapping, right?

Logs under $NUTCH_HOME/logs are not helping (no error, 1 warning from NativeCodeLoader).

Am I missing some config somewhere?

(Yes, I'm new to Nutch and ElasticSearch.)

JM