lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Pushing content to Solr from Nutch
Date Fri, 11 Apr 2014 04:52:26 GMT
Does your Solr schema match the data output by Nutch? It's up to you to create a Solr schema
that matches the output of Nutch; read the Nutch documentation for that info. Solr doesn't
define those fields, Nutch does.

-- Jack Krupansky

From: Xavier Morera 
Sent: Thursday, April 10, 2014 12:58 PM
To: solr-user@lucene.apache.org 
Subject: Pushing content to Solr from Nutch

Hi, 

I have followed several Nutch tutorials, including the main one (http://wiki.apache.org/nutch/NutchTutorial),
to crawl sites. That part works: I can see in the console that the pages get crawled and the
directories get built with the data. But for the life of me I can't get anything posted to Solr.
The Solr console doesn't even squint, so Nutch is apparently not sending anything.

This is the command I run, which crawls and in theory should also post to Solr:
bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2


I also found that I could use this one once the content is already crawled:
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*


But no luck.

The only thing that caught my attention is the message below. I read that adding the property
further down would fix it, but it doesn't:
No IndexWriters activated - check your configuration


This is the property:
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
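
The "No IndexWriters activated" message usually means that no indexer plugin is enabled. In Nutch 1.8 the Solr writer is itself a plugin, so a likely fix is to list it in plugin.includes in conf/nutch-site.xml. A minimal sketch, assuming the stock indexer-solr plugin that ships with Nutch 1.8; it keeps the value above unchanged and only adds that one entry:

```xml
<property>
  <name>plugin.includes</name>
  <!-- Assumption: same value as above, with only indexer-solr added
       so that an IndexWriter is activated. -->
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|indexer-solr|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

With that in place, rerunning the solrindex command above should show whether documents now reach Solr.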

Any idea? Apache Nutch 1.8 running Java 1.6 via Cygwin on Windows.

-- 

Xavier Morera
email: xavier@familiamorera.com

CR: +(506) 8849 8866
US: +1 (305) 600 4919 
skype: xmorera