lucene-solr-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: Pushing content to Solr from Nutch
Date Fri, 11 Apr 2014 07:34:21 GMT
Hi Xavier;

I think it would be better to ask this question on the Nutch user list.

Thanks;
Furkan KAMACI


2014-04-11 7:52 GMT+03:00 Jack Krupansky <jack@basetechnology.com>:

> Does your Solr schema match the data output by Nutch? It's up to you to
> create a Solr schema that matches the output of Nutch - read the Nutch
> documentation for that info. Solr doesn't define that info, Nutch does.
>
> -- Jack Krupansky
>
> From: Xavier Morera
> Sent: Thursday, April 10, 2014 12:58 PM
> To: solr-user@lucene.apache.org
> Subject: Pushing content to Solr from Nutch
>
> Hi,
>
> I have followed several Nutch tutorials - including the main one,
> http://wiki.apache.org/nutch/NutchTutorial - to crawl sites. The crawl
> works: I can see in the console that the pages get crawled and the
> directories are built with the data. But for the life of me I can't get
> anything posted to Solr. The Solr console doesn't even squint, so Nutch
> is apparently not sending anything.
>
> This is the command I run; it crawls and, in theory, should also post to
> Solr:
> bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2
>
>
> I also found that I could use this one once the crawl has already finished:
> bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*
>
>
> But no luck.
>
> The only thing that caught my attention is the message below. I read that
> adding the property that follows would fix it, but it does not:
> No IndexWriters activated - check your configuration
>
>
> This is the property
> <property>
> <name>plugin.includes</name>
> <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
> Any ideas? I am running Apache Nutch 1.8 on Java 1.6 via Cygwin on Windows.
>
> --
>
> Xavier Morera
> email: xavier@familiamorera.com
>
> CR: +(506) 8849 8866
> US: +1 (305) 600 4919
> skype: xmorera
>
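
One way to check the quoted plugin.includes value against the "No IndexWriters activated" message is sketched below. It rests on two assumptions that are not confirmed anywhere in this thread: that this Nutch version ships the Solr index writer as a plugin with the id indexer-solr, and that a plugin is activated only when its whole id matches the plugin.includes regular expression. The helper names is_activated and with_plugin are hypothetical and exist only for this illustration.

```python
import re

# The plugin.includes value exactly as quoted in the message above.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|"
    "query-(basic|site|url)|response-(json|xml)|summary-basic|"
    "scoring-opic|urlnormalizer-(pass|regex|basic)"
)

# Assumed plugin id of the Solr index writer (not stated in the thread).
SOLR_WRITER = "indexer-solr"


def is_activated(plugin_id: str, includes: str) -> bool:
    """Return True if the whole plugin id matches the includes regex."""
    return re.fullmatch(includes, plugin_id) is not None


def with_plugin(includes: str, plugin_id: str) -> str:
    """Return an includes value that also matches plugin_id."""
    if is_activated(plugin_id, includes):
        return includes
    return includes + "|" + plugin_id


if __name__ == "__main__":
    print("index-basic activated:", is_activated("index-basic", PLUGIN_INCLUDES))
    print(SOLR_WRITER, "activated:", is_activated(SOLR_WRITER, PLUGIN_INCLUDES))
    print("candidate value:", with_plugin(PLUGIN_INCLUDES, SOLR_WRITER))
```

If those assumptions hold, the quoted value activates index-basic but no index writer, which would be consistent with the message; the candidate value printed last is only a starting point to try in conf/nutch-site.xml.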
