nutch-user mailing list archives

From smooth almonds <sir.ramsel.ja...@gmail.com>
Subject Returning web page abstract with Solr
Date Wed, 04 Apr 2012 07:30:40 GMT
I've crawled flickr.com with Nutch successfully and am trying to return a
highlighted abstract using Solr as the indexer/searcher. So, if I query for
"ocean", I want to get back a 20-30 word abstract containing that query term,
taken from just the text of the web page (not the title or URL).

I've copied the Nutch schema.xml as my Solr schema.xml.


Is the 'content' field in the schema.xml the field that indexes/stores the
body of a web page? Or is there another field? 

And how do I return this field? Do I have to turn storing on for it? Or is
there another way I can have Solr retrieve the abstract from the web at
search time so that I don't have to store all that data?
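
To make the request concrete, here is a minimal sketch of the kind of query
described above, using Solr's standard highlighting parameters (hl, hl.fl,
hl.snippets, hl.fragsize). Several things here are assumptions, not taken
from any working setup: the Solr URL, the helper names build_highlight_query
and extract_snippets, and the choice of 200 characters as a rough stand-in
for 20-30 words (hl.fragsize counts characters, not words). It also assumes
the 'content' field is stored, since the standard highlighter builds
snippets from stored field values.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust host, port, and core path for a real setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_query(term, field="content", fragsize=200, snippets=1,
                          base_url=SOLR_SELECT_URL):
    """Build a Solr select URL that asks for highlighted snippets.

    Hypothetical helper: searches `field` for `term` and requests
    highlighting on that same field only (not title or url).
    """
    params = {
        "q": f"{field}:{term}",   # search the page body field
        "fl": "url,title",        # fields returned for each hit
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # highlight only this field
        "hl.snippets": snippets,  # max snippets per document
        "hl.fragsize": fragsize,  # approximate snippet size in characters
        "wt": "json",             # JSON response format
    }
    return f"{base_url}?{urlencode(params)}"


def extract_snippets(response, field="content"):
    """Map each document key to its list of highlighted snippets.

    `response` is the parsed JSON body of a Solr reply; the "highlighting"
    section is keyed by the document's unique key, then by field name.
    """
    highlighting = response.get("highlighting", {})
    return {doc_id: fields.get(field, [])
            for doc_id, fields in highlighting.items()}


if __name__ == "__main__":
    print(build_highlight_query("ocean"))
```

The URL could then be fetched with any HTTP client and the parsed JSON
passed to extract_snippets; whether snippets come back at all depends on
how the 'content' field is declared in schema.xml.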


I can't find anything about this on the web, though it seems like it would
be a pretty popular topic.



--
View this message in context: http://lucene.472066.n3.nabble.com/Returning-web-page-abstract-with-Solr-tp3883400p3883400.html
Sent from the Nutch - User mailing list archive at Nabble.com.
