lucene-solr-user mailing list archives

From Dennis Gearon <gear...@sbcglobal.net>
Subject Re: filtering or getting accurate crawling results
Date Sat, 13 Nov 2010 04:52:13 GMT
Actually, can Nutch be used for SCRAPING, not crawling?

I don't just want the URL; I want the data assigned to specific fields, no 
matter what site or format it comes from.

I've done scraping, but it had to be custom-tailored for each target.



 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Dennis Gearon <gearond@sbcglobal.net>
To: solr-user@lucene.apache.org
Sent: Fri, November 12, 2010 8:46:31 PM
Subject: filtering or getting accurate crawling results

How easy is it to get good results from the Lucene crawling software?

Let's say, for example, I wanted information only about one general subject and 
nothing else. (Sorry, not ready to say what exactly at this point.) Is it like 
tuning Solr, or IS it tuning Solr to simply not accept what does not fit the 
desired results?

The amount of information that I'd want is LARGE, but a drop in the bucket 
compared to Google itself.



Dennis Gearon


