lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Indexing the spider content
Date Wed, 25 Jun 2008 10:54:17 GMT
If it has an API that lets you get the content that needs to be
indexed, then, sure, you can index from the spider.  If it doesn't
have an API, you would presumably need to extract the docs from the
files it builds.  That, of course, assumes it stores the crawled
files in some proprietary format.  If it just downloads them as the
original files on your disk, then it is like any other indexing
problem: extract the content from the files and index it.
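
For the last case (the spider leaves ordinary files on disk), the
extraction step might look roughly like the sketch below. It uses only
the JDK and is not Lucene or Verity code: the class CrawlDirReader, the
CrawledDoc holder and the readAll method are hypothetical names made up
for illustration, and it assumes the crawled files are plain text or
HTML in UTF-8. Binary formats such as PDF would need a real parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: collects (path, text) pairs from a crawl directory. */
final class CrawlDirReader {

    /** One crawled file: its path relative to the crawl root, and its text. */
    static final class CrawledDoc {
        final String path;
        final String text;

        CrawledDoc(String path, String text) {
            this.path = path;
            this.text = text;
        }
    }

    /**
     * Walks root recursively and reads every regular file as UTF-8 text.
     * Assumption: the files are text; malformed bytes are replaced, not rejected.
     */
    static List<CrawledDoc> readAll(Path root) throws IOException {
        List<CrawledDoc> docs = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            // Sort so the result order does not depend on the file system.
            paths.filter(Files::isRegularFile).sorted().forEach(p -> {
                try {
                    String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                    docs.add(new CrawledDoc(root.relativize(p).toString(), text));
                } catch (IOException e) {
                    // Lambdas cannot throw checked exceptions; rethrow unchecked.
                    throw new UncheckedIOException(e);
                }
            });
        }
        return docs;
    }
}
```

Each CrawledDoc would then be turned into a Lucene Document (for
example one field for the path and one for the text) and passed to
IndexWriter.addDocument. That part is left out here because the exact
field and analyzer classes depend on the Lucene version in use.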

-Grant

On Jun 24, 2008, at 11:35 PM, yugana wrote:

>
> Hi Otis,
>
> Thanks for the reply. So you mean it is not possible to use Lucene
> to index the fetched Verity Spider content?
>
> Yug
>
>
> Otis Gospodnetic wrote:
>>
>> It sounds like you want to check out Nutch - fetcher, indexer,
>> searcher, etc. in one lovely package.
>>
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>> ----- Original Message ----
>>> From: yugana <yugandhar.adem@wipro.com>
>>> To: java-user@lucene.apache.org
>>> Sent: Tuesday, June 24, 2008 3:25:03 AM
>>> Subject: Indexing the spider content
>>>
>>>
>>> Hi All,
>>>
>>> I am new to Lucene Search. Can you let me know if it is possible
>>> to index the "Verity Spider" content? If possible, please let me
>>> know how to create an index from it and search on it. Also, please
>>> share some code snippets on how to proceed with indexing and
>>> searching. I do not have much time to go through the Lucene API.
>>> Your help is appreciated.
>>>
>>> Thanks,
>>> Yug
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Indexing-the-spider-content-tp18085388p18085388.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> -- 
> View this message in context: http://www.nabble.com/Indexing-the-spider-content-tp18085388p18104348.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ