lucene-java-user mailing list archives

From Bruno Grilheres <bgrilhe...@wanadoo.fr>
Subject Re: Lucene or Nutch ?
Date Wed, 05 Apr 2006 15:22:30 GMT
Thanks for your answer; I was not aware of the Solr project.

There was a big typo in my message: I meant less than 10 GB of PDF files per day
over one month, i.e. less than 300 GB of PDF files in total.
I ran some tests with PDF files: 100 MB of native PDF produces about
3 MB of Lucene index (the text was indexed but not stored).
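To put those numbers together, here is a back-of-the-envelope projection of the total index size. It is only a sketch: it assumes the 3 MB per 100 MB ratio from the test above holds linearly at full volume, and that "one month" means 30 days. The class and method names are hypothetical.

```java
// Rough projection of total Lucene index size from the PDF-to-index ratio
// measured in a small test (about 3 MB of index per 100 MB of native PDF,
// text indexed but not stored). ASSUMPTION: the ratio stays constant as
// volume grows; real ratios vary with text density and analysis chain.
public class IndexSizeEstimate {

    // Scale the input volume by the sample's index/PDF ratio.
    static double projectIndexGb(double pdfGb, double sampleIndexMb, double samplePdfMb) {
        return pdfGb * (sampleIndexMb / samplePdfMb);
    }

    public static void main(String[] args) {
        double dailyPdfGb = 10.0;               // upper bound per day
        int days = 30;                          // ASSUMPTION: one month = 30 days
        double totalPdfGb = dailyPdfGb * days;  // upper bound for the month
        double indexGb = projectIndexGb(totalPdfGb, 3.0, 100.0);
        System.out.printf("PDF input: %.0f GB -> estimated index: %.1f GB%n",
                totalPdfGb, indexGb);
    }
}
```

Under these assumptions the index for a full month stays in the order of 9 GB, which is a useful number when deciding whether a distributed setup is needed at all.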

Bruno

Yonik Seeley wrote:
> On 4/5/06, Bruno Grilheres <bgrilheres@wanadoo.fr> wrote:
>   
>> 1) High volume of data indexing, but only with add and delete
>> functionality (approximately 10 PDF) => a scalable HDFS architecture
>> seems good.
>> 2) Specific analysis chain and indexing of a given set of metadata.
>> 3) Language recognition.
>> 4) No graphical search interface is needed and no crawling is
>> needed; indexing and search are performed with HTTP requests to a servlet.
>>
>> What is the best starting choice for this: Lucene or Nutch?
>>
>> As far as I know, Lucene is a good choice for 2 and 4, and Nutch is a better
>> choice for 1 and 3.
>>     
>
> Solr would also be good for 2 and 4.
> As for 1, what kind of scalability requirements are we talking about?
> (number of documents, size of docs, etc.)
>
> -Yonik
> http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

