lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: lucene suiteable ? 6 mio recods / day 1k
Date Mon, 22 Dec 2008 03:41:36 GMT
Christian,

You can certainly purge old documents on a daily basis to keep the corpus from growing,
but note that 3M docs/day * 90 days = 270M documents of up to 2 KB each, which may be too
much for a single index unless you have lots of RAM or don't need queries to be quick.
In other words, you may have to spread this over multiple indices/machines.
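
As a rough illustration of that arithmetic, and of one possible way to make the daily purge cheap, here is a minimal plain-Java sketch. It makes no Lucene calls. The class name RetentionSketch, the method names, the reading of "2K" as 2048 bytes, and the idea of keeping one index per calendar day are all assumptions for illustration, not something the thread prescribes.

```java
import java.time.LocalDate;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: sizing figures from the thread plus day-bucket bookkeeping.
class RetentionSketch {
    static final long DOCS_PER_DAY = 3_000_000L; // from the thread
    static final int RETENTION_DAYS = 90;        // from the thread (example value of N)
    static final long MAX_DOC_BYTES = 2_048L;    // assumption: "2K" read as 2048 bytes

    // Number of live documents once the corpus stops growing.
    static long steadyStateDocs() { return DOCS_PER_DAY * RETENTION_DAYS; }

    // Upper bound on raw text held, before any index overhead or compression.
    static long maxRawBytes() { return steadyStateDocs() * MAX_DOC_BYTES; }

    // Average ingest rate if records arrive evenly over 24 hours.
    static double docsPerSecond() { return DOCS_PER_DAY / 86_400.0; }

    // One entry per calendar day that still has a live index, oldest first.
    private final Deque<LocalDate> liveDays = new ArrayDeque<>();

    // Registers today's bucket and returns the buckets that have aged out.
    // The caller would then drop the index for each returned day as a whole.
    List<LocalDate> rollTo(LocalDate today) {
        if (liveDays.isEmpty() || liveDays.peekLast().isBefore(today)) {
            liveDays.addLast(today);
        }
        LocalDate cutoff = today.minusDays(RETENTION_DAYS);
        List<LocalDate> expired = new ArrayList<>();
        while (!liveDays.isEmpty() && !liveDays.peekFirst().isAfter(cutoff)) {
            expired.add(liveDays.removeFirst());
        }
        return expired;
    }

    int liveBuckets() { return liveDays.size(); }

    public static void main(String[] args) {
        System.out.println("steady-state docs: " + steadyStateDocs());
        System.out.println("max raw bytes:     " + maxRawBytes());
        System.out.println("docs per second:   " + docsPerSecond());
    }
}
```

With one bucket per day, expiring a day's records means discarding that day's index as a whole instead of deleting documents one by one, and the same buckets give a natural unit for spreading the corpus over several machines. Whether that trade-off suits a given query load is something to measure.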


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Christian Brennsteiner <eingfoan@yahoo.de>
> To: java-user@lucene.apache.org
> Sent: Friday, December 19, 2008 6:22:40 AM
> Subject: lucene suiteable ? 6 mio recods / day 1k
> 
> hi *,
> 
> i am searching for a fulltext index capable of meeting the following requirements:
> 
> index 3 000 000 new records every day, each valid for N days (e.g.
> 90 days expiration)
> = about 34.7 records / s
> one record is e.g. a url and can be up to 2 k big
> 
> http://example.com/somedir/some.html
> 
> lucene should use "/" as a word separator and should e.g. eliminate all ":"
> 
> so the following "sentence" should be indexed:
> 
> http example.com somedir some.html when having the url
> http://example.com/somedir/some.html
> 
> my main concern about this requirement is that the index should not
> grow over time, as it always holds
> NR OF DAYS * RECORDS PER DAY records and expires them after a given
> time. in my opinion there must be some background thread always
> throwing away expired hits.
> 
> is this easily possible with lucene?
> 
> regards chris
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
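
The tokenization described in the quoted message (treat "/" as a separator, drop every ":") can be sketched in plain Java as follows. This is only an illustration of the intended output: the class name UrlTokens and the method tokenize are hypothetical, and no Lucene API is used. In Lucene itself this kind of logic would normally live in a custom analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the token stream the quoted message asks for.
class UrlTokens {
    // Removes every ':' and splits on '/', skipping the empty pieces left by "//".
    static List<String> tokenize(String url) {
        List<String> tokens = new ArrayList<>();
        for (String part : url.replace(":", "").split("/")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [http, example.com, somedir, some.html]
        System.out.println(tokenize("http://example.com/somedir/some.html"));
    }
}
```

Note that blindly removing ":" also fuses a host with an explicit port (example.com:8080 becomes example.com8080), so a real analyzer may want to treat ":" as a separator instead.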



