lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: lucene suiteable ? 6 mio recods / day 1k
Date Mon, 22 Dec 2008 18:12:25 GMT
Hi Christian,

Typically, for public-facing applications the goal is sub-second search results.
For some applications, waiting even a minute or more is OK.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Christian Brennsteiner <christian@brennsteiner.at>
> To: java-user@lucene.apache.org
> Sent: Monday, December 22, 2008 2:55:01 AM
> Subject: Re: lucene suiteable ? 6 mio recods / day 1k
> 
> hi otis,
> 
> i think that out of 2 k, 80 % can be stemmed, and many of the words are
> duplicates, so they would not need the full space.
> can you give me an idea of what, in your opinion, "don't need
> queries to be quick" would mean ...
> i have no idea in what timeframe it could be handled if it is not
> completely in RAM.
> 
> regards chris
> 
> 
> 
> On Mon, Dec 22, 2008 at 4:41 AM, Otis Gospodnetic
> wrote:
> > Christian
> >
> > You can certainly purge old documents on a daily basis in order to keep the
> > corpus from growing, but note that 3M*90=270M 2K docs may be a bit too much for
> > a single index unless you really have lots of RAM or you don't need queries to
> > be quick.  In other words, you may have to spread this over multiple
> > indices/machines.
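The sizing figures in this thread can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate: the class and method names are hypothetical, "2 k" is read as 2,000 bytes (an assumption), and the byte total is an upper bound on raw input text, not a prediction of the on-disk index size.

```java
import java.util.Locale;

/** Back-of-the-envelope corpus sizing; all names here are illustrative. */
final class CorpusEstimate {
    static final long RECORDS_PER_DAY = 3_000_000L; // from the original question
    static final int RETENTION_DAYS = 90;           // example expiration from the question
    static final long MAX_RECORD_BYTES = 2_000L;    // "2 k" read as 2,000 bytes (assumption)
    static final long SECONDS_PER_DAY = 86_400L;

    private CorpusEstimate() {}

    /** Documents held once the retention window is full. */
    static long retainedDocs() {
        return RECORDS_PER_DAY * RETENTION_DAYS;
    }

    /** Average indexing rate if records arrive evenly over the day. */
    static double recordsPerSecond() {
        return (double) RECORDS_PER_DAY / SECONDS_PER_DAY;
    }

    /** Upper bound on raw input text, assuming every record is at its maximum size. */
    static long maxRawBytes() {
        return retainedDocs() * MAX_RECORD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("retained docs:  " + retainedDocs());
        System.out.println(String.format(Locale.ROOT, "records/second: %.1f", recordsPerSecond()));
        System.out.println("max raw bytes:  " + maxRawBytes());
    }
}
```

Under these assumptions the window holds 270 million documents, arriving at about 34.7 records per second, with at most about 540 GB of raw text if every record were a full 2 k.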
> >
> >
> > Otis --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: Christian Brennsteiner 
> >> To: java-user@lucene.apache.org
> >> Sent: Friday, December 19, 2008 6:22:40 AM
> >> Subject: lucene suiteable ? 6 mio recods / day 1k
> >>
> >> hi *,
> >>
> >> i am searching for a fulltext index capable of the following requirements:
> >>
> >> index 3 000 000 new records every day with a validity of N days (e.g.
> >> 90 days expiration)
> >> == 34.7 records / s
> >> one record is e.g. a url and can be up to 2 k big
> >>
> >> http://example.com/somedir/some.html
> >>
> >> lucene should use "/" as a word separator and should e.g. eliminate all ":"
> >>
> >> so the following "sentence" should be indexed:
> >>
> >> http example.com somedir some.html when having the url
> >> http://example.com/somedir/some.html
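The tokenization rule described here (split on "/", drop every ":") can be illustrated in plain Java. This is a minimal sketch of the rule only, not a Lucene Analyzer; the class and method names are hypothetical, and in a real index the same logic would have to live inside a custom analyzer or tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the requested rule; not Lucene API. */
final class UrlTokenizer {
    private UrlTokenizer() {}

    /** Drops every ':' and then splits on '/', skipping empty tokens. */
    static List<String> tokenize(String url) {
        List<String> tokens = new ArrayList<>();
        // "http://example.com/x" becomes "http//example.com/x" once ':' is removed
        for (String part : url.replace(":", "").split("/")) {
            if (!part.isEmpty()) { // the "//" after the scheme yields an empty part
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String url = "http://example.com/somedir/some.html";
        System.out.println(String.join(" ", tokenize(url)));
    }
}
```

For the example URL this prints "http example.com somedir some.html". Note that removing ":" everywhere also fuses a host with an explicit port (example.com:8080 becomes example.com8080), which may or may not be wanted.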
> >>
> >> my main concern about this requirement is that the index should not
> >> grow over time as it always holds
> >> NR OF DAYS * RECORDS PER DAY  and expires the records after a given
> >> time. in my opinion there must be some background thread always
> >> throwing away expired hits.
> >>
> >> is this easily possible with lucene?
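The expiry idea (hold exactly N days of records and throw away older ones on a schedule) can be sketched with an in-memory stand-in for the index. Everything below is hypothetical illustration: ExpiringStore is not a Lucene class, and grouping records by the day they were added is an assumption. With Lucene the equivalent step would presumably be a delete on a date field or dropping a whole per-day index, neither of which is shown here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for an index, keyed by the day each record was added. */
final class ExpiringStore {
    private final int retentionDays;
    private final TreeMap<LocalDate, List<String>> recordsByDay = new TreeMap<>();

    ExpiringStore(int retentionDays) {
        this.retentionDays = retentionDays;
    }

    void add(LocalDate day, String record) {
        recordsByDay.computeIfAbsent(day, d -> new ArrayList<>()).add(record);
    }

    /**
     * Drops every day that is retentionDays or more before today, so the store
     * keeps at most retentionDays daily buckets. Returns the records removed.
     */
    int purge(LocalDate today) {
        LocalDate cutoff = today.minusDays(retentionDays);
        // headMap(cutoff, true) is a live view of all days on or before the cutoff
        NavigableMap<LocalDate, List<String>> expired = recordsByDay.headMap(cutoff, true);
        int dropped = 0;
        for (List<String> bucket : expired.values()) {
            dropped += bucket.size();
        }
        expired.clear(); // clearing the view removes the entries from the backing map
        return dropped;
    }

    int size() {
        int total = 0;
        for (List<String> bucket : recordsByDay.values()) {
            total += bucket.size();
        }
        return total;
    }
}
```

A daily job (for example a scheduled task) would call purge once with the current date; because whole days are dropped at a time, the store never holds more than N days of records.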
> >>
> >> regards chris
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
> 
> 
> 
> -- 
> ---------------
> Christian Brennsteiner
> Linzergasse 21 / 14
> 5020 Salzburg
> Austria / Europe
> 



