lucene-java-user mailing list archives

From "Christian Brennsteiner" <christ...@brennsteiner.at>
Subject Re: Lucene suitable? 6 million records / day, 1k
Date Mon, 22 Dec 2008 07:55:01 GMT
Hi Otis,

I think that, of the up to 2K per record, about 80% can be stemmed, and many of the words are
duplicates, so they would not need the full space.
Can you give me an idea of what, in your opinion, "don't need
queries to be quick" would mean?
I have no idea in what timeframe queries could be handled if the index is not
completely in RAM.
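To illustrate why duplicate words need less space than their raw text: an inverted index (the structure Lucene builds) stores each distinct term once in its term dictionary and keeps, per term, a postings list of the documents that contain it. The sketch below is plain Java with a simplified in-memory structure, not Lucene's actual file format; it does no stemming, and the three sample URLs are made up. Duplicates shrink the dictionary, but the postings lists still grow with the number of documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal inverted-index sketch: each distinct term appears once as a key,
// and its value lists the ids of the documents containing that term.
// Simplified stand-in only; Lucene's real index format is far more compact.
public class InvertedIndexSketch {

    // Builds term -> sorted list of document ids (each id added at most once per term).
    static Map<String, List<Integer>> build(List<List<String>> docs) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId)) {
                List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
                // Doc ids are visited in increasing order, so if this doc was
                // already recorded for the term it is the last entry.
                if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                    list.add(docId);
                }
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("http", "example.com", "somedir", "some.html"),
            List.of("http", "example.com", "otherdir", "index.html"),
            List.of("http", "example.com", "somedir", "index.html"));
        Map<String, List<Integer>> index = build(docs);
        int tokens = docs.stream().mapToInt(List::size).sum();
        System.out.println(tokens + " tokens, " + index.size() + " distinct terms");
        // prints "12 tokens, 6 distinct terms"
        System.out.println(index);
    }
}
```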

Regards, Chris



On Mon, Dec 22, 2008 at 4:41 AM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Christian
>
> You can certainly purge old documents on a daily basis in order to keep the corpus from
> growing, but note that 3M * 90 = 270M docs of 2K each may be a bit too much for a single index unless
> you really have lots of RAM or you don't need queries to be quick. In other words, you may
> have to spread this over multiple indices/machines.
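The daily purge can be sketched as a cutoff comparison, assuming (hypothetically) that every document is indexed with a "day" field formatted as yyyyMMdd. Such strings sort in date order, so deciding whether a record has expired is a plain string comparison. In Lucene itself, one way to act on this would be to pass a range query over that field to IndexWriter.deleteDocuments(Query) (available since 2.4); check the query classes in your version. The code below models only the cutoff logic in plain Java, with a made-up date and made-up sample days.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the daily purge decision, assuming a hypothetical yyyyMMdd "day"
// field on every record. Lexicographic order equals date order for this format.
public class DailyPurgeSketch {

    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Records whose day sorts strictly before this value are older than retentionDays.
    static String cutoff(LocalDate today, int retentionDays) {
        return today.minusDays(retentionDays).format(DAY);
    }

    static boolean isExpired(String day, String cutoff) {
        return day.compareTo(cutoff) < 0;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2008, 12, 22);
        String cut = cutoff(today, 90);
        List<String> days = List.of("20080901", "20080923", "20081222");
        List<String> expired = days.stream()
            .filter(d -> isExpired(d, cut))
            .collect(Collectors.toList());
        System.out.println("cutoff=" + cut + " expired=" + expired);
        // prints "cutoff=20080923 expired=[20080901]"
    }
}
```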
>
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Christian Brennsteiner <eingfoan@yahoo.de>
>> To: java-user@lucene.apache.org
>> Sent: Friday, December 19, 2008 6:22:40 AM
>> Subject: Lucene suitable? 6 million records / day, 1k
>>
>> Hi *,
>>
>> I am searching for a full-text index capable of meeting the following requirements:
>>
>> index 3,000,000 new records every day, each valid for N days (e.g. a
>> 90-day expiration)
>> == 34.7 records/s
>> one record is, e.g., a URL and can be up to 2K in size
>>
>> http://example.com/somedir/some.html
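The figures above can be checked with a little arithmetic: 3,000,000 records spread over the 86,400 seconds of a day is about 34.7 records/s, and a 90-day retention window holds 270M records at steady state, which matches Otis's estimate in his reply. The sketch below also computes an upper bound on raw text volume, assuming every record is the maximum size and taking 2K as 2,048 bytes; the index on disk will differ from this after analysis and compression.

```java
import java.util.Locale;

// Back-of-the-envelope sizing from the figures in this thread:
// 3,000,000 records/day, 90-day retention, records up to 2K (taken as 2,048 bytes).
public class CapacitySketch {

    static double recordsPerSecond(long recordsPerDay) {
        return recordsPerDay / 86_400.0; // seconds in a day
    }

    static long steadyStateDocs(long recordsPerDay, int retentionDays) {
        return recordsPerDay * retentionDays;
    }

    // Upper bound on raw (pre-analysis) text; the actual index size will differ.
    static long rawBytesUpperBound(long docs, long maxBytesPerDoc) {
        return docs * maxBytesPerDoc;
    }

    public static void main(String[] args) {
        long perDay = 3_000_000L;
        long docs = steadyStateDocs(perDay, 90);
        // Locale.ROOT keeps "." as the decimal separator regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "%.1f records/s", recordsPerSecond(perDay)));
        System.out.println(docs + " docs at steady state");
        System.out.println(rawBytesUpperBound(docs, 2_048L) / 1_000_000_000L + " GB raw text, at most");
        // prints "34.7 records/s", "270000000 docs at steady state", "552 GB raw text, at most"
    }
}
```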
>>
>> Lucene should use "/" as a word separator and should, e.g., eliminate all ":",
>>
>> so the following "sentence" should be indexed:
>>
>> http example.com somedir some.html
>>
>> when given the URL http://example.com/somedir/some.html
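The requested tokenization can be sketched as a character-based tokenizer in which "/" and ":" are not token characters. This mirrors what a Lucene CharTokenizer subclass with an overridden isTokenChar method would do (check the exact signature in your Lucene version); the class below is plain Java, not Lucene API. One assumption to note: it treats ":" as a separator rather than deleting it in place. For the example URL both readings give the same tokens, but for something like "host:8080" deleting would yield "host8080" while this sketch yields "host" and "8080".

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the requested analysis: "/" separates tokens and ":" is dropped
// (here, by also treating it as a separator). Plain Java, not Lucene API.
public class UrlTokenizerSketch {

    // A character belongs to a token unless it is one of the separators.
    static boolean isTokenChar(char c) {
        return c != '/' && c != ':';
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A separator ends the current token; runs like "//" produce no empty tokens.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("http://example.com/somedir/some.html"));
        // prints [http, example.com, somedir, some.html]
    }
}
```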
>>
>> My main concern about this requirement is that the index should not
>> grow over time, as it always holds
>> NR OF DAYS * RECORDS PER DAY records and expires them after a given
>> time. In my opinion there must be some background thread that continually
>> throws away expired hits.
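One way to keep the corpus bounded, in the spirit of spreading the data over multiple indices, is to partition by day and let a background task drop whole partitions once they leave the retention window. This is a swapped-in technique, not something the thread prescribes. The sketch below models each partition as an in-memory list keyed by a day number (a hypothetical stand-in for one Lucene index per day); in a real setup, dropping a day would mean closing and deleting that day's index and searching across the remaining ones. Dropping whole partitions avoids deleting documents one by one; in Lucene, deleted documents only free their space when segments are merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Sketch of a bounded corpus: one partition per day (hypothetical stand-in for
// one index per day), with whole partitions dropped once they are too old.
public class RollingRetentionSketch {

    private final int retentionDays;
    // day number (e.g. LocalDate.toEpochDay()) -> records indexed on that day
    private final ConcurrentSkipListMap<Long, List<String>> partitions = new ConcurrentSkipListMap<>();

    RollingRetentionSketch(int retentionDays) {
        this.retentionDays = retentionDays;
    }

    void add(long day, String record) {
        partitions.computeIfAbsent(day, d -> Collections.synchronizedList(new ArrayList<>())).add(record);
    }

    int partitionCount() {
        return partitions.size();
    }

    // Keeps days (today - retentionDays + 1) .. today and drops everything older.
    // Returns the number of partitions dropped (approximate under concurrent adds).
    int purge(long today) {
        // headMap is a live view of keys strictly below the bound; clearing it removes them.
        ConcurrentNavigableMap<Long, List<String>> expired = partitions.headMap(today - retentionDays + 1);
        int removed = expired.size();
        expired.clear();
        return removed;
    }

    // Runs purge once a day on a background thread, as the original post suggests.
    ScheduledExecutorService startDailyPurge(LongSupplier today) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> purge(today.getAsLong()), 0, 1, TimeUnit.DAYS);
        return exec;
    }

    public static void main(String[] args) {
        RollingRetentionSketch store = new RollingRetentionSketch(90);
        for (long day = 1; day <= 100; day++) {
            store.add(day, "record-" + day);
        }
        int removed = store.purge(100);
        System.out.println(removed + " partitions dropped, " + store.partitionCount() + " kept");
        // prints "10 partitions dropped, 90 kept"
    }
}
```

In use, startDailyPurge could be given () -> java.time.LocalDate.now().toEpochDay() as the day source, provided records are added under the same day numbering; the returned executor should be shut down when the application stops.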
>>
>> Is this easily possible with Lucene?
>>
>> Regards, Chris
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 
---------------
Christian Brennsteiner
Linzergasse 21 / 14
5020 Salzburg
Austria / Europe


