lucene-dev mailing list archives

From Konrad Scherer <bcdh...@uottawa.ca>
Subject Index Optimization space requirements
Date Wed, 30 Oct 2002 21:57:03 GMT
Hello all,

I am using Lucene 1.2 (Java 1.4 on Solaris 7) and the XML indexer to index 
~24000 small XML documents. The finished and optimized index uses around 
340 MB disk space. The documents are reindexed once a week and this has 
worked without any trouble for months. Recently the free space on the hard 
drive was down to 1.36 GB and the optimization crashed due to "no space 
left on device". Deleting the index directory freed up 1.36 GB.
Questions:
1) Is it normal for the optimization process to require this much 
extra space?
2) Did I miss an option somewhere to limit the space usage of the 
optimization process?
3) More philosophically, do I really need the optimization?
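
Whatever the answers, a pre-flight check along the following lines would at least turn the crash into a clean refusal to optimize. This is only a sketch under stated assumptions: the class name `OptimizeSpaceCheck`, its helpers, the default directory name `index` and the headroom factor are all hypothetical, and the factor is a guess the caller supplies, not a figure documented by Lucene. It also relies on `File.getUsableSpace()`, which needs Java 6 or later and so would not run on the Java 1.4 setup described above.

```java
import java.io.File;

public class OptimizeSpaceCheck {

    /** Sums the sizes of the regular files directly inside dir (0 if it cannot be listed). */
    static long directorySize(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or not readable
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** True if freeBytes is at least factor times indexBytes. */
    static boolean hasHeadroom(long indexBytes, long freeBytes, double factor) {
        return freeBytes >= (long) Math.ceil(indexBytes * factor);
    }

    public static void main(String[] args) {
        // Hypothetical defaults: directory "index", factor 2.0.
        // The factor is a placeholder, not a measured or documented requirement.
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        double factor = args.length > 1 ? Double.parseDouble(args[1]) : 2.0;

        long indexBytes = directorySize(indexDir);
        long freeBytes = indexDir.getUsableSpace(); // Java 6+

        if (hasHeadroom(indexBytes, freeBytes, factor)) {
            System.out.println("Enough free space to attempt the optimization.");
        } else {
            System.out.println("Skipping optimization: index is " + indexBytes
                    + " bytes, only " + freeBytes + " bytes free.");
        }
    }
}
```

It would be run as `java OptimizeSpaceCheck <index-dir> <factor>` before the weekly reindex calls the optimization; a suitable factor would have to come from observation, since in my case 1.36 GB of free space was not enough for an index that ends up at around 340 MB.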

Also, in the archives I came across a message talking about an Ispell-based 
stemmer, to which Doug Cutting replied:
>Дмитрий Овсянко wrote:
> > http://www.halyava.ru/do/org.apache.lucene.analysis.zip
>
>This looks great!  If I understand correctly, it can be used to quickly
>build stemmers for lots of languages.  For example, the following page
>lists the location of ispell dictionaries for over 30 languages!
>
>    http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html
>
>This page should probably be referenced from the documentation.

I have not found the code anywhere on the Lucene site, and the link to the 
code above no longer works. Does someone have this code, or could the 
original author please repost it? I am using the French stemmer from 
Snowball and it does some strange things, like stemming "paris" to "par" and 
not stemming many verbs properly. I would like to try a different stemmer 
to see whether it is more usable.

I would also like to take this opportunity to thank the Lucene developers 
for their work.

Konrad Scherer

