lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Murzaku <murz...@yahoo.com>
Subject Re: Abusing lucene and way too many files open
Date Sun, 15 Sep 2002 13:18:56 GMT
> So after roughly crunching some numbers, with a mergeFactor set to
> the default of 10:
> 
> (7 files per segment + 12 for fields) * (up to 10 segments) * 100
> accounts * 10 langauges
> 
> = 200,000 open files at once. Ouch.

I have read accounts of people in this list dealing with data sets
larger than what you describe. There is one thing that you said you
were considering: why not include both account and language in the
record. Unless planning to distribute them over several machines, I
wouldn't create artificially 100*10 indices. In you case, when
indexing, I would have instead 14 fields * 7 files * 10 segments open
files... I guess you already have resolved the problem of identifying
the language and you have built the corresponding analyzers which you
call when indexing or querying in a given language. Just a thought.


=====
__________________________________
alex@lissus.com -- http://www.lissus.com

__________________________________________________
Do you Yahoo!?
Yahoo! News - Today's headlines
http://news.yahoo.com

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message