lucene-java-user mailing list archives

From Ype Kingma <ykin...@xs4all.nl>
Subject Re: Indexing and Duplication
Date Wed, 20 Mar 2002 20:20:15 GMT
Kelvin,
<snip>

>
>This seems like a silly question, but will keeping hold of Document objects
>cause me to run into "Too many files open" problems? If each document object

No, unless you fail to close whatever files you read the document fields from.
It depends on how you obtain your document fields.

>has a Field.Text which contains a Reader, and the Reader isn't closed till
>the document is indexed, would this be an issue? Is the memory consumed by

I have not used Readers yet, so I don't know.

>Document objects directly proportional to the size of the object the Reader
>reads?

I think/hope the point of using a Reader is to avoid reading the whole document
into some buffer, so the addDocument() method of the index writer only needs to
tokenize the stream from the Reader.
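
To illustrate why consuming a Reader need not cost memory proportional to the
document size, here is a minimal sketch in plain Java. It is not Lucene's
tokenizer; the class name, the method name, the 4096-char buffer and the
whitespace-only token rule are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration, not part of Lucene.
class StreamingTokenCount {

    /**
     * Counts whitespace-separated tokens from the Reader.
     * Only one fixed-size buffer is held, however long the input is.
     */
    static int countTokens(Reader reader) {
        char[] buf = new char[4096];   // assumed buffer size
        int count = 0;
        boolean inToken = false;       // carries token state across buffer refills
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    boolean ws = Character.isWhitespace(buf[i]);
                    if (!ws && !inToken) {
                        count++;       // first character of a new token
                    }
                    inToken = !ws;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new StringReader("to be or not to be")));
    }
}
```

Whether a real analyzer behaves this way depends on its implementation, but
the Reader interface itself does not force the whole text into memory.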

As for memory usage during indexing:
I have indexed documents with around 100,000 terms in a single String
passed to Field(), with the maximum number of terms per field set to ten
million. The JVM occasionally takes more memory, but I have not seen it
use more than 17 MB yet (observed with the -verbose option to java).

I'd suggest reconsidering the use of a Hashtable to communicate
between threads. I know a Hashtable is thread-safe, but some form of queue
is what one would expect there. Also, with a bounded queue
a limit on memory usage is easily enforced, because the feeding thread
will wait as long as needed. For more about queues:
http://g.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html
The FAQ entry there about producer and consumer threads convinced me
to use bounded queues after I got some out-of-memory crashes...
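
A minimal sketch of the bounded-queue idea, using java.util.concurrent from
current JDKs rather than the util.concurrent package linked above (an
assumption on my part; the linked library's class names differ). The class
name, the poison-pill marker and the String stand-ins for documents are
hypothetical, and the counter stands in for the actual indexing call.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of a producer feeding an indexing thread.
class BoundedQueueSketch {

    // Unique marker object telling the consumer that no more work will arrive.
    private static final String POISON = new String("<end>");

    /**
     * Pushes docCount dummy documents through a queue holding at most
     * `capacity` items and returns how many the consumer processed.
     */
    static int run(int docCount, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int[] consumed = {0};

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();   // waits while the queue is empty
                    if (doc == POISON) {
                        break;
                    }
                    consumed[0]++;               // stand-in for indexing the document
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < docCount; i++) {
                queue.put("doc-" + i);           // waits while the queue is full
            }
            queue.put(POISON);
            consumer.join();                     // makes consumed[0] safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the queue", e);
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 8));
    }
}
```

Because put() blocks once `capacity` documents are waiting, the feeding thread
can never get more than that many documents ahead of the indexing thread,
which is what bounds the memory held in the queue.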

Have fun,
Ype
