lucene-dev mailing list archives

From: Eric Pugh
Subject: SolrPerformanceFactors wiki page says contradictory things...
Date: Thu, 26 Aug 2010 16:06:18 GMT

Under "Factors affecting memory usage" there is this text:

When processing an "add" command for a document, the standard XML update handler has two limitations:

	• All of the document's fields must simultaneously fit into memory. (Technically, it's actually the sum of min(<the actual field value's length>, maxFieldLength). As such, adjusting maxFieldLength may be of some help.)
		• (I'm assuming that fields are truncated to maxFieldLength before being added to the relevant document object. If that's not true, then maxFieldLength won't help here. --ChrisHarris)
	• Each individual <field>...</field> tag in the input XML must fit into memory, regardless of maxFieldLength.

Bullet 1 contradicts bullet 2, at least the way I read it.

Looking at the tokenizer that applies the maxFieldLength cutoff, it is working with a stream.
That implies that the first bullet is correct, and that the entire XML document doesn't need
to fit into memory. Unless what we are trying to say is that, in order to parse the incoming
XML document, the entire document must fit into memory? And that only after parsing does the
tokenizer kick in, so that only min(<the actual field value's length>, maxFieldLength)
applies to each field?
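To make the two readings concrete, here is a minimal Python sketch. It is not Solr or Lucene code: the function names, the field lengths, and the idea of measuring everything in abstract "length units" are assumptions for illustration only. The first function shows how a cutoff applied to a stream never needs the whole value in memory; the second is just the sum-of-min arithmetic from bullet 1.

```python
from itertools import islice


def capped_tokens(token_stream, max_field_length):
    """Hypothetical helper: yield at most max_field_length tokens from a
    lazy stream, without ever materializing the remainder of the stream."""
    return islice(token_stream, max_field_length)


def estimated_footprint(field_lengths, max_field_length):
    """Hypothetical helper: sum of min(actual length, max_field_length)
    over all fields, as described in bullet 1 of the wiki text."""
    return sum(min(length, max_field_length) for length in field_lengths)


# Assumed example: three fields with made-up lengths and a made-up cutoff.
lengths = [50, 20_000, 3_000]
print(estimated_footprint(lengths, 10_000))  # 50 + 10000 + 3000

# A very long lazy stream is cut off after 3 items; the rest is never read.
print(list(capped_tokens(iter(range(10**9)), 3)))
```

If the XML parser hands each whole <field> value over as a single string before any tokenizer sees it, then bullet 2 would still hold regardless of what the sketch above suggests, which is exactly the part I am unsure about.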


Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
Co-Author: Solr 1.4 Enterprise Search Server available from
