lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: SolrPerformanceFactors wiki page says contradictory things...
Date Thu, 26 Aug 2010 16:47:56 GMT
On Thu, Aug 26, 2010 at 12:06 PM, Eric Pugh
<epugh@opensourceconnections.com> wrote:
> Under "Factors affecting memory usage" there is this text:
>
> When processing an "add" command for a document, the standard XML update
> handler has two limitations:
>
>   • All of the document's fields must simultaneously fit into memory.
>     (Technically, it's actually the sum of min(<the actual field value's
>     length>, maxFieldLength). As such, adjusting maxFieldLength may be of
>     some help.)
>       • (I'm assuming that fields are truncated to maxFieldLength before
>         being added to the relevant document object. If that's not true,
>         then maxFieldLength won't help here. --ChrisHarris)
>   • Each individual <field>...</field> tag in the input XML must fit into
>     memory, regardless of maxFieldLength.
>
> Bullet 1 contradicts bullet 2, at least, the way I read it.
>
> Looking at the tokenizer that applies the maxFieldLength cutoff, it is
> working with a stream...  That implies that the first bullet is correct,
> and that the entire XML document doesn't need to fit into memory.  Unless
> what we are trying to say is that to parse the incoming XML document, the
> entire document must fit into memory?  After that, the tokenizer kicks in
> and only the min(<the actual field value's length>, maxFieldLength)
> applies to each field...?


I think your understanding is correct: maxFieldLength has little to do
with memory use per se - it's the max number of tokens indexed for any
given field in a document.  Of course, cutting down maxFieldLength
will also cut down on what Lucene internally stores before flushing a
segment... but I imagine that's going to be irrelevant to 99.9% of
our users.
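
To illustrate the distinction, here is a rough sketch in Python. It is not
Lucene's or Solr's actual code: tokenize() and index_field() are hypothetical
names, and whitespace tokenization is an assumption made for brevity. The cap
limits how many tokens are taken from the stream, while the full field value
is still held in memory by the caller.

```python
import re
from itertools import islice


def tokenize(text):
    """Lazily yield whitespace-separated tokens (hypothetical tokenizer)."""
    for match in re.finditer(r"\S+", text):
        yield match.group()


def index_field(text, max_field_length):
    """Keep at most max_field_length tokens from the token stream.

    The cap applies to the number of tokens indexed, not to the size of
    `text`, which the caller has already loaded in full.
    """
    return list(islice(tokenize(text), max_field_length))


field_value = "a b c d e f"          # the whole value is in memory here
indexed = index_field(field_value, 4)  # only the first 4 tokens are kept
```

Lowering the cap shrinks `indexed`, but it does nothing about the size of
`field_value` itself.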

Maybe this whole thing should be cut down to "All of the document's
fields must currently simultaneously fit into memory.", if it's even
worth mentioning at all.  Can you clean this up, Eric?

-Yonik
http://lucenerevolution.org   Lucene/Solr Conference, Boston Oct 7-8

