lucene-java-user mailing list archives

From Igor Shalyminov <>
Subject Re: Sending a document to IndexWriter field by field
Date Thu, 20 Feb 2014 21:02:40 GMT
Mike, thank you!

So eventually this amount of data must stay entirely in RAM (as postings) before flushing
to disk?
Is there any way to hack around it? :)

The documents themselves (the ones I will deliver to the user) are of a regular size, but the
features that I generate grow combinatorially in size and, in a sense, blow the index up.
I definitely want to think about breaking them into pieces. Thank you for the advice!

Best Regards,
Igor Shalyminov

21.02.2014, 00:50, "Michael McCandless" <>:
> Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
> fields one at a time.
> You can also pass a Reader to a Field.
> That said, there will still be massive RAM required by IW to hold the
> inverted postings for that one document, likely much more RAM than the
> original document's String contents.
> And, such huge documents are rarely useful in practice.  E.g., how
> will you "deliver" that hit to the end user at search time?  Will
> scores actually make sense for such enormous documents?  It's better
> to break them up into more manageable sizes.
> Mike McCandless
> On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
> <> wrote:
>>  Hello!
>>  I've run into a problem indexing huge documents. The indexing itself goes all right,
>>  but when the document processing becomes concurrent, OutOfMemoryErrors start appearing
>>  (even with a heap of about 32GB).
>>  The issue, as I see it, is that I have to create a Document instance to send it
>>  to IndexWriter, and a Document is just a collection of all the fields, all in RAM.
>>  With my huge fields, it would be much better to be able to send the
>>  document fields for writing one by one, keeping no more than a single field in RAM.
>>  Is it possible in the latest Lucene?
>>  --
>>  Best Regards,
>>  Igor Shalyminov
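
As a rough illustration of the "break them up into more manageable sizes" advice in this thread, here is a minimal sketch that splits one logical document's generated features into fixed-size pieces. Everything in it is hypothetical and not from the thread: the FeatureChunker class, the Chunk type, the parentId convention, and the chunk size. It deliberately makes no Lucene calls; it only shows the partitioning step that would come before indexing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits one logical document's features into smaller pieces. */
final class FeatureChunker {

    /** One piece of a large document; parentId ties the pieces back together. */
    static final class Chunk {
        final String parentId;       // ID of the original (regular-size) document
        final int ordinal;           // position of this piece within the parent
        final List<String> features; // the features assigned to this piece

        Chunk(String parentId, int ordinal, List<String> features) {
            this.parentId = parentId;
            this.ordinal = ordinal;
            this.features = features;
        }
    }

    /** Partitions features into consecutive chunks of at most maxPerChunk items. */
    static List<Chunk> split(String parentId, List<String> features, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (int start = 0; start < features.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, features.size());
            // Copy the sublist so each chunk is independent of the source list.
            List<String> piece = new ArrayList<>(features.subList(start, end));
            chunks.add(new Chunk(parentId, chunks.size(), piece));
        }
        return chunks;
    }
}
```

Under these assumptions, each Chunk would be indexed as its own small document, with parentId kept alongside it so that a hit on any piece can be mapped back to the regular-size document that is actually delivered to the user. Whether that fits depends on how the combinatorial features are queried, which the thread does not specify.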
