lucene-java-user mailing list archives

From Igor Shalyminov <ishalymi...@yandex-team.ru>
Subject Re: Sending a document to IndexWriter field by field
Date Thu, 20 Feb 2014 21:02:40 GMT
Mike, thank you!

So ultimately all of this data has to stay in RAM (as postings) until it is flushed
to disk?
Is there any way to work around that? :)

The documents themselves (which I will deliver to the user) are of a regular size, but the features
that I generate grow combinatorially in size and, in a sense, blow the index up.
I definitely want to think about breaking them into pieces, thank you for the advice!
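
For reference, the "breaking them into pieces" idea could be sketched roughly as below. This is only an illustration under assumptions: `ChunkedReader` and `maxChars` are hypothetical names, not part of Lucene, and the split is by raw character count, which a real setup would probably replace with something boundary-aware (sentences, sections, feature groups). Each piece would then presumably be indexed as its own smaller document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper (not a Lucene class): lazily splits a character
 * stream into pieces of at most maxChars characters each.
 */
public class ChunkedReader implements Iterator<String> {
    private final Reader source;
    private final char[] buffer;
    private String next; // look-ahead piece; null once the stream is exhausted

    public ChunkedReader(Reader source, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        this.source = source;
        this.buffer = new char[maxChars];
        this.next = readChunk();
    }

    /** Reads up to buffer.length characters; returns null at end of stream. */
    private String readChunk() {
        try {
            int filled = 0;
            while (filled < buffer.length) {
                int n = source.read(buffer, filled, buffer.length - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            return filled == 0 ? null : new String(buffer, 0, filled);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        next = readChunk(); // at most two pieces are held in memory at once
        return current;
    }

    public static void main(String[] args) {
        // Assumption: in practice the Reader would wrap a large file or
        // a feature generator rather than a short in-memory string.
        Iterator<String> pieces = new ChunkedReader(new StringReader("abcdefghij"), 4);
        while (pieces.hasNext()) {
            System.out.println(pieces.next()); // prints abcd, efgh, ij on separate lines
        }
    }
}
```

The point of the look-ahead design is that the full text never has to exist as one String. Note that a fixed character count can cut a token (or a surrogate pair) in half, so this is a starting point rather than a finished splitter.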
 

--
Best Regards,
Igor Shalyminov


21.02.2014, 00:50, "Michael McCandless" <lucene@mikemccandless.com>:
> Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
> fields one at a time.
>
> You can also pass a Reader to a Field.
>
> That said, there will still be massive RAM required by IW to hold the
> inverted postings for that one document, likely much more RAM than the
> original document's String contents.
>
> And, such huge documents are rarely useful in practice.  E.g., how
> will you "deliver" that hit to the end user at search time?  Will
> scores actually make sense for such enormous documents?  It's better
> to break them up into more manageable sizes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
> <ishalyminov@yandex-team.ru> wrote:
>
>>  Hello!
>>
>>  I've run into a problem indexing huge documents. The indexing itself goes all right,
>>  but when the document processing becomes concurrent, OutOfMemoryErrors start appearing (even with
>>  a heap of about 32GB).
>>  The issue, as I see it, is that I have to create a Document instance to send
>>  to IndexWriter, and a Document is just a collection of all the fields, all held in RAM.
>>  With my huge fields, it would be much better to be able to send
>>  document fields for writing one by one, keeping no more than a single field in RAM.
>>  Is it possible in the latest Lucene?
>>
>>  --
>>  Best Regards,
>>  Igor Shalyminov
>>
>>  ---------------------------------------------------------------------
>>  To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>  For additional commands, e-mail: java-user-help@lucene.apache.org
