lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Large fields storage
Date Thu, 04 Dec 2014 14:23:37 GMT
On 12/1/2014 3:10 PM, Avishai Ish-Shalom wrote:
> I have very large documents (as big as 1GB) which I'm indexing and planning
> to store in Solr in order to use highlighting snippets. I am concerned
> about possible performance issues with such large fields - does storing the
> fields require additional RAM over what is required to index/fetch/search?
> I'm assuming Solr reads only the required data by offset from the storage
> and not the entire field. Am I correct in this assumption?
>
> Does anyone on this list have experience to share with such large documents?

You've gotten some excellent replies already; I just wanted to mention
compression.

Short answer to the question about RAM: you might need a fair amount of
extra memory for the Java heap.  Because the index itself can get very
large, you'll also want a lot of memory beyond the heap for the OS disk
cache.

More detailed info:

If the fl parameter includes the field holding that large data, the
response Solr builds for the user will need enough memory to hold that
data for up to "rows" documents.  On a distributed index, some of that
data may also cross the network twice -- once from the shard that stores
it to the node aggregating the response, and again from that node to
the client.
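
As an illustration (the core and field names here are made up), a
request along these lines keeps the big stored field out of the
response while still asking for highlighting snippets:

  http://localhost:8983/solr/mycore/select
      ?q=some+query
      &fl=id,title,score
      &rows=10
      &hl=true&hl.fl=content&hl.snippets=3

Highlighting reads the stored "content" field on the server side to
build the snippets, so leaving it out of fl just avoids shipping the
full 1GB of text back with every hit.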

In Solr 4.1 and later, stored fields are compressed, with no way to turn
compression off.  With very large stored fields, there may be
performance and memory implications for both indexing (compression) and
queries (decompression). Termvectors (which Michael Sokolov mentioned in
his reply) have been compressed since version 4.2.
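
If you go the term vector route Michael suggested, the schema.xml
attributes involved look roughly like this (the field name and type are
placeholders, not a recommendation):

  <field name="content" type="text_general"
         indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

With all three termVector* attributes enabled, the FastVectorHighlighter
can build snippets without re-analyzing the whole stored text at query
time, at the cost of a noticeably larger index.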

More memory will probably be required for "ramBufferSizeMB" -- a
temporary storage area in RAM used during indexing.  It defaults to
100MB in recent Solr versions, which is normally enough for several
hundred or several thousand typical documents, but just one of your
documents may not fit, so you will likely need to raise it, which in
turn increases your heap requirements.
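
As a sketch of where that setting lives in solrconfig.xml (the number
below is just a guess -- size it to your documents and your heap):

  <indexConfig>
    <!-- default is 100; a buffer that small can't hold a 1GB document -->
    <ramBufferSizeMB>2048</ramBufferSizeMB>
  </indexConfig>

Whatever value you pick has to fit inside the Java heap along with
everything else Solr is doing.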

As for whether there is a way to retrieve specific fields from the
compressed data without decompressing all of it, I don't know.  The
compression is handled at the Lucene layer, not by Solr itself.

https://issues.apache.org/jira/browse/LUCENE-4226

Thanks,
Shawn

