lucene-solr-user mailing list archives

From Joel Bernstein
Subject Re: Sorting problem in Solr due to Lucene Field Cache
Date Fri, 16 May 2014 21:49:51 GMT
Take a look at Solr's use of DocValues:

There are docValues options that use less memory than the FieldCache.
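
As a rough sketch of what enabling docValues on a sort field could look like in schema.xml: the field name "score_num" and the type "tlong" are placeholders for illustration, not taken from the original poster's schema, and the type is assumed to be a Trie-based long defined elsewhere in the same schema. Changing this attribute requires a full reindex before it takes effect.

```xml
<!-- Hypothetical numeric sort field; the name and type are assumptions. -->
<!-- docValues="true" writes a column-oriented structure at index time,
     so sorting can read it instead of un-inverting the field into the
     FieldCache on the Java heap. -->
<field name="score_num" type="tlong" indexed="true" stored="true"
       docValues="true"/>
```

A query such as q=date:YYYY-MM-DD+text:KEYWORD&sort=score_num desc should then sort from the docValues data for that field.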

Joel Bernstein
Search Engineer at Heliosearch

On Thu, May 15, 2014 at 6:39 AM, Jeongseok Son wrote:

> Hello, I'm struggling with a large data set indexed and searched by Solr.
> The document schema consists of a date (YYYY-MM-DD), text (tokenized and
> indexed with Natural Language Toolkit), and several numerical fields.
> Each document is small, but the number of documents is very large,
> around 10 million per date. The server has 32GB of memory and
> I allocated around 30GB to the Solr JVM.
> My Solr server has to return documents sorted by one of the numerical
> fields when it is queried with a specific date and text (e.g.
> q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in Lucene
> requires a lot of Field Cache and Solr can't handle the Field Cache well.
> The Field Cache grows as more queries are executed and is never
> evicted. When memory fills up with Field Cache, the Solr server
> stops or throws an Out of Memory exception.
> Solr cannot control the Lucene Field Cache at all, so I'm having a
> difficult time solving this problem. I'm considering these three options.
> 1) Add more memory.
> This can relieve the problem, but I don't think it can completely solve it.
> The memory would still fill up with Field Cache as the server handles
> search requests.
> 2) Separate numerical data from text data.
> I find Solr/Lucene isn't well suited to sorting large numerical data.
> Therefore I'm thinking of storing the numerical data in another DB (HBase,
> MongoDB ...), so that the Solr server only does text search.
> 3) Switching to Elasticsearch
> According to this page(
> )
> Elasticsearch can control field cache. I think ES could solve my
> problem.
> I'm likely to try the 2nd or 3rd option. Are these appropriate solutions?
> If you have any better ideas, please let me know. I've gone through too many
> troubles, so it's time to make a decision. I'd like my choices reviewed by
> the many excellent Solr users and developers, and I'd also like to find
> better solutions.
> I really appreciate any help you can provide.
