lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Sorting problem in Solr due to Lucene Field Cache
Date Fri, 16 May 2014 21:49:51 GMT
Take a look at Solr's use of DocValues:
https://cwiki.apache.org/confluence/display/solr/DocValues.

There are docValues options that use less memory than the FieldCache.
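
As a rough illustration of what enabling docValues on a sort field could look like in schema.xml (a sketch only: the field name "popularity" is hypothetical, and the "tlong" type is assumed to be defined elsewhere in the schema as a solr.TrieLongField):

```xml
<!-- Hypothetical numeric sort field; the name "popularity" is an assumption. -->
<!-- docValues="true" stores the values in a column-oriented structure at index time, -->
<!-- so sorting does not need to un-invert the field into the FieldCache. -->
<field name="popularity" type="tlong" indexed="true" stored="true" docValues="true"/>
```

Existing documents have to be reindexed after the change, since docValues are written at index time.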

Joel Bernstein
Search Engineer at Heliosearch


On Thu, May 15, 2014 at 6:39 AM, Jeongseok Son <invictusjs@gmail.com> wrote:

> Hello, I'm struggling with large data indexed and searched by Solr.
>
> The schema of the documents consists of date (YYYY-MM-DD), text (tokenized and
> indexed with Natural Language Toolkit), and several numerical fields.
>
> Each document is small, but the number of docs is very large,
> around 10 million per date. The server has 32GB of memory and
> I allocated around 30GB for the Solr JVM.
>
> My Solr server has to return documents sorted by one of the numerical
> fields when it is queried with a specific date and text (e.g.
> q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in Lucene
> requires lots of Field Cache and Solr can't handle Field Cache well. The
> Field Cache gets larger as more queries are executed and is not
> evicted. When the whole memory is filled with Field Cache, the Solr server
> stops or throws an Out of Memory exception.
>
> Solr cannot control the Lucene field cache at all, so I am having a difficult
> time solving this problem. I'm considering these three ways to solve it.
>
> 1) Add more memory.
> This can relieve the problem but I don't think it can completely solve it.
> Anyway the memory would fill up with field cache as the server handles
> search requests.
> 2) Separate numerical data from text data
> I find Solr/Lucene isn't suitable for sorting large numerical data.
> Therefore I'm thinking of storing the numerical data in another DB (HBase,
> MongoDB ...), so that the Solr server only does text search.
> 3) Switching to Elasticsearch
> According to this page
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html),
> Elasticsearch can control field cache. I think ES could solve my
> problem.
>
> I'm likely to try the 2nd or 3rd way. Are these appropriate solutions? If you
> have any better ideas please let me know. I've gone through too many
> troubles so it's time to make a decision. I want my choices reviewed by
> many other excellent Solr users and developers and also want to find better
> solutions.
> I really appreciate any help you can provide.
>
