lucene-solr-user mailing list archives

From Bill Bell <billnb...@gmail.com>
Subject Re: solr multicore vs sharding vs 1 big collection
Date Mon, 03 Aug 2015 20:53:34 GMT
Yeah, a separate collection by month or year is good and can really help in this case.
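
If you are on SolrCloud, collection aliases can make a per-month split
transparent to queries. A rough sketch with the Collections API
(collection and config names here are placeholders, not from this thread):

# One collection per month:
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_2015_07&numShards=2&replicationFactor=2&collection.configName=docs_conf'
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_2015_08&numShards=2&replicationFactor=2&collection.configName=docs_conf'
# One alias spanning them, so the app keeps querying "docs" unchanged:
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=docs&collections=docs_2015_07,docs_2015_08'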

Bill Bell
Sent from mobile


> On Aug 2, 2015, at 5:29 PM, Jay Potharaju <jspotharaju@gmail.com> wrote:
> 
> Shawn,
> Thanks for the feedback. I agree that increasing the timeout might
> alleviate the timeout issue. The main problem with increasing the timeout
> is the detrimental effect it will have on the user experience, so I can't
> increase it.
> I have looked at the queries that threw errors; when I retry them,
> everything seems to work fine. Not sure how to reproduce the error.
> My concern with increasing the memory to 32GB is what happens when the
> index size grows over the next few months.
> One of the other solutions I have been thinking about is to rebuild the
> index weekly, create a new collection, and switch over to it. Are there
> any good references for doing that?
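> (Assuming SolrCloud, the rebuild-and-flip sequence I am picturing, with
> placeholder names, is something like this:)
> 
> # Create next week's collection and reindex into it offline:
> curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=products_wk33&numShards=1&replicationFactor=2&collection.configName=products_conf'
> # ...run the full reindex against products_wk33...
> # Flip the alias the application queries, then drop the old collection:
> curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_wk33'
> curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=products_wk32'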
> Thanks
> Jay
> 
>> On Sun, Aug 2, 2015 at 10:19 AM, Shawn Heisey <apache@elyograg.org> wrote:
>> 
>>> On 8/2/2015 8:29 AM, Jay Potharaju wrote:
>>> The document contains around 30 fields, and about 15 of them have
>>> stored set to true. These stored fields are queried and updated all the
>>> time. You will notice that deleted documents make up almost 30% of the
>>> docs, and that percentage has stayed there and has not come down.
>>> I did try optimize, but that was disruptive as it caused search errors.
>>> I have been playing with the merge factor to see if that helps with
>>> deleted documents or not. It is currently set to 5.
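>>> For reference, the relevant solrconfig.xml excerpt looks roughly like
>>> this (paraphrasing my config from memory):
>>> 
>>> <indexConfig>
>>>   <!-- Lower mergeFactor means more aggressive merging, and merges,
>>>        not just optimize, are what reclaim deleted documents. -->
>>>   <mergeFactor>5</mergeFactor>
>>> </indexConfig>
>>> 
>>> I have also read that an update request with expungeDeletes=true is a
>>> gentler alternative to optimize, but I have not tried it yet.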
>>> 
>>> The server has 24 GB of memory, of which consumption is normally around
>>> 23 GB, and the JVM is set to 6 GB. I have noticed that the available
>>> memory on the server drops to around 100 MB at times during the day.
>>> All the updates are run through DIH.
>> 
>> Using all available memory is completely normal operation for ANY
>> operating system.  If you hold up Windows as an example of one that
>> doesn't ... it lies to you about "available" memory.  All modern
>> operating systems will utilize memory that is not explicitly allocated
>> for the OS disk cache.
>> 
>> The disk cache will instantly give up any of the memory it is using to
>> programs that request it.  Linux doesn't try to hide the disk cache from
>> you, but older versions of Windows do.  In the newer versions of Windows
>> that have the Resource Monitor, you can go there to see the actual
>> memory usage including the cache.
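>> 
>> On Linux, "free -m" shows the split directly.  Illustrative output
>> (numbers invented to match the sizes you described):
>> 
>>              total   used   free  shared  buffers  cached
>> Mem:         24097  23995    102       0      310   17203
>> -/+ buffers/cache:   6482  17615
>> 
>> The "-/+ buffers/cache" row is what programs are actually using; the
>> rest is reclaimable disk cache, which is why 100MB "free" is normal.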
>> 
>>> Every day, at least once, I see the following error, which results in
>>> search errors on the front end of the site:
>>> 
>>> ERROR org.apache.solr.servlet.SolrDispatchFilter -
>>> null:org.eclipse.jetty.io.EofException
>>> 
>>> From what I have read, these are mainly due to timeouts. My timeout is
>>> set to 30 seconds and I can't set it to a higher number. I was thinking
>>> that high memory usage sometimes leads to bad performance/errors.
>> 
>> Although this error can be caused by timeouts, it has a specific
>> meaning.  It means that the client disconnected before Solr responded to
>> the request, so when Solr tried to respond (through jetty), it found a
>> closed TCP connection.
>> 
>> Client timeouts need to either be completely removed, or set to a value
>> much longer than any request will take.  Five minutes is a good starting
>> value.
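>> 
>> For example, with the SolrJ API of that era (5.x; class and method
>> names may differ in other versions, and the URL is a placeholder):
>> 
>> import org.apache.solr.client.solrj.impl.HttpSolrClient;
>> 
>> public class TimeoutSetup {
>>     public static void main(String[] args) {
>>         HttpSolrClient client =
>>             new HttpSolrClient("http://localhost:8983/solr/mycore");
>>         client.setConnectionTimeout(5000); // 5 seconds to connect
>>         client.setSoTimeout(300000);       // 5 minutes for reads
>>     }
>> }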
>> 
>> If your client timeouts are set to 30 seconds and you are seeing
>> EofExceptions, that means that your requests are taking longer than 30
>> seconds, and you likely have some performance issues.  It's also
>> possible that some of your client timeouts are set a lot shorter than 30
>> seconds.
>> 
>>> My objective is to stop the errors; adding more memory to the server is
>>> not a good scaling strategy. That is why I was thinking maybe there is
>>> an issue with the way things are set up that needs to be revisited.
>> 
>> You're right that adding more memory to the servers is not a good
>> scaling strategy for the general case ... but in this situation, I think
>> it might be prudent.  For your index and heap sizes, I would want the
>> company to pay for at least 32GB of RAM.
>> 
>> Having said that ... I've seen Solr installs work well with a LOT less
>> memory than the ideal.  I don't know that adding more memory is
>> necessary, unless your system (CPU, storage, and memory speeds) is
>> particularly slow.  Based on your document count and index size, your
>> documents are quite small, so I think your memory size is probably good
>> -- if the CPU, memory bus, and storage are very fast.  If one or more of
>> those subsystems aren't fast, then make up the difference with lots of
>> memory.
>> 
>> Some light reading, where you will learn why I think 32GB is an ideal
>> memory size for your system:
>> 
>> https://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> It is possible that your 6GB heap is not quite big enough for good
>> performance, or that your GC is not well-tuned.  These topics are also
>> discussed on that wiki page.  If you increase your heap size, then the
>> likelihood of needing more memory in the system becomes greater, because
>> there will be less memory available for the disk cache.
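>> 
>> If you do experiment with a larger heap, the 5.x start script takes it
>> directly, e.g.:
>> 
>> bin/solr start -m 8g
>> 
>> which sets both -Xms and -Xmx to 8GB.  Just keep an eye on how much
>> memory that leaves for the disk cache.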
>> 
>> Thanks,
>> Shawn
> 
> 
> -- 
> Thanks
> Jay Potharaju
