lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs
Date Thu, 22 Aug 2013 01:06:21 GMT
On 8/21/2013 6:23 PM, dmarini wrote:
> Shawn, Thanks for your reply. All of these suggestions look like good ideas
> and I will follow up. We are running Solr via the Jetty process on windows
> as well as all of our zookeepers on the same boxes as the clouds. The reason
> for this is that we're on EC2 servers so it gets ultra expensive to have a 6
> box setup just to have zookeepers on separate boxes from the solr instances.

You can have zookeeper on the same host as Solr; that's no problem.  You 
should drop to just three total zookeepers, one per node, and use the 
chroot method to keep the clouds separate.  You can probably run zookeeper 
with a max heap of 256MB, and it likely would never need more than 
512MB.  It doesn't use much memory at all.
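To illustrate the chroot method: each cloud points at the same three zookeepers but appends its own path to the end of the connection string. The sketch below only builds those strings. The host names and chroot paths are made up for the example, and 2181 is assumed because it is zookeeper's default client port.

```python
# Hypothetical host names and chroot paths, for illustration only.
ZK_HOSTS = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
ZK_CLIENT_PORT = 2181  # zookeeper's default client port


def zk_host_string(hosts, chroot, port=ZK_CLIENT_PORT):
    """Build a zkHost value: comma-separated host:port pairs, then ONE chroot.

    The chroot is appended once, after the last host, not after each host.
    """
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    servers = ",".join(f"{host}:{port}" for host in hosts)
    return servers + chroot


# One shared ensemble, one chroot per cloud (names are assumptions).
for chroot in ("/cloud-a", "/cloud-b", "/cloud-c"):
    print(zk_host_string(ZK_HOSTS, chroot))
```

Each resulting string would be what that cloud's Solr instances receive as their zkHost setting (for example via -DzkHost=... on the Jetty command line), so every cloud keeps its data under its own path in the shared ensemble.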

> Each of our Windows boxes has 8GB of RAM, with roughly 35 - 40% of it still
> seemingly free. Is there a tool or some way we can identify for certain if
> we're running into memory issues? I like your zookeeper idea and I didn't
> know that this was feasible. I will get a test bed set up that way soon. As
> for indexes, each cloud has multiple collections but we're looking at the
> largest entire cloud (multiple indexes) being about 200MB, each collection
> is between 50 and 100MB and I don't see them getting much bigger than that
> per index (but I do see more indexes being added to the clouds).

With indexes that small, I would run each Jetty/Solr with a max heap of 
1GB.  With three of them per server, that will mean that Solr is using 
3GB of RAM, leaving 5GB for the OS disk cache.  You could probably bump 
that to 1.5 or 2GB and still be OK.
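The arithmetic behind those numbers, as a rough sketch. It assumes the 8GB boxes and three Jetty/Solr instances per server described in this thread, and it deliberately ignores the zookeeper heap and any JVM overhead beyond the max heap, so the real figure left for the OS will be somewhat lower.

```python
TOTAL_RAM_GB = 8.0   # RAM per Windows box, from this thread
SOLR_INSTANCES = 3   # Jetty/Solr instances per server, from this thread


def ram_left_for_os(heap_gb, instances=SOLR_INSTANCES, total_gb=TOTAL_RAM_GB):
    """RAM remaining for the OS disk cache after all Solr max heaps.

    Simplification: counts only the Solr max heaps, nothing else.
    """
    return total_gb - heap_gb * instances


for heap in (1.0, 1.5, 2.0):
    left = ram_left_for_os(heap)
    print(f"{heap}GB heap x {SOLR_INSTANCES} instances leaves {left}GB")
```

The max heap itself is set with the standard JVM -Xmx option when starting each Jetty, for example -Xmx1g for a 1GB heap.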

> Is there a definitive advantage to running Solr on a linux box
> over windows? I need to be able to justify the time and effort it will take
> to get up to speed on a non-familiar OS if we're going to go that route but
> if there's a good enough reason I don't see why not.

Linux manages memory better than Windows, and ext4 is a much better 
filesystem than NTFS.  If you are familiar with Windows, there's nothing 
wrong with continuing to use it, except for the fact that you have to 
give Microsoft a few hundred bucks per machine for a server OS when you 
take it into production.  You can run Linux for free.

> --Would it be helpful to
> have the zookeeper ensemble on a different disk drive than the clouds? --Can
> the chattiness of all of the replication and zookeeper communication for
> multiple clouds/collections cause any of these issues (We do have some
> collections that are in constant flux with 1 - 5 requests each second, which
> we gather up and send to solr in batches of 250 documents or a 10 second
> flush)?

It never hurts to have things separated so they are on different disks, 
but SolrCloud will put hardly any load on zookeeper, so I don't think it 
matters much.  It is Solr itself that will take the indexing and 
replication load.

Thanks,
Shawn

