From: "Patrick Hunt (JIRA)"
To: solr-dev@lucene.apache.org
Date: Wed, 16 Dec 2009 18:00:19 +0000 (UTC)
Subject: [jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

    [ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791487#action_12791487 ]

Patrick Hunt commented on SOLR-1277:
------------------------------------

You guys are asking the right questions. In particular, "how expensive is it to lose a Solr node?" is a good one to think about. Unfortunately I don't know enough about Solr to advise you, but if losing and regaining a node is not very expensive, just let the session time out. The rest of the system will see the loss quickly (via the ephemeral node and its watches), and when the Solr node is active again (i.e. comes out of the GC pause) it will talk to the ZK server, see that its session has expired, and re-bootstrap into the Solr "cloud".

Another thing to ask yourself: if a Solr node pauses for 4 minutes due to GC, how different is that from a network partition or a crash/reboot of that node? What I'm saying is, the node is _gone_ for 4 minutes -- what effect does that have on the rest of your system? If you're expecting a tight SLA from that node, upping the timeout doesn't help: losing the node to a GC pause is no different from a network partition or a crash/reboot of the host.
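To make that re-bootstrap concrete, here's a rough sketch against the stock ZooKeeper Java client. This is not from the attached patch -- the connect string, session timeout, class name, and the /solr_cloud/nodes layout are all invented for illustration:

{code}
// Sketch only: names below are hypothetical, not from SOLR-1277.patch.
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SolrNodeRegistration implements Watcher {
  private static final String ZK_HOSTS = "zk1:2181,zk2:2181,zk3:2181"; // assumed
  private static final int SESSION_TIMEOUT_MS = 30000;                 // assumed

  private final String nodePath; // e.g. "/solr_cloud/nodes/host1:8983" (hypothetical)
  private volatile ZooKeeper zk;
  private volatile CountDownLatch connected;

  public SolrNodeRegistration(String nodePath) throws Exception {
    this.nodePath = nodePath;
    connectAndRegister();
  }

  private void connectAndRegister() throws Exception {
    connected = new CountDownLatch(1);
    zk = new ZooKeeper(ZK_HOSTS, SESSION_TIMEOUT_MS, this);
    connected.await(); // wait for SyncConnected before touching the tree
    // Ephemeral znode: the ZK server deletes it when the session expires,
    // which is how the rest of the cluster sees a dead node via watches.
    zk.create(nodePath, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  }

  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case SyncConnected:
        connected.countDown();
        break;
      case Expired:
        // The server already dropped our session and its ephemerals; this
        // handle is dead. Open a fresh session and re-advertise the node --
        // the "re-bootstrap" described above.
        try {
          zk.close();
          connectAndRegister();
        } catch (Exception e) {
          // real code would retry with backoff
        }
        break;
      default:
        // Disconnected is transient; the client library reconnects on its
        // own and we'll eventually see SyncConnected or Expired.
        break;
    }
  }
}
{code}

The distinction the sketch leans on is exactly the one above: Disconnected is transient and the client library recovers by itself, while Expired means the server has already dropped the session (and its ephemerals), so the only way forward is a brand new session.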
> Implement a Solr specific naming service (using Zookeeper)
> ----------------------------------------------------------
>
>                 Key: SOLR-1277
>                 URL: https://issues.apache.org/jira/browse/SOLR-1277
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes:
> if a server fails, indexing and searching don't stop, and all of
> the partitions remain searchable. For configuration, we want the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and go from there.
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. a new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult, but not impossible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.