From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Wed, 13 Mar 2013 16:45:41 -0400
Subject: Re: SolrCloud with Zookeeper ensemble in production environment: SEVERE problems.

Stack traces...

First,

jps -l

That will give you the process IDs of your running Java processes. Then:

jstack <pid>

Usually I pipe the output from jstack into a text file...

Best
Erick

On Wed, Mar 13, 2013 at 1:48 PM, Luis Cappa Banda wrote:

> Uhm, how can I do that... 'cleanly'? I know that with JConsole it's possible
> to output these traces, but with a .war application built on top of Spring I
> don't know how I can do that. In any case, here is my CloudSolrServer
> wrapper that is used by other classes.
> There is no synchronized method or block of code:
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> public class BinaryLBHttpSolrServer extends LBHttpSolrServer {
>
>     private static final long serialVersionUID = 3905956120804659445L;
>
>     public BinaryLBHttpSolrServer(String[] endpoints) throws MalformedURLException {
>         super(endpoints);
>     }
>
>     // Every per-node server sends updates in the binary (javabin) format.
>     @Override
>     protected HttpSolrServer makeServer(String server) throws MalformedURLException {
>         HttpSolrServer solrServer = super.makeServer(server);
>         solrServer.setRequestWriter(new BinaryRequestWriter());
>         return solrServer;
>     }
> }
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> public class CloudSolrHttpServerImpl implements CloudSolrHttpServer {
>
>     private CloudSolrServer cloudSolrServer;
>
>     private Logger log = Logger.getLogger(CloudSolrHttpServerImpl.class);
>
>     public CloudSolrHttpServerImpl(String zookeeperEndpoints, String[] endpoints,
>             int clientTimeout, int connectTimeout, String cloudCollection) {
>         try {
>             BinaryLBHttpSolrServer lbSolrServer = new BinaryLBHttpSolrServer(endpoints);
>             this.cloudSolrServer = new CloudSolrServer(zookeeperEndpoints, lbSolrServer);
>             this.cloudSolrServer.setZkConnectTimeout(connectTimeout);
>             this.cloudSolrServer.setZkClientTimeout(clientTimeout);
>             this.cloudSolrServer.setDefaultCollection(cloudCollection);
>         } catch (MalformedURLException e) {
>             log.error(e);
>         }
>     }
>
>     @Override
>     public QueryResponse search(SolrQuery query) throws SolrServerException {
>         return cloudSolrServer.query(query, METHOD.POST);
>     }
>
>     // Tries to index the bean up to 4 times.
>     @Override
>     public boolean index(DocumentBean user) {
>         boolean indexed = false;
>         int retries = 0;
>         do {
>             indexed = addBean(user);
>             retries++;
>         } while (!indexed && retries < 4);
>         return indexed;
>     }
>
>     // Tries to add the document up to 4 times.
>     @Override
>     public boolean update(SolrInputDocument updateDoc) {
>         boolean update = false;
>         int retries = 0;
>         do {
>             update = addSolrInputDocument(updateDoc);
>             retries++;
>         } while (!update && retries < 4);
>         return update;
>     }
>
>     @Override
>     public void commit() {
>         try {
>             cloudSolrServer.commit();
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>     }
>
>     @Override
>     public boolean delete(String... ids) {
>         boolean deleted = false;
>         List<String> idList = Arrays.asList(ids);
>         try {
>             this.cloudSolrServer.deleteById(idList);
>             this.cloudSolrServer.commit(true, true);
>             deleted = true;
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>         return deleted;
>     }
>
>     @Override
>     public void optimize() {
>         try {
>             this.cloudSolrServer.optimize();
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>     }
>
>     /*
>      * Getters & setters
>      */
>     public CloudSolrServer getSolrServer() {
>         return cloudSolrServer;
>     }
>
>     public void setSolrServer(CloudSolrServer solrServer) {
>         this.cloudSolrServer = solrServer;
>     }
>
>     private boolean addBean(DocumentBean user) {
>         boolean added = false;
>         try {
>             this.cloudSolrServer.addBean(user, 100);
>             this.commit();
>             added = true; // report success so index() stops retrying
>         } catch (IOException e) {
>             log.error(e);
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (SolrException e) {
>             log.error(e);
>         }
>         return added;
>     }
>
>     private boolean addSolrInputDocument(SolrInputDocument updateDoc) {
>         boolean added = false;
>         try {
>             this.cloudSolrServer.add(updateDoc, 100);
>             this.commit();
>             added = true;
>         } catch (IOException e) {
>             log.error(e);
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (SolrException e) {
>             log.error(e);
>         }
>         return added;
>     }
> }
>
> Thank you very much, Mark.
>
> - Luis Cappa
>
> 2013/3/13 Mark Miller
>
> > Could you capture some thread stack traces in the 'engine' and see if
> > there are any blocking methods?
> >
> > - Mark
> >
> > On Mar 13, 2013, at 1:34 PM, Luis Cappa Banda wrote:
> >
> > > Just one correction:
> > >
> > > When I said:
> > >
> > >   - I've checked SolrCloud via Solr Admin interface and it's OK:
> > >     everything is green, and I cant execute queries directly into Solr.
> > >
> > > I mean:
> > >
> > >   - I've checked SolrCloud via Solr Admin interface and it's OK:
> > >     everything is green, and *I can* execute queries directly into Solr.
> > >
> > > Thanks!
> > >
> > > - Luis Cappa
> > >
> > > 2013/3/13 Luis Cappa Banda
> > >
> > >> Hello, guys!
> > >>
> > >> I've been experiencing some annoying behavior with my current
> > >> production scenario. Here is the snapshot:
> > >>
> > >>   - SolrCloud: 2 shards
> > >>   - Zookeeper ensemble: 3 nodes on *different machines* (most of the
> > >>     tutorials install 3 Zookeeper nodes on the same machine).
> > >>   - This is the zoo.cfg from every node:
> > >>
> > >>     tickTime=2000 // I've also tried with 60000
> > >>     initLimit=10
> > >>     syncLimit=5
> > >>     dataDir=/var/lib/zookeeper
> > >>     clientPort=9000
> > >>     server.1=zoohost1:2888:3888
> > >>     server.2=zoohost1:2888:3888
> > >>     server.3=zoohost1:2888:3888
> > >>
> > >>   - I've developed a Java application with a REST API (let's call it
> > >>     *engine*) that dispatches queries into SolrCloud. It's a wrapper
> > >>     around CloudSolrServer, so it's mandatory to specify some Zookeeper
> > >>     configuration params too. They are loaded dynamically when the
> > >>     application is deployed in a Tomcat server, but the current values
> > >>     that I'm using are as follows:
> > >>
> > >>     cloudSolrServer.setZkConnectTimeout(60000)
> > >>     cloudSolrServer.setZkClientTimeout(60000)
> > >>
> > >> *THE PROBLEM*
> > >>
> > >> Everything goes OK, but after two days more or less (yes, I've checked
> > >> that this behavior occurs periodically, more or less) the *engine
> > >> blocks* and cannot dispatch any query to SolrCloud.
> > >>
> > >>   - The *engine* log only outputs "updating Zookeeper..." one last
> > >>     time, but never updates.
> > >>   - I've checked SolrCloud via Solr Admin interface and it's OK:
> > >>     everything is green, and I cant execute queries directly into Solr.
> > >>   - So Solr appears to be OK, and the next step is to restart the
> > >>     *engine*, but it again shows "updating Zookeeper...". Unfortunately
> > >>     switch off + switch on doesn't work here, :-(
> > >>   - I've also checked the Zookeeper logs and they show some connection
> > >>     log-outs, but the ensemble appears to be OK too.
> > >>   - *The end:* If I restart the Zookeeper nodes one by one, and I
> > >>     restart SolrCloud, plus I restart the engine, the problem is solved.
> > >>     I'm using Amazon AWS as hosting, so I rule out connection problems
> > >>     between instances.
> > >>
> > >> Has anyone experienced something similar? Can anybody shed some light
> > >> on this problem?
> > >>
> > >> Thank you very much.
> > >>
> > >> Regards,
> > >>
> > >> - Luis Cappa
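
On the question of capturing stack traces 'cleanly' from a .war running in Tomcat: besides jps and jstack, a thread dump can also be taken from inside the JVM itself. The following is only a minimal sketch using the standard java.lang.management API. The class name ThreadDumper and the method dumpAllThreads are made up for illustration; they are not part of Solr, SolrJ, or the wrapper code posted in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class ThreadDumper {

    private ThreadDumper() {
    }

    /** Builds a jstack-like text dump of every live thread in this JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for the lock details that the running JVM can report.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip a thread that is no longer alive
            }
            // Header line: thread name, state, and the lock it waits on (if any).
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            // Full stack trace, one frame per line.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Logging the result of ThreadDumper.dumpAllThreads() (for example from a diagnostic endpoint of the engine) while it is stuck on "updating Zookeeper..." should show which threads are BLOCKED or WAITING and on which lock, which is the same information jstack would give.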