lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrCloud with Zookeeper ensemble in production environment: SEVERE problems.
Date Wed, 13 Mar 2013 20:45:41 GMT
Stack traces..

First,
jps -l

that will give you the process IDs of your running Java processes. Then:

jstack <pid from above>

Usually I pipe the output from jstack into a text file...
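
If attaching jstack to the Tomcat process is awkward, roughly the same information can be collected from inside the webapp through the JDK's standard ThreadMXBean. The following is only a minimal sketch under that assumption: the class name ThreadDumper and its dump() method are hypothetical names, not part of Solr or of the code quoted below, and the output format only loosely imitates jstack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of Solr): builds a jstack-like dump in-process.
final class ThreadDumper {

    /** Returns one text block per live thread: name, state, lock info, stack frames. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request monitor/synchronizer details only if this JVM supports them,
        // otherwise dumpAllThreads would throw UnsupportedOperationException.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip threads that ended during the dump
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock this thread is blocked or waiting on, if any.
                out.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Threads reported as BLOCKED, or stuck in the same frame across several dumps taken a few seconds apart, are the ones worth looking at first.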

Best
Erick


On Wed, Mar 13, 2013 at 1:48 PM, Luis Cappa Banda <luiscappa@gmail.com> wrote:

> Uhm, how can I do that... 'cleanly'? I know that with JConsole it's possible
> to output these traces, but with a .war application built on top of Spring I
> don't know how to do that. In any case, here is my CloudSolrServer
> wrapper that is used by other classes. There is no synchronized method or
> block of code:
>
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> public class BinaryLBHttpSolrServer extends LBHttpSolrServer {
>
>     private static final long serialVersionUID = 3905956120804659445L;
>
>     public BinaryLBHttpSolrServer(String[] endpoints) throws MalformedURLException {
>         super(endpoints);
>     }
>
>     @Override
>     protected HttpSolrServer makeServer(String server) throws MalformedURLException {
>         HttpSolrServer solrServer = super.makeServer(server);
>         solrServer.setRequestWriter(new BinaryRequestWriter());
>         return solrServer;
>     }
> }
>
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> public class CloudSolrHttpServerImpl implements CloudSolrHttpServer {
>
>     private CloudSolrServer cloudSolrServer;
>
>     private Logger log = Logger.getLogger(CloudSolrHttpServerImpl.class);
>
>     public CloudSolrHttpServerImpl(String zookeeperEndpoints, String[] endpoints,
>             int clientTimeout, int connectTimeout, String cloudCollection) {
>         try {
>             BinaryLBHttpSolrServer lbSolrServer = new BinaryLBHttpSolrServer(endpoints);
>             this.cloudSolrServer = new CloudSolrServer(zookeeperEndpoints, lbSolrServer);
>             this.cloudSolrServer.setZkConnectTimeout(connectTimeout);
>             this.cloudSolrServer.setZkClientTimeout(clientTimeout);
>             this.cloudSolrServer.setDefaultCollection(cloudCollection);
>         } catch (MalformedURLException e) {
>             log.error(e);
>         }
>     }
>
>     @Override
>     public QueryResponse search(SolrQuery query) throws SolrServerException {
>         return cloudSolrServer.query(query, METHOD.POST);
>     }
>
>     @Override
>     public boolean index(DocumentBean user) {
>         boolean indexed = false;
>         int retries = 0;
>         do {
>             indexed = addBean(user);
>             retries++;
>         } while (!indexed && retries < 4);
>         return indexed;
>     }
>
>     @Override
>     public boolean update(SolrInputDocument updateDoc) {
>         boolean update = false;
>         int retries = 0;
>         do {
>             update = addSolrInputDocument(updateDoc);
>             retries++;
>         } while (!update && retries < 4);
>         return update;
>     }
>
>     @Override
>     public void commit() {
>         try {
>             cloudSolrServer.commit();
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>     }
>
>     @Override
>     public boolean delete(String... ids) {
>         boolean deleted = false;
>         List<String> idList = Arrays.asList(ids);
>         try {
>             this.cloudSolrServer.deleteById(idList);
>             this.cloudSolrServer.commit(true, true);
>             deleted = true;
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>         return deleted;
>     }
>
>     @Override
>     public void optimize() {
>         try {
>             this.cloudSolrServer.optimize();
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>     }
>
>     /*
>      * ********************
>      *  Getters & setters *
>      * ********************
>      */
>     public CloudSolrServer getSolrServer() {
>         return cloudSolrServer;
>     }
>
>     public void setSolrServer(CloudSolrServer solrServer) {
>         this.cloudSolrServer = solrServer;
>     }
>
>     private boolean addBean(DocumentBean user) {
>         boolean added = false;
>         try {
>             this.cloudSolrServer.addBean(user, 100);
>             this.commit();
>             // Report success; without this the method always returned false,
>             // so index() retried (and re-committed) four times per document.
>             added = true;
>         } catch (IOException e) {
>             log.error(e);
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (SolrException e) {
>             log.error(e);
>         }
>         return added;
>     }
>
>     private boolean addSolrInputDocument(SolrInputDocument updateDoc) {
>         boolean added = false;
>         try {
>             this.cloudSolrServer.add(updateDoc, 100);
>             this.commit();
>             added = true;
>         } catch (IOException e) {
>             log.error(e);
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (SolrException e) {
>             log.error(e);
>         }
>         return added;
>     }
> }
>
> Thank you very much, Mark.
>
>
> -  Luis Cappa
>
>
>
> 2013/3/13 Mark Miller <markrmiller@gmail.com>
>
> >
> > Could you capture some thread stack traces in the 'engine' and see if
> > there are any blocking methods?
> >
> > - Mark
> >
> > On Mar 13, 2013, at 1:34 PM, Luis Cappa Banda <luiscappa@gmail.com> wrote:
> >
> > > Just one correction:
> > >
> > > When I said:
> > >
> > >   - I've checked SolrCloud via Solr Admin interface and it's OK:
> > >   everything is green, and I cant execute queries directly into Solr.
> > >
> > > I mean:
> > >
> > >
> > >   - I've checked SolrCloud via Solr Admin interface and it's OK:
> > >   everything is green, and *I can* execute queries directly into Solr.
> > >
> > >
> > > Thanks!
> > >
> > >
> > > - Luis Cappa
> > >
> > >
> > > 2013/3/13 Luis Cappa Banda <luiscappa@gmail.com>
> > >
> > >> Hello, guys!
> > >>
> > >> I've been experiencing some annoying behavior with my current production
> > >> scenario. Here is the snapshot:
> > >>
> > >>
> > >>   - SolrCloud: 2 shards
> > >>   - Zookeeper ensemble: 3 nodes on *different machines* (most
> > >>   tutorials install 3 Zookeeper nodes on the same machine).
> > >>   - This is the zoo.cfg from every node:
> > >>
> > >> tickTime=2000  // I've also tried with 60000
> > >>
> > >> initLimit=10
> > >>
> > >> syncLimit=5
> > >>
> > >> dataDir=/var/lib/zookeeper
> > >>
> > >> clientPort=9000
> > >>
> > >> server.1=zoohost1:2888:3888
> > >>
> > >> server.2=zoohost1:2888:3888
> > >>
> > >> server.3=zoohost1:2888:3888
> > >>
> > >>
> > >>
> > >>   - I've developed a Java application with a REST API (let's call it the
> > >>   *engine*) that dispatches queries into SolrCloud. It's a wrapper around
> > >>   CloudSolrServer, so it's mandatory to specify some Zookeeper
> > >>   configuration params too. They are loaded dynamically when the
> > >>   application is deployed in a Tomcat server, but the current values that
> > >>   I'm using are as follows:
> > >>
> > >> cloudSolrServer.setZkConnectTimeout(60000)
> > >>
> > >> cloudSolrServer.setZkClientTimeout(60000)
> > >>
> > >> *THE PROBLEM*
> > >>
> > >> Everything goes OK, but after about two days (yes, I've checked
> > >> that this behavior occurs periodically, more or less) the *engine* blocks
> > >> and cannot dispatch any query to SolrCloud.
> > >>
> > >>   - The *engine* log only outputs "updating Zookeeper..." one last time,
> > >>   but never updates.
> > >>   - I've checked SolrCloud via Solr Admin interface and it's OK:
> > >>   everything is green, and I cant execute queries directly into Solr.
> > >>   - Since Solr appears to be OK, the next step is to restart the *engine*,
> > >>   but "updating Zookeeper..." appears again. Unfortunately, switching it
> > >>   off and on doesn't work here, :-(
> > >>   - I've also checked the Zookeeper logs, and some connection logouts
> > >>   appear, but the ensemble appears to be OK too.
> > >>   - *The end:* If I restart the Zookeeper nodes one by one, restart
> > >>   SolrCloud, and restart the engine, the problem is solved. I'm using
> > >>   Amazon AWS for hosting, so I rule out connection problems between
> > >>   instances.
> > >>
> > >>
> > >> Has anyone experienced something similar? Can anybody shed some light
> > >> on this problem?
> > >>
> > >> Thank you very much.
> > >>
> > >>
> > >> Regards,
> > >>
> > >>
> > >> - Luis Cappa
> > >>
> >
> >
>
