lucene-solr-user mailing list archives

From Tom Evans <tevans...@googlemail.com>
Subject Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question
Date Thu, 15 Dec 2016 14:52:57 GMT
On Thu, Dec 15, 2016 at 12:37 PM, GW <thegeoforce@gmail.com> wrote:
> While my client is all PHP it does not use a solr client. I wanted to stay
> with the latest Solr Cloud and the PHP clients all seemed to have some kind
> of issue being unaware of newer Solr Cloud versions. The client makes pure
> REST calls with Curl. It is stateful through local storage. There is no
> persistent connection. There are no cookies and PHP work is not sticky so
> it is designed for round robin on both the internal network.
>
> I'm thinking we have a different idea of persistent. To me something like
> MySQL can be persistent, ie a fifo queue for requests. The stack can be
> always on/connected on something like a heap storage.
>
> I never thought about the impact of a solr node crashing with PHP on top.
> Many thanks!
>
> Was thinking of running a conga line (Ricci & Luci projects) and shutting
> down and replacing failed nodes. Never done this with Solr. I don't see any
> reasons why it would not work.
>
> ** When you say an array of connections per host. It would still require an
> internal DNS because hosts files don't round robin. perhaps this is handled
> in the Python client??


The best Solr clients will take the URIs of the ZooKeeper servers;
they do not make queries via ZooKeeper, but will read the current
cluster status from ZooKeeper in order to determine which Solr node to
actually connect to, taking into account which nodes are alive and
the state of particular shards.

SolrJ (Java) will do this, as will pysolr (Python). I'm not aware of a
PHP client that is ZK aware.
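
To make the idea concrete, here is a minimal sketch of the selection step a ZK-aware client performs once it has read the cluster status. It does not talk to ZooKeeper at all: the `cluster_state` dict and the `live_nodes` set are hand-written stand-ins, and their layout (shards, replicas, `node_name`, `base_url`, `state`) is an assumption modelled loosely on what SolrCloud keeps in ZooKeeper. The function name `pick_query_urls` and the host names are hypothetical.

```python
import random


def pick_query_urls(cluster_state, live_nodes, collection):
    """Return query URLs for replicas that are active AND on a live node."""
    urls = []
    shards = cluster_state[collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            is_active = replica["state"] == "active"
            is_live = replica["node_name"] in live_nodes
            if is_active and is_live:
                urls.append(replica["base_url"] + "/" + collection)
    return urls


# Hypothetical snapshot of the cluster status (layout is an assumption).
cluster_state = {
    "products": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {
                        "node_name": "solr1:8983_solr",
                        "base_url": "http://solr1:8983/solr",
                        "state": "active",
                    },
                    "core_node2": {
                        # Marked active, but the node is not in live_nodes.
                        "node_name": "solr2:8983_solr",
                        "base_url": "http://solr2:8983/solr",
                        "state": "active",
                    },
                }
            },
            "shard2": {
                "replicas": {
                    "core_node3": {
                        "node_name": "solr3:8983_solr",
                        "base_url": "http://solr3:8983/solr",
                        "state": "active",
                    },
                    "core_node4": {
                        # Node is live, but this replica is still recovering.
                        "node_name": "solr1:8983_solr",
                        "base_url": "http://solr1:8983/solr",
                        "state": "recovering",
                    },
                }
            },
        }
    }
}
live_nodes = {"solr1:8983_solr", "solr3:8983_solr"}

candidates = pick_query_urls(cluster_state, live_nodes, "products")
target = random.choice(candidates)  # spread queries across healthy replicas
```

A real client would also watch ZooKeeper for changes and refresh this view, which is the part that the alternatives below cannot easily replicate.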

If you don't have a ZK aware client, there are several options:

1) Make your favourite client ZK aware, like in [1]
2) Use round robin DNS to distribute requests amongst the cluster.
3) Use a hardware or software load balancer in front of the cluster.
4) Use shared state to store the names of active nodes*

All apart from 1) have significant downsides:

2) Has no concept of a node being down. Down nodes should not cause
query failures; the requests should go elsewhere in the cluster.
Requires updating DNS to add or remove nodes.
3) Can detect "down" nodes. Has no idea about the state of the
cluster/shards (usually).
4) Basically duplicates what ZooKeeper does, but less effectively -
doesn't know cluster state, down nodes, nodes that are up but with
unhealthy replicas...
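
For a client that is not ZK aware, the simplest mitigation for down nodes is to try each known node in turn. The sketch below assumes a fixed, hand-maintained list of node URLs; `query_with_failover` and `http_get` are hypothetical helper names, not part of any Solr client. It only protects against nodes that refuse connections, time out, or return HTTP errors, so it shares the limitation of options 2) to 4): it knows nothing about shard or replica state.

```python
import urllib.request


def http_get(url, timeout=2.0):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def query_with_failover(base_urls, path, fetch=http_get):
    """Try each node in order; return (url, body) from the first that answers."""
    last_error = None
    for base in base_urls:
        url = base.rstrip("/") + path
        try:
            return url, fetch(url)
        except OSError as exc:
            # URLError, HTTPError, timeouts and refused connections are all
            # OSError subclasses, so any of them moves us on to the next node.
            last_error = exc
    raise RuntimeError("no Solr node answered") from last_error
```

Usage would look something like `query_with_failover(["http://solr1:8983/solr/products", "http://solr2:8983/solr/products"], "/select?q=*:*")`, with the host and collection names being placeholders. The `fetch` parameter exists only so the loop can be exercised without a network.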

>
> You have given me some good clarification. I think lol. I know I can spin
> out WWW servers based on load. I'm not sure how shit will fly spinning up
> additional solr nodes. I'm not sure what happens if you spin up an empty
> solr node and what will happen with replication, shards and load cost of
> spinning an instance. I'm facing some experimentation me thinks. This will
> be a manual process at first, for sure....
>
> I guess I could put the solr connect requests in my clients into a try
> loop, looking for successful connections by name before any action.

In SolrCloud mode, you can spin up/shut down nodes as you like.
Depending on how you have configured your collections, new replicas
may be automatically created on the new node, or the node will simply
become part of the cluster but empty, ready for you to assign new
replicas to it using the Collections API.
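
As a rough illustration of the manual route, the sketch below builds a Collections API ADDREPLICA request for a newly started, empty node. The helper name `add_replica_url`, the host names, the collection name and the shard name are all made up for the example; only the `ADDREPLICA` action and its `collection`, `shard` and `node` parameters come from the Collections API itself. The code only constructs the URL and does not send it.

```python
from urllib.parse import urlencode


def add_replica_url(solr_base, collection, shard, node):
    """Build an ADDREPLICA request URL for the Collections API."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,  # node name as registered in the cluster
        "wt": "json",
    }
    return solr_base.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical values: any live node can receive the request.
url = add_replica_url(
    "http://solr1:8983/solr", "products", "shard1", "solr4:8983_solr"
)
```

Issuing a GET to the resulting URL against any live node should ask the cluster to create the replica on the named node.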

You can also use what are called "snitches" to define rules for how
you want replicas/shards allocated amongst the nodes, e.g. to avoid
placing all the replicas for a shard in the same rack.

Cheers

Tom

[1] https://github.com/django-haystack/pysolr/commit/366f14d75d2de33884334ff7d00f6b19e04e8bbf
