lucene-dev mailing list archives

From "Timothy Potter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5474) Have a new mode for SolrJ to not watch any ZKNode
Date Mon, 02 Dec 2013 16:33:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836662#comment-13836662
] 

Timothy Potter commented on SOLR-5474:
--------------------------------------

Thanks for the info about changes to ZkStateReader for SOLR-5473. I'm trying to think about
how to differentiate between downed nodes and slow queries using this approach.

Let's consider the scenario where two nodes (A & B) serve a shard and LazyCloudSolrServer
sends a query request to node A. Imagine that node A is down, but the client application doesn't
know that yet because its cached state is stale. The request will time out after some configurable
duration. After the timeout, LazyCloudSolrServer refreshes the cached state, realizes node
A is down, and sends the request to node B, where the query succeeds.

However, if node A is actually healthy and the cause of the timeout is a slow query, then
the client should have waited longer. After refreshing the state from ZooKeeper (in response
to the timeout), the client can see that A was healthy, so the cause of the timeout
was likely a slow query. So does the client re-send the slow query? That seems like it could end
up in a loop of timeouts and re-sends. Does LazyCloudSolrServer keep track of how many attempts
it has made for a given query? Just brainstorming here ... I know Solr supports the timeAllowed
parameter for a query, but that's optional.
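
To make the question concrete, here is a minimal sketch of the decision the client could
make after a timeout, assuming it tracks attempts per query. The class, method, and enum
names are hypothetical and are not part of SolrJ or of any patch on this issue; the live-node
set stands in for whatever a fresh read from ZooKeeper returns.

```java
import java.util.Set;

// Hypothetical sketch, not SolrJ API: decide what to do after a request timed out.
class TimeoutRetrySketch {

    // Possible next steps after a timeout.
    enum Action { FAIL_OVER, RETRY_WITH_LONGER_TIMEOUT, GIVE_UP }

    /**
     * @param timedOutNode  the node the request was sent to
     * @param liveNodes     live nodes as freshly re-read from ZooKeeper (assumed input)
     * @param attemptsSoFar how many times this query has already been sent
     * @param maxAttempts   per-query attempt budget (assumed to be configurable)
     */
    static Action afterTimeout(String timedOutNode, Set<String> liveNodes,
                               int attemptsSoFar, int maxAttempts) {
        // A bounded attempt count is what prevents an endless timeout/re-send loop.
        if (attemptsSoFar >= maxAttempts) {
            return Action.GIVE_UP;
        }
        // The node is no longer live: the cached state was stale, so try another replica.
        if (!liveNodes.contains(timedOutNode)) {
            return Action.FAIL_OVER;
        }
        // The node is still live: the timeout was more likely a slow query.
        return Action.RETRY_WITH_LONGER_TIMEOUT;
    }
}
```

Whether a retry should really use a longer timeout, or simply surface the error to the
caller, is a design choice this sketch does not settle.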

I suppose this scenario is still possible even with the current approach of having a watcher
on the state znode on the client side. However, I have to think that under the current approach,
the probability of sending a request to a downed node goes down since state is refreshed in
real-time. The zk version doesn't help here because if node A is down, the only thing the
client can do is wait for the request to time out.

> Have a new mode for SolrJ to not watch any ZKNode
> -------------------------------------------------
>
>                 Key: SOLR-5474
>                 URL: https://issues.apache.org/jira/browse/SOLR-5474
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>
> In this mode SolrJ would not watch any ZK node.
> It fetches the state on demand and caches the most recently used n collections in memory.
> SolrJ would not listen to any ZK node. When a request comes for a collection ‘xcoll’,
> it would first check if such a collection exists.
> If yes, it first looks up the details in the local cache for that collection.
> If not found in the cache, it fetches the node /collections/xcoll/state.json and caches
> the information.
> Any query/update will be sent with an extra query param specifying the collection name,
> shard name, Role (Leader/Replica), and range (example: _target_=xcoll:shard1:L:80000000-b332ffff).
> A node would throw an error (INVALID_NODE) if it does not serve the collection/shard/Role/range
> combo.
> If SolrJ gets an INVALID_NODE error, it would invalidate the cache and fetch fresh state
> information for that collection (and cache it again).
> If there is a connection timeout, SolrJ assumes the node is down, re-fetches the state
> for the collection, and tries again.
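
The on-demand cache described in the issue could be sketched roughly as follows. This is
an illustration under stated assumptions, not the proposed implementation: the class and
method names are hypothetical, the fetcher function stands in for reading
/collections/<name>/state.json from ZooKeeper, the existence check is omitted, and the "R"
role code for a non-leader replica is a guess (the issue only shows "L").

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not SolrJ API: cache the n most recently used collection states.
class CollectionStateCacheSketch {
    private final Function<String, String> fetcher; // assumed: reads state.json for a collection
    private final Map<String, String> cache;
    int fetchCount = 0; // exposed only so the sketch can be exercised

    CollectionStateCacheSketch(final int maxCollections, Function<String, String> fetcher) {
        this.fetcher = fetcher;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxCollections;
            }
        };
    }

    // Return the cached state, fetching it on demand when it is not cached.
    synchronized String getState(String collection) {
        String state = cache.get(collection);
        if (state == null) {
            state = fetcher.apply(collection);
            fetchCount++;
            cache.put(collection, state);
        }
        return state;
    }

    // Called on an INVALID_NODE error or a connection timeout; the next
    // getState call for this collection fetches fresh state.
    synchronized void invalidate(String collection) {
        cache.remove(collection);
    }

    // Build the value of the _target_ param in the format shown in the issue.
    // "L" comes from the issue's example; "R" is an assumption.
    static String targetParam(String collection, String shard, boolean leader, String range) {
        return collection + ":" + shard + ":" + (leader ? "L" : "R") + ":" + range;
    }
}
```

For example, targetParam("xcoll", "shard1", true, "80000000-b332ffff") yields
xcoll:shard1:L:80000000-b332ffff, which matches the example value in the issue description.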



--
This message was sent by Atlassian JIRA
(v6.1#6144)


