lucene-dev mailing list archives

From "Andrey Kudryavtsev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-11459) AddUpdateCommand#prevVersion is not cleared which may lead to problem for in-place updates of non existed documents
Date Tue, 10 Oct 2017 14:37:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-11459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kudryavtsev updated SOLR-11459:
--------------------------------------
    Description: 
I have a SolrCloud cluster with 1 shard and *m* replicas, and from time to time I run batches
of 5-10k in-place updates.
Once I noticed that a job had "hung": it started but did not finish for a long while.
The logs were full of messages like:

{code}
Missing update, on which current in-place update depends on, hasn't arrived. id=__, looking for version=___, last found version=0
{code}

{code}
Tried to fetch document ___ from the leader, but the leader says document has been deleted. Deleting the document here and skipping this update: Last found version: 0, was looking for: ___
{code}

Further analysis showed the following:
* Among the other updates there are 100-500 updates for non-existent documents (something
I have to deal with).
* The leader receives a batch of updates and executes them one by one. {{JavabinLoader}}, which
is used to process the documents, reuses the same {{AddUpdateCommand}} instance for every
update and just [clears its state at the end|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99].
However, [AddUpdateCommand#prevVersion|https://github.com/apache/lucene-solr/blob/6396cb759f8c799f381b0730636fa412761030ce/solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java#L76]
is not cleared.
* If an in-place update targets a document that does not exist, the update is processed as a
regular atomic update (i.e. a new document is created), but {{prevVersion}} is still sent as the
{{distrib.inplace.prevversion}} parameter when {{DistributedUpdateProcessor}} forwards the update
to the slaves. Because {{prevVersion}} was never cleared, it may still hold the version from a
previously processed update.
* Each slave checks its own version of the document, which is 0 (because the document does not
exist), concludes that some updates were missed, and spends 5 seconds in [DistributedUpdateProcessor#waitForDependentUpdates|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99]
waiting for the missing updates (no luck). It then tries to fetch the "correct" version from
the leader (no luck either).
* So each such update costs me *m* × 5 seconds.
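
The sequence above can be modelled in a few lines of plain Java. This is a simplified, hypothetical sketch, not Solr code: {{ReusedCommand}} stands in for {{AddUpdateCommand}}, the loop in {{forwardedPrevVersions}} stands in for the loader reusing one command per batch, and {{replicaWouldWait}} only approximates the replica-side dependency check (the exact conditions Solr uses may differ).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the bug; these are NOT Solr classes.
class StalePrevVersionDemo {
    /**
     * Leader side: processes a batch with ONE reused command object.
     * existingVersions maps id -> current version on the leader
     * (absent means the document does not exist). Returns the prevVersion
     * that would be forwarded to the replicas for each update.
     */
    static List<Long> forwardedPrevVersions(List<String> ids, Map<String, Long> existingVersions) {
        ReusedCommand cmd = new ReusedCommand();
        List<Long> forwarded = new ArrayList<>();
        for (String id : ids) {
            cmd.id = id;
            Long current = existingVersions.get(id);
            if (current != null) {
                // In-place update of an existing doc: remember the version it depends on.
                cmd.prevVersion = current;
            }
            // For a missing doc the update falls back to a regular atomic update,
            // but prevVersion still holds whatever the previous update left there.
            forwarded.add(cmd.prevVersion);
            cmd.clear();
        }
        return forwarded;
    }

    /** Replica side (simplified): would it wait for a "missing" dependent update? */
    static boolean replicaWouldWait(long forwardedPrevVersion, long localVersion) {
        return forwardedPrevVersion > 0 && localVersion < forwardedPrevVersion;
    }

    public static void main(String[] args) {
        List<Long> prev = forwardedPrevVersions(List.of("doc1", "doc2"), Map.of("doc1", 42L));
        System.out.println(prev);                              // [42, 42]
        System.out.println(replicaWouldWait(prev.get(1), 0L)); // true
    }
}

// Hypothetical stand-in for AddUpdateCommand.
class ReusedCommand {
    String id;
    long version = 0;
    long prevVersion = -1; // -1 = no previous version known

    // Like the original clear(): prevVersion is NOT reset.
    void clear() {
        id = null;
        version = 0;
    }
}
```

In this model doc2 does not exist, so it never sets {{prevVersion}} and simply forwards the 42 left over from doc1; a replica whose local version is 0 then waits for an update that will never arrive.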

I worked around this by explicitly checking whether the document exists, but this should probably be fixed.
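
A client-side version of the existence-check workaround might look like the following sketch. It assumes the existence test is supplied by the caller (for example, backed by a real-time get such as SolrJ's {{SolrClient#getById}}, which returns null for a missing document); {{InPlaceBatchSplitter}} is a hypothetical helper, and how the missing ids are then handled (skipped, or sent as full documents) is left to the application.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical client-side workaround: only send an in-place update when the
// target document is known to exist.
class InPlaceBatchSplitter {
    /**
     * Splits ids into: true -> document exists (safe to send as an in-place update),
     * false -> document missing (handle separately).
     */
    static Map<Boolean, List<String>> split(List<String> ids, Predicate<String> exists) {
        return ids.stream().collect(Collectors.partitioningBy(exists));
    }

    public static void main(String[] args) {
        // Stand-in for an existence check against the index (assumption).
        Map<String, Long> index = Map.of("a", 1L, "c", 3L);
        Map<Boolean, List<String>> parts = split(List.of("a", "b", "c"), index::containsKey);
        System.out.println(parts.get(true));  // [a, c]
        System.out.println(parts.get(false)); // [b]
    }
}
```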

The obvious first guess is that {{prevVersion}} should be cleared in {{AddUpdateCommand#clear}},
but I have no idea how to test it.

{code}
+++ solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java	(revision )
@@ -78,6 +78,7 @@
      updateTerm = null;
      isLastDocInBatch = false;
      version = 0;
+     prevVersion = -1;
    }
{code}
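
At the unit level, the property to check is that a reused command no longer carries the previous update's {{prevVersion}} after {{clear()}}. The sketch below uses a hypothetical stand-in class ({{FixedCommand}}) with the proposed one-line fix applied, not the real {{AddUpdateCommand}}; a fuller regression test would more likely send a batch mixing in-place updates of existing and non-existent documents to a multi-replica collection and assert that it completes without the 5-second waits.

```java
class ClearRegressionCheck {
    /**
     * Mimics the reuse pattern: the first update sets prevVersion, the loader
     * calls clear(), and we return what the next update would see.
     */
    static long prevVersionAfterReuse(long firstUpdatePrevVersion) {
        FixedCommand cmd = new FixedCommand();
        cmd.prevVersion = firstUpdatePrevVersion; // set while processing update #1
        cmd.clear();                              // reused instance is cleared
        return cmd.prevVersion;                   // seen by update #2
    }

    public static void main(String[] args) {
        long seen = prevVersionAfterReuse(42L);
        if (seen != -1L) {
            throw new AssertionError("stale prevVersion leaked: " + seen);
        }
        System.out.println("prevVersion after clear(): " + seen); // prevVersion after clear(): -1
    }
}

// Hypothetical stand-in for AddUpdateCommand with the proposed fix applied;
// NOT Solr's actual class.
class FixedCommand {
    long version = 0;
    long prevVersion = -1;

    void clear() {
        version = 0;
        prevVersion = -1; // the proposed one-line fix
    }
}
```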


  was:
I have a SolrCloud cluster with 1 shard and *m* replicas, and from time to time I run batches
of 5-10k in-place updates.
Once I noticed that a job had "hung": it started but did not finish for a long while.
The logs were full of messages like:

{code}
Missing update, on which current in-place update depends on, hasn't arrived. id=__, looking for version=___, last found version=0
{code}

{code}
Tried to fetch document ___ from the leader, but the leader says document has been deleted. Deleting the document here and skipping this update: Last found version: 0, was looking for: ___
{code}

Further analysis showed the following:
* Among the regular updates there are 100-500 updates for non-existent documents (something
I have to deal with).
* The leader receives a batch of updates and executes them one by one. {{JavabinLoader}}, which
is used to process the documents, reuses the same {{AddUpdateCommand}} instance for every
update and just [clears its state at the end|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99].
However, [AddUpdateCommand#prevVersion|https://github.com/apache/lucene-solr/blob/6396cb759f8c799f381b0730636fa412761030ce/solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java#L76]
is not cleared.
* If an in-place update targets a document that does not exist, the update is processed as a
regular atomic update (i.e. a new document is created), but {{prevVersion}} is still sent as the
{{distrib.inplace.prevversion}} parameter when {{DistributedUpdateProcessor}} forwards the update
to the slaves. Because {{prevVersion}} was never cleared, it may still hold the version from a
previously processed update.
* Each slave checks its own version of the document, which is 0 (because the document does not
exist), concludes that some updates were missed, and spends 5 seconds in [DistributedUpdateProcessor#waitForDependentUpdates|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99]
waiting for the missing updates (no luck). It then tries to fetch the "correct" version from
the leader (no luck either).
* So each such update costs me *m* × 5 seconds.

I worked around this by explicitly checking whether the document exists, but this should probably be fixed.

The obvious first guess is that {{prevVersion}} should be cleared in {{AddUpdateCommand#clear}},
but I have no idea how to test it.

{code}
+++ solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java	(revision )
@@ -78,6 +78,7 @@
      updateTerm = null;
      isLastDocInBatch = false;
      version = 0;
+     prevVersion = -1;
    }
{code}



> AddUpdateCommand#prevVersion is not cleared which may lead to problem for in-place updates of non existed documents
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11459
>                 URL: https://issues.apache.org/jira/browse/SOLR-11459
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrey Kudryavtsev
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

