lucene-issues mailing list archives

From "Colvin Cowie (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates
Date Mon, 25 Nov 2019 05:42:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981313#comment-16981313
] 

Colvin Cowie commented on SOLR-13963:
-------------------------------------

Though it was present in previous releases, is it actually hit in them? I think that the change
from getValue to getRawValue in 8.3 is the thing that has triggered it to actually cause
a problem.

 

In my personal opinion, the fact that it causes corruption of data (which is sometimes indexed
without an error) means that 8.3 is broken. Certainly we can't go into production with vanilla
8.3.

 

But if 8.4 is ready in a month or so, and there's a patch available, perhaps a formal 8.3.1
isn't necessary.

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates
> ------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13963
>                 URL: https://issues.apache.org/jira/browse/SOLR-13963
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.3
>            Reporter: Colvin Cowie
>            Assignee: Noble Paul
>            Priority: Major
>         Attachments: JavaBinCodec.java, SOLR-13963.patch, SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in Solr 8.3 during
distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) set of failing
tests where the data being sent in intranode requests was _sometimes_ corrupted. For example
if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from SOLR-13682, which
meant that org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call org.apache.solr.common.SolrInputField.getValue()
rather than org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr()
which in this context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different locations, only
one of which is protected by a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but _readStr() doesn't:
>  [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ fixes
the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode of autoCreateFields=true
in the updateRequestProcessorChain.
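
The locking problem described in the issue can be illustrated with a minimal, self-contained sketch. This is not the JavaBinCodec source or the attached patch: the class SharedBufferSketch, its methods viaProvider, viaReadStr, fill and countCorrupted, and the use of a StringBuilder in place of CharArr are all hypothetical stand-ins. It only shows the general idea that every code path touching a shared scratch buffer has to hold the same lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedBufferSketch {
    // Hypothetical stand-in for the shared CharArr field `arr` described in the issue.
    private final StringBuilder arr = new StringBuilder();

    // Stand-in for the path that already synchronizes on arr (getStringProvider()).
    String viaProvider(String input) {
        synchronized (arr) {
            return fill(input);
        }
    }

    // Stand-in for _readStr(): the fix proposed in the issue is to take the same lock here.
    String viaReadStr(String input) {
        synchronized (arr) {
            return fill(input);
        }
    }

    // Resets the buffer, fills it, and copies the result out.
    // Only safe when the caller holds the lock on arr.
    private String fill(String input) {
        arr.setLength(0);
        for (int i = 0; i < input.length(); i++) {
            arr.append(input.charAt(i));
        }
        return arr.toString();
    }

    // Runs both paths concurrently and returns how many results did not match their input.
    public static int countCorrupted(int iterations) {
        SharedBufferSketch codec = new SharedBufferSketch();
        AtomicInteger corrupted = new AtomicInteger();
        String a = "'fieldName':\"this is a long string\"";
        String b = "another value";
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (!a.equals(codec.viaProvider(a))) corrupted.incrementAndGet();
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (!b.equals(codec.viaReadStr(b))) corrupted.incrementAndGet();
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return corrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("corrupted results: " + countCorrupted(100_000));
    }
}
```

With both paths holding the lock, each thread sees only its own buffer contents. Removing the synchronized block from viaReadStr in this sketch would let the two threads interleave their writes, which is the kind of mixed-up string reported above.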



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

