lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-13470) SolrException msg not always propogated to HttpClient (may be specific to SOLR-12999 ?)
Date Tue, 14 May 2019 21:26:01 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839807#comment-16839807
] 

Hoss Man edited comment on SOLR-13470 at 5/14/19 9:25 PM:
----------------------------------------------------------

Here's an example of how things can go wrong with the attached patch that hardens up the asserts
regarding the expected exception...

{noformat}
...
   [junit4]   2> 12780 WARN  (explicit-fetchindex-cmd) [    ] o.a.s.h.IndexFetcher File
segments_2 did not match. expected checksum is 1298216336 and actual is checksum 1205232915.
expected length is 252 and actual length is 252
   [junit4]   2> 12780 INFO  (explicit-fetchindex-cmd) [    ] o.a.s.h.IndexFetcher This
disk does not have enough space to download the index from leader/master. So cleaning up the
local index.  This may lead to loss of data/or node if index replication fails in between
   [junit4]   2> 12785 INFO  (Thread-19) [    ] o.a.s.c.s.i.HttpSolrClient nocommit: remote
ex (403) w/o error? ... {responseHeader={status=0,QTime=39},response={numFound=1200,start=0,docs=[]}}
   [junit4]   2> 12794 ERROR (Thread-19) [    ] o.a.s.h.TestReplicationHandlerDiskOverFlow
Query Thread Failure
   [junit4]   2>           => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
   [junit4]   2> 
   [junit4]   2> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
   [junit4]   2> 
   [junit4]   2> request: http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:987)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1002)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.lambda$testDiskOverFlow$1(TestReplicationHandlerDiskOverFlow.java:154)
~[test/:?]
   [junit4]   2> 	at java.lang.Thread.run(Thread.java:834) [?:?]
...
   [junit4] HEARTBEAT J0 PID(26623@tray): 2019-05-14T14:01:33, stalled for 69.1s at: TestReplicationHandlerDiskOverFlow.testDiskOverFlow
   [junit4]   2> 72012 INFO  (indexFetcher-27-thread-1) [    ] o.a.s.h.ReplicationHandler
Poll disabled
   [junit4]   2> 72780 ERROR (explicit-fetchindex-cmd) [    ] o.a.s.h.TestReplicationHandlerDiskOverFlow
IndexFetcher Thread Failure
   [junit4]   2>           => java.util.concurrent.TimeoutException
   [junit4]   2> 	at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258)
   [junit4]   2> java.util.concurrent.TimeoutException: null
   [junit4]   2> 	at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258)
~[?:?]
   [junit4]   2> 	at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:436) ~[?:?]
   [junit4]   2> 	at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow$2.getAsBoolean(TestReplicationHandlerDiskOverFlow.java:137)
~[test/:?]
   [junit4]   2> 	at org.apache.solr.handler.IndexFetcher.deleteFilesInAdvance(IndexFetcher.java:1133)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1033)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:575)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:422)
~[java/:?]
   [junit4]   2> 	at org.apache.solr.handler.ReplicationHandler.lambda$fetchIndex$0(ReplicationHandler.java:345)
~[java/:?]
   [junit4]   2> 	at java.lang.Thread.run(Thread.java:834) [?:?]
...
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestReplicationHandlerDiskOverFlow
-Dtests.method=testDiskOverFlow -Dtests.seed=7A25E0BCF2F3D8D4 -Dtests.slow=true -Dtests.badapples=true
-Dtests.locale=kok -Dtests.timezone=MET -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 70.4s | TestReplicationHandlerDiskOverFlow.testDiskOverFlow <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: threads encountered failures (see
logs for when) expected:<[]> but was:<[org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
   [junit4]    > request: http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2,
java.util.concurrent.TimeoutException]>
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([7A25E0BCF2F3D8D4:B017476C918DD2BB]:0)
   [junit4]    > 	at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.testDiskOverFlow(TestReplicationHandlerDiskOverFlow.java:189)
   [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
   [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    > 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   [junit4]    > 	at java.base/java.lang.Thread.run(Thread.java:834)
   [junit4]   2> NOTE: leaving temporary files on disk at: /home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.handler.TestReplicationHandlerDiskOverFlow_7A25E0BCF2F3D8D4-001
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): {name=BlockTreeOrds(blocksize=128),
id=BlockTreeOrds(blocksize=128)}, docValues:{}, maxPointsInLeafNode=1340, maxMBSortInHeap=5.7037203451015515,
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@57ee8eda), locale=kok,
timezone=MET
   [junit4]   2> NOTE: Linux 3.19.0-84-generic amd64/Oracle Corporation 11.0.2 (64-bit)/cpus=4,threads=2,free=205025280,total=260046848
   [junit4]   2> NOTE: All tests run in this JVM: [TestReplicationHandlerDiskOverFlow]
   [junit4] Completed [1/1 (1!)] in 72.25s, 1 test, 1 failure <<< FAILURES!
{noformat}

...note that the added logging in HttpSolrClient shows that we're evidently getting a valid
search result w/a 403 response code (did the query somehow already run successfully before
the server realized it should return 403?)

But re-running this same seed multiple times will frequently pass (ie: the exception message
makes it all the way to the client)

(Jira comment edited to fix mangled whitespace)
 


was (Author: hossman):

Here's an example of how things can go wrong with the attached patch that hardens up the asserts
regarding the expected exception...

{noformat}
...
[junit4] 2> 12780 WARN (explicit-fetchindex-cmd) [ ] o.a.s.h.IndexFetcher File segments_2
did not match. expected checksum is 1298216336 and actual is checksum 1205232915. expected
length is 252 and actual length is 252
[junit4] 2> 12780 INFO (explicit-fetchindex-cmd) [ ] o.a.s.h.IndexFetcher This disk does
not have enough space to download the index from leader/master. So cleaning up the local index.
This may lead to loss of data/or node if index replication fails in between
[junit4] 2> 12785 INFO (Thread-19) [ ] o.a.s.c.s.i.HttpSolrClient nocommit: remote ex (403)
w/o error? ... {responseHeader={status=0,QTime=39},response={numFound=1200,start=0,docs=[]}}
[junit4] 2> 12794 ERROR (Thread-19) [ ] o.a.s.h.TestReplicationHandlerDiskOverFlow Query
Thread Failure
[junit4] 2> => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
[junit4] 2>
[junit4] 2> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.0.1:46668/solr/collection1: Forbidden
[junit4] 2>
[junit4] 2> request: http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2
[junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
~[java/:?]
[junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[java/:?]
[junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[java/:?]
[junit4] 2> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207) ~[java/:?]
[junit4] 2> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:987) ~[java/:?]
[junit4] 2> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1002) ~[java/:?]
[junit4] 2> at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.lambda$testDiskOverFlow$1(TestReplicationHandlerDiskOverFlow.java:154)
~[test/:?]
[junit4] 2> at java.lang.Thread.run(Thread.java:834) [?:?]
...
[junit4] HEARTBEAT J0 PID(26623@tray): 2019-05-14T14:01:33, stalled for 69.1s at: TestReplicationHandlerDiskOverFlow.testDiskOverFlow
[junit4] 2> 72012 INFO (indexFetcher-27-thread-1) [ ] o.a.s.h.ReplicationHandler Poll disabled
[junit4] 2> 72780 ERROR (explicit-fetchindex-cmd) [ ] o.a.s.h.TestReplicationHandlerDiskOverFlow
IndexFetcher Thread Failure
[junit4] 2> => java.util.concurrent.TimeoutException
[junit4] 2> at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258)
[junit4] 2> java.util.concurrent.TimeoutException: null
[junit4] 2> at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258) ~[?:?]
[junit4] 2> at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:436) ~[?:?]
[junit4] 2> at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow$2.getAsBoolean(TestReplicationHandlerDiskOverFlow.java:137)
~[test/:?]
[junit4] 2> at org.apache.solr.handler.IndexFetcher.deleteFilesInAdvance(IndexFetcher.java:1133)
~[java/:?]
[junit4] 2> at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1033)
~[java/:?]
[junit4] 2> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:575)
~[java/:?]
[junit4] 2> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
~[java/:?]
[junit4] 2> at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:422)
~[java/:?]
[junit4] 2> at org.apache.solr.handler.ReplicationHandler.lambda$fetchIndex$0(ReplicationHandler.java:345)
~[java/:?]
[junit4] 2> at java.lang.Thread.run(Thread.java:834) [?:?]
...
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestReplicationHandlerDiskOverFlow
-Dtests.method=testDiskOverFlow -Dtests.seed=7A25E0BCF2F3D8D4 -Dtests.slow=true -Dtests.badapples=true
-Dtests.locale=kok -Dtests.timezone=MET -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 70.4s | TestReplicationHandlerDiskOverFlow.testDiskOverFlow <<<
[junit4] > Throwable #1: java.lang.AssertionError: threads encountered failures (see logs
for when) expected:<[]> but was:<[org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
[junit4] > request: http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2,
java.util.concurrent.TimeoutException]>
[junit4] > at __randomizedtesting.SeedInfo.seed([7A25E0BCF2F3D8D4:B017476C918DD2BB]:0)
[junit4] > at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.testDiskOverFlow(TestReplicationHandlerDiskOverFlow.java:189)
[junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4] > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[junit4] > at java.base/java.lang.Thread.run(Thread.java:834)
[junit4] 2> NOTE: leaving temporary files on disk at: /home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.handler.TestReplicationHandlerDiskOverFlow_7A25E0BCF2F3D8D4-001
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene80): {name=BlockTreeOrds(blocksize=128),
id=BlockTreeOrds(blocksize=128)}, docValues:{}, maxPointsInLeafNode=1340, maxMBSortInHeap=5.7037203451015515,
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@57ee8eda), locale=kok,
timezone=MET
[junit4] 2> NOTE: Linux 3.19.0-84-generic amd64/Oracle Corporation 11.0.2 (64-bit)/cpus=4,threads=2,free=205025280,total=260046848
[junit4] 2> NOTE: All tests run in this JVM: [TestReplicationHandlerDiskOverFlow]
[junit4] Completed [1/1 (1!)] in 72.25s, 1 test, 1 failure <<< FAILURES!

{noformat}

...note that the added logging in HttpSolrClient shows that we're evidently getting a valid
search result w/a 403 response code (did the query somehow already run successfully before
the server realized it should return 403?)

But re-running this same seed multiple times will frequently pass (ie: the exception message
makes it all the way to the client)

 

> SolrException msg not always propogated to HttpClient (may be specific to SOLR-12999
?)
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-13470
>                 URL: https://issues.apache.org/jira/browse/SOLR-13470
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13470.patch
>
>
> While working on some test hardening for SOLR-12999, I discovered a strange bug related
to how SolrExceptions are propogated to HttpClients -- sometimes the message set by the server
side code when throwing the SolrException is set in the remote exception recieved by the HttpSolrClient,
other times it is not.
> it's not clear to me if this is specific to the IndexFetcher related code (added in SOLR-12999)
that throw SolrExceptions when the index is in the middle of a full copy, or if it's a general
problem that can happen with any SolrException->HTTP->RemoteSolrException via HttpSolrClient
that only happens to manifests because of some quirk in the threading of TestReplicationHandlerDiskOverFlow.

> (perhaps because we don't have a lot of HTTP level tests checking the exception message?)
> At the moment, TestReplicationHandlerDiskOverFlow works around this issue by only comparing
the HTTP Staus code to ensure it's what's expected, w/o checking the getMessage() ... I'll
attach a patch that demonstrates how including a getMessage() assertion can (sporadically)
fail, and includes some nocommit debugging code i added to HttpSolrClient to try and make
sense of what's happening...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message