lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Repeat to the right list: Solr spewage and possible re-entrancy problem?
Date Mon, 14 Jun 2010 13:24:57 GMT
Not sure if this is a client problem at all. It seems that your server
closes the connection first (left hand side of your netstat output)
and then sticks in TIME_WAIT. Even if you are on localhost that could
be an issue. Many applications have had problems with TIME_WAIT; I
remember mod_proxy having strange problems with it.
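
To check which end is holding the TIME_WAIT entries, the netstat output
can be tallied by state for the lines whose local address is the server
port. This is only a rough sketch: it assumes the Linux `netstat -an`
column layout shown in the quoted message (proto, recv-q, send-q, local,
remote, state), and the class and method names are made up for
illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NetstatStateTally {
    /**
     * Counts TCP states for netstat lines whose LOCAL address ends in
     * ":" + port, i.e. the server's end of each connection.
     * Assumed columns: proto, recv-q, send-q, local, remote, state.
     */
    static Map<String, Integer> tally(List<String> netstatLines, int port) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        String suffix = ":" + port;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, non-TCP lines, and anything without a state column.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            // Skip entries where the port is only the remote (client-side view).
            if (!cols[3].endsWith(suffix)) {
                continue;
            }
            Integer old = counts.get(cols[5]);
            counts.put(cols[5], old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines for illustration only; the ports are invented
        // except for the one entry quoted in this thread.
        List<String> sample = Arrays.asList(
            "tcp6       0      0 :::8983                 :::*                    LISTEN",
            "tcp6       0      0 127.0.0.1:8983          127.0.0.1:44058         TIME_WAIT",
            "tcp6       0      0 127.0.0.1:8983          127.0.0.1:44059         TIME_WAIT",
            "tcp6       0      0 127.0.0.1:44060         127.0.0.1:8983          ESTABLISHED");
        System.out.println(tally(sample, 8983));
    }
}
```

If nearly all TIME_WAIT entries have the server port as the local
address, the server side is the one sending the first FIN.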

I am not saying this is the reason; I just thought I'd mention it as it
could help in figuring out what's going wrong here.
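
For experimenting with the SO_LINGER setting mentioned in the quoted
message below, here is a minimal sketch using the standard
java.net.Socket API on the client side. The helper name and the 5-second
value are arbitrary choices for illustration; how the server container
exposes the same option is a separate question and is not covered here.

```java
import java.io.IOException;
import java.net.Socket;

class LingerSketch {
    /**
     * Enables SO_LINGER with the given timeout in seconds, or disables it
     * when seconds is negative. Returns the value the socket reports
     * afterwards; getSoLinger() returns -1 when the option is disabled.
     */
    static int configureLinger(Socket socket, int seconds) throws IOException {
        if (seconds < 0) {
            socket.setSoLinger(false, 0);
        } else {
            socket.setSoLinger(true, seconds);
        }
        return socket.getSoLinger();
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket is enough to set and read back the option.
        Socket socket = new Socket();
        try {
            System.out.println("after enable:  " + configureLinger(socket, 5));
            System.out.println("after disable: " + configureLinger(socket, -1));
        } finally {
            socket.close();
        }
    }
}
```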

simon

On Mon, Jun 14, 2010 at 3:09 PM,  <karl.wright@nokia.com> wrote:
> Hi Simon,
>
> I have no doubt that TCP is working ok. ;-)  I have doubts that Solr is working reasonably, however.
>
> Since this is all localhost-localhost interaction, I doubt we are losing packets in my test case.  So I think we can eliminate that possibility as a cause.
>
> If the claim is that the server is delaying its socket close somehow, that's a problem worth trying to prevent.  The number of sockets created eventually causes the process to run out of file handles, and that could well cause the 400 errors, because commons-fileupload will not be able to write the content to a temp file at that point.
>
> It's actually impossible for my client to be leaking in this way, so I don't think that's the issue.  There's a fixed set of threads, and each thread MUST close the socket it opens before it can go on to the next request:
>
>    Socket socket = createSocket();
>    try
>    {
>        ...
>    }
>    finally
>    {
>        socket.close();
>    }
>
> So, if there's a close delay, it's got to be server-side.
>
> Karl
>
>
> -----Original Message-----
> From: ext Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> Sent: Monday, June 14, 2010 8:47 AM
> To: dev@lucene.apache.org
> Subject: Re: Repeat to the right list: Solr spewage and possible re-entrancy problem?
>
> Hey Karl,
>
> the TIME_WAIT states you see are ok from the TCP perspective. The end
> that sends the first FIN goes into the TIME_WAIT state, because that
> is the end that sends the final ACK. If the other end's FIN is lost,
> or if the final ACK is lost, having the end that sends the first FIN
> maintain state about the connection guarantees that it has enough
> information to retransmit the final ACK.
> The socket will stay in TIME_WAIT for 2*packet lifetime (2* because of
> the roundtrip).
>
> As long as SO_LINGER is enabled, the close operation on a socket will
> wait until all queued messages are sent. See this:
>
> “When enabled, a close(2) or shutdown(2) will not return until all
> queued messages for the socket have been successfully sent or the
> linger timeout has been reached. Otherwise, the call returns
> immediately and the closing is done in the background. When the socket
> is closed as part of exit(2), it always lingers in the background.”
>
> By default, I think, this is enabled, and in the Tomcat case it is set to 25 seconds.
>
> I am not sure if that helps you with your problem but you could try
> setting it to a lower value or disable it completely.
>
> simon
>
> On Mon, Jun 14, 2010 at 2:20 PM,  <karl.wright@nokia.com> wrote:
>> Good catch!
>>
>> root@duck6:~# netstat -an | fgrep :8983 | wc
>>  28223  169338 2257840
>> root@duck6:~#
>>
>> ... and here's an example:
>>
>> tcp6       0      0 127.0.0.1:8983          127.0.0.1:44058         TIME_WAIT
>>
>> So, once again, what causes this behavior?  How can I wind up with 28,000 socket connections hanging around, if both my client and Solr are behaving properly and are closing connections properly?
>>
>> (I suspect that the answer to my somewhat rhetorical question is, "this should not happen".  But then the question becomes, "why IS it happening?")
>>
>> Karl
>>
>> -----Original Message-----
>> From: Wright Karl (Nokia-S/Cambridge)
>> Sent: Sunday, June 13, 2010 7:52 AM
>> To: dev@lucene.apache.org
>> Subject: RE: Repeat to the right list: Solr spewage and possible re-entrancy problem?
>>
>> Good idea.
>>
>> How would you prevent such a thing from occurring on the server?  Or would this be the result of the client not doing something properly?
>>
>> Karl
>>
>> ________________________________________
>> From: ext Lance Norskog [goksron@gmail.com]
>> Sent: Saturday, June 12, 2010 11:55 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Repeat to the right list: Solr spewage and possible re-entrancy problem?
>>
>> There are situations where zombie sockets pile up at the server and
>> keep zombie threads open. When this happens, check the total number of
>> threads in the server JVM, and the total number of open or TIME_WAIT
>> sockets. 'netstat -an | fgrep :8983' may find 2000 entries.
>>
>> Lance
>>
>> On Mon, Jun 7, 2010 at 7:35 AM,  <karl.wright@nokia.com> wrote:
>>> Hi folks,
>>>
>>> This morning I was experimenting with using multiple threads while indexing
>>> some 20,000,000 records worth of content.  In fact, my test spun up some 50
>>> threads, and happily chugged away for a couple of hours before I saw the
>>> following output from my test code:
>>>
>>> >>>>>>
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6469124
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6469551
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6470592
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6472454
>>> java.net.SocketException: Connection reset
>>>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>>         at HttpPoster.getResponse(HttpPoster.java:280)
>>>         at HttpPoster.indexPost(HttpPoster.java:191)
>>>         at ParseAndLoad$PostThread.run(ParseAndLoad.java:638)
>>> <<<<<<
>>>
>>> Looking at the solr-side output, I see nothing interesting at all:
>>>
>>> >>>>>>
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.78518981933594&literal.nokia_phone=%2B497971910474&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_district=Münster&literal.nokia_placerating=0&literal.id=6472724&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=1&literal.nokia_ppid=276u0wyw-c8cb7f4d6cd84a639a4e7d3570bf8814&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9985514322917&literal.nokia_postalcode=74405&literal.nokia_street=Weinhaldenstraße&literal.nokia_title=Dorfgemeinschaft+Münster+e.V.&literal.nokia_category=261}
>>> status=0 QTime=1
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.76717020670573&literal.nokia_phone=%2B497971950725&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=0&literal.id=6472737&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=13&literal.nokia_ppid=276u0wyw-d3bed6449fcb41b0adc50ae08e041f8d&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9974405924479&literal.nokia_fax=%2B497971950712&literal.nokia_postalcode=74405&literal.nokia_street=Kochstraße&literal.nokia_title=BayWa+AG+Bau-+%26+Gartenmarkt&literal.nokia_category=194}
>>> status=0 QTime=0
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.77591044108073&literal.nokia_phone=%2B49797124009&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_district=Unterrot&literal.nokia_placerating=0&literal.id=6472739&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=28&literal.nokia_ppid=276u0wyw-d534d7a9235a4edf878d5e32a34bad8b&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9791788736979&literal.nokia_fax=%2B49797123431&literal.nokia_postalcode=74405&literal.nokia_street=Hauptstraße&literal.nokia_title=Gastel+R.&literal.nokia_category=5}
>>> status=0 QTime=1
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.76935&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=5&literal.id=6472698&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=15&literal.nokia_ppid=276u0wyw-9544100e68d74162aff54783b9376134&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9981&literal.nokia_postalcode=74405&literal.nokia_street=Kanzleistraße&literal.nokia_tag=Steuerberater&literal.nokia_tag=Business+%26+Service&literal.nokia_title=Consultis+GmbH&literal.nokia_category=215}
>>> status=0 QTime=92
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.77173970540364&literal.nokia_phone=%2B4979713238&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=0&literal.id=6472699&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=37&literal.nokia_ppid=276u0wyw-9600016fd0d248c9b442111838350f64&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9987182617188&literal.nokia_fax=%2B497971911639&literal.nokia_postalcode=74405&literal.nokia_street=Karlstraße&literal.nokia_title=Videothek,+5th+avenue+Peltekis+Apostolos&literal.nokia_category=5}
>>> status=0 QTime=93
>>> <<<<<<
>>>
>>> It is unlikely (but, of course, not out of the question) that this hiccup is
>>> due to some reentrancy problem in my test code.  It is much more likely to
>>> be some kind of a Solr multi-threaded race condition - especially since it
>>> looks like a number of requests all failed at precisely the same time.  This
>>> is a Solr 1.5 build from mid-late March, FWIW.  Does anyone know of an
>>> extractingUpdateRequestHandler re-entrancy bug of this kind?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
