lucene-dev mailing list archives

From: karl.wri...@nokia.com
Subject: RE: Repeat to the right list: Solr spewage and possible re-entrancy problem?
Date: Mon, 14 Jun 2010 14:29:26 GMT
Thanks for the advice, but it seems that for Solr 1.5 the default pool has already been set
up this way:

>>>>>>
    <!-- =========================================================== -->
    <!-- Server Thread Pool                                          -->
    <!-- =========================================================== -->
    <Set name="ThreadPool">
      <!-- Default bounded blocking threadpool
      -->
      <New class="org.mortbay.thread.BoundedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="lowThreads">50</Set>
        <Set name="maxThreads">10000</Set>
      </New>

      <!-- Optional Java 5 bounded threadpool with job queue
      <New class="org.mortbay.thread.concurrent.ThreadPool">
        <Arg type="int">0</Arg>
        <Set name="corePoolSize">10</Set>
        <Set name="maximumPoolSize">250</Set>
      </New>
      -->
    </Set>
<<<<<<

I am a bit concerned by the maxThreads value, which seems pretty high to me, but it is
certainly nowhere near 28,000.
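
Incidentally, the commented-out "Java 5" alternative looks like a thin wrapper around
java.util.concurrent.ThreadPoolExecutor.  A rough sketch of what I understand the
equivalent construction to be (the SynchronousQueue hand-off is my guess at what
<Arg type="int">0</Arg> selects; I have not checked the Jetty source):

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedPoolSketch
    {
      public static void main(String[] args)
      {
        // 10 core threads, growing to a hard cap of 250 under load.
        // With a SynchronousQueue there is no job queue: each submission
        // is handed directly to a thread, and once all 250 are busy,
        // execute() rejects further jobs instead of queueing them.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            10, 250, 60L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());

        pool.execute(new Runnable()
        {
          public void run()
          {
            System.out.println("Handled on " + Thread.currentThread().getName());
          }
        });

        pool.shutdown();
      }
    }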

Karl

-----Original Message-----
From: ext Smiley, David W. [mailto:dsmiley@mitre.org] 
Sent: Monday, June 14, 2010 10:20 AM
To: dev@lucene.apache.org; simon.willnauer@gmail.com
Subject: RE: Repeat to the right list: Solr spewage and possible re-entrancy problem?

Maybe it's related to this:
https://issues.apache.org/jira/browse/SOLR-1941
Simply switch to Jetty's BoundedThreadPool or try Tomcat.

~ David Smiley
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com] 
Sent: Monday, June 14, 2010 9:10 AM
To: dev@lucene.apache.org; simon.willnauer@gmail.com
Subject: RE: Repeat to the right list: Solr spewage and possible re-entrancy problem?

Hi Simon,

I have no doubt that TCP is working ok. ;-)  I have doubts that Solr is working reasonably,
however.

Since this is all localhost-localhost interaction, I doubt we are losing packets in my test
case.  So I think we can eliminate that possibility as a cause.

If the claim is that the server is delaying its socket close somehow, that's a problem worth
trying to prevent.  The number of sockets created eventually causes the process to run out
of file handles, and that could well cause the 400 errors, because commons-fileupload will
not be able to write the content to a temp file at that point.

It's actually impossible for my client to be leaking in this way, so I don't think that's
the issue.  There's a fixed set of threads, and each thread MUST close the socket it opens
before it can go on to the next request:

    Socket socket = createSocket();
    try
    {
        ...
    }
    finally
    {
        socket.close();
    }

So, if there's a close delay, it's got to be server-side.
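
Spelled out, each thread's loop looks roughly like this (createSocket() and the record
iteration here are placeholders, not the actual HttpPoster/ParseAndLoad code):

    import java.io.IOException;
    import java.net.Socket;

    public class PostThread extends Thread
    {
      public void run()
      {
        while (hasMoreRecords())
        {
          Socket socket = null;
          try
          {
            socket = createSocket();
            // ... write the POST for this record, read the response ...
          }
          catch (IOException e)
          {
            e.printStackTrace();
          }
          finally
          {
            // The thread cannot pick up its next record until this
            // socket has been closed.
            if (socket != null)
            {
              try { socket.close(); } catch (IOException ignored) {}
            }
          }
        }
      }

      private Socket createSocket() throws IOException
      {
        return new Socket("localhost", 8983); // placeholder host/port
      }

      // Placeholder for the real record iteration.
      private boolean hasMoreRecords() { return false; }
    }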

Karl


-----Original Message-----
From: ext Simon Willnauer [mailto:simon.willnauer@googlemail.com] 
Sent: Monday, June 14, 2010 8:47 AM
To: dev@lucene.apache.org
Subject: Re: Repeat to the right list: Solr spewage and possible re-entrancy problem?

Hey Karl,

the TIME_WAIT states you see are ok from the TCP perspective. The end
that sends the first FIN goes into the TIME_WAIT state, because that
is the end that sends the final ACK. If the other end's FIN is lost,
or if the final ACK is lost, having the end that sends the first FIN
maintain state about the connection guarantees that it has enough
information to retransmit the final ACK.
The socket will stay in TIME_WAIT for twice the maximum segment
lifetime (2*MSL, to cover the round trip).
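
One practical consequence: if every request opens a fresh connection and closes it,
every request parks a socket in TIME_WAIT, so the count scales with your request rate.
Reusing persistent (keep-alive) connections sidesteps that; java.net.HttpURLConnection
does it transparently as long as responses are read fully.  A rough sketch only (the
URL and payload are made up, this is not a drop-in for your HttpPoster):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class KeepAlivePost
    {
      public static void main(String[] args) throws Exception
      {
        // Illustrative URL; substitute the real extracting-handler URL.
        URL url = new URL("http://localhost:8983/solr/update/extract?literal.id=1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);

        OutputStream out = conn.getOutputStream();
        out.write("some content".getBytes("UTF-8"));
        out.close();

        // Draining and closing the response stream lets the JVM return
        // the underlying socket to its keep-alive pool instead of
        // closing it and incurring a TIME_WAIT.
        InputStream in = conn.getInputStream();
        byte[] buf = new byte[4096];
        while (in.read(buf) != -1) { /* drain */ }
        in.close();

        System.out.println("HTTP " + conn.getResponseCode());
      }
    }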

As long as SO_LINGER is enabled, the close operation on a socket will
wait until all queued messages are sent. See this:

“When enabled, a close(2) or shutdown(2) will not return until all
queued messages for the socket have been successfully sent or the
linger timeout has been reached. Otherwise, the call returns
immediately and the closing is done in the background. When the socket
is closed as part of exit(2), it always lingers in the background.”

By default I think this is enabled, and in the Tomcat case it is set to 25 seconds.

I am not sure if that helps with your problem, but you could try
setting it to a lower value or disabling it completely.
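
In Java the knob is Socket.setSoLinger().  Something like this on the client side (a
sketch only; whether it changes anything depends on which end is actually holding the
socket):

    import java.net.Socket;

    public class LingerDemo
    {
      public static void main(String[] args) throws Exception
      {
        Socket socket = new Socket("localhost", 8983);

        // Block in close() for at most 5 seconds while queued data drains:
        socket.setSoLinger(true, 5);

        // Or disable lingering entirely; close() returns immediately and
        // the graceful shutdown happens in the background:
        // socket.setSoLinger(false, 0);

        // Caution: setSoLinger(true, 0) discards unsent data and resets
        // the connection (RST), which skips TIME_WAIT but can lose data.
        socket.close();
      }
    }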

simon

On Mon, Jun 14, 2010 at 2:20 PM,  <karl.wright@nokia.com> wrote:
> Good catch!
>
> root@duck6:~# netstat -an | fgrep :8983 | wc
>  28223  169338 2257840
> root@duck6:~#
>
> ... and here's an example:
>
> tcp6       0      0 127.0.0.1:8983          127.0.0.1:44058         TIME_WAIT
>
> So, once again, what causes this behavior?  How can I wind up with 28,000 socket
> connections hanging around, if both my client and Solr are behaving properly and are
> closing connections properly?
>
> (I suspect that the answer to my somewhat rhetorical question is, "this should not
> happen".  But then the question becomes, "why IS it happening?")
>
> Karl
>
> -----Original Message-----
> From: Wright Karl (Nokia-S/Cambridge)
> Sent: Sunday, June 13, 2010 7:52 AM
> To: dev@lucene.apache.org
> Subject: RE: Repeat to the right list: Solr spewage and possible re-entrancy problem?
>
> Good idea.
>
> How would you prevent such a thing from occurring on the server?  Or would this be
> the result of the client not doing something properly?
>
> Karl
>
> ________________________________________
> From: ext Lance Norskog [goksron@gmail.com]
> Sent: Saturday, June 12, 2010 11:55 PM
> To: dev@lucene.apache.org
> Subject: Re: Repeat to the right list: Solr spewage and possible re-entrancy problem?
>
> There are situations where zombie sockets pile up at the server and
> keep zombie threads open. When this happens, check the total number of
> threads in the server JVM, and the total number of open or TIME_WAIT
> sockets. 'netstat -an | fgrep :8983' may find 2000 entries.
>
> Lance
>
> On Mon, Jun 7, 2010 at 7:35 AM,  <karl.wright@nokia.com> wrote:
>> Hi folks,
>>
>> This morning I was experimenting with using multiple threads while indexing
>> some 20,000,000 records worth of content.  In fact, my test spun up some 50
>> threads, and happily chugged away for a couple of hours before I saw the
>> following output from my test code:
>>
>>>>>>>>
>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>> index record 6469124
>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>> index record 6469551
>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>> index record 6470592
>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>> index record 6472454
>> java.net.SocketException: Connection reset
>>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>         at HttpPoster.getResponse(HttpPoster.java:280)
>>         at HttpPoster.indexPost(HttpPoster.java:191)
>>         at ParseAndLoad$PostThread.run(ParseAndLoad.java:638)
>> <<<<<<
>>
>> Looking at the solr-side output, I see nothing interesting at all:
>>
>>>>>>>>
>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract
>> params={literal.nokia_longitude=9.78518981933594&literal.nokia_phone=%2B497971910474&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_district=Münster&literal.nokia_placerating=0&literal.id=6472724&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=1&literal.nokia_ppid=276u0wyw-c8cb7f4d6cd84a639a4e7d3570bf8814&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9985514322917&literal.nokia_postalcode=74405&literal.nokia_street=Weinhaldenstraße&literal.nokia_title=Dorfgemeinschaft+Münster+e.V.&literal.nokia_category=261}
>> status=0 QTime=1
>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract
>> params={literal.nokia_longitude=9.76717020670573&literal.nokia_phone=%2B497971950725&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=0&literal.id=6472737&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=13&literal.nokia_ppid=276u0wyw-d3bed6449fcb41b0adc50ae08e041f8d&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9974405924479&literal.nokia_fax=%2B497971950712&literal.nokia_postalcode=74405&literal.nokia_street=Kochstraße&literal.nokia_title=BayWa+AG+Bau-+%26+Gartenmarkt&literal.nokia_category=194}
>> status=0 QTime=0
>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract
>> params={literal.nokia_longitude=9.77591044108073&literal.nokia_phone=%2B49797124009&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_district=Unterrot&literal.nokia_placerating=0&literal.id=6472739&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=28&literal.nokia_ppid=276u0wyw-d534d7a9235a4edf878d5e32a34bad8b&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9791788736979&literal.nokia_fax=%2B49797123431&literal.nokia_postalcode=74405&literal.nokia_street=Hauptstraße&literal.nokia_title=Gastel+R.&literal.nokia_category=5}
>> status=0 QTime=1
>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract
>> params={literal.nokia_longitude=9.76935&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=5&literal.id=6472698&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=15&literal.nokia_ppid=276u0wyw-9544100e68d74162aff54783b9376134&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9981&literal.nokia_postalcode=74405&literal.nokia_street=Kanzleistraße&literal.nokia_tag=Steuerberater&literal.nokia_tag=Business+%26+Service&literal.nokia_title=Consultis+GmbH&literal.nokia_category=215}
>> status=0 QTime=92
>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract
>> params={literal.nokia_longitude=9.77173970540364&literal.nokia_phone=%2B4979713238&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=0&literal.id=6472699&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=37&literal.nokia_ppid=276u0wyw-9600016fd0d248c9b442111838350f64&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9987182617188&literal.nokia_fax=%2B497971911639&literal.nokia_postalcode=74405&literal.nokia_street=Karlstraße&literal.nokia_title=Videothek,+5th+avenue+Peltekis+Apostolos&literal.nokia_category=5}
>> status=0 QTime=93
>> <<<<<<
>>
>> It is unlikely (but, of course, not out of the question) that this hiccup is
>> due to some reentrancy problem in my test code.  It is much more likely to
>> be some kind of a Solr multi-threaded race condition - especially since it
>> looks like a number of requests all failed at precisely the same time.  This
>> is a Solr 1.5 build from mid-late March, FWIW.  Does anyone know of an
>> extractingUpdateRequestHandler re-entrancy bug of this kind?
>>
>> Thanks,
>> Karl
>>
>>
>>
>>
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
