lucene-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1951) extractingUpdateHandler doesn't close socket handles promptly, and indexing load tests eventually run out of resources
Date Mon, 14 Jun 2010 14:57:14 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878602#action_12878602 ]

Karl Wright commented on SOLR-1951:
-----------------------------------

A site I found talks about this problem and potential solutions:

>>>>>>
First of all, are the TIME_WAITs client-side or server-side? If server-side, then you
need to redesign your protocol so that your clients initiate the active close of the
connection, whenever possible... (Except for the server occasionally booting
idle/hostile clients, etc...) Generally, a server will be handling clients from many
different machines, so it's far better to spread out the TIME_WAIT load among the
many clients, than it is to make the server bear the full load of them all...

If they're client side, it sounds like you just have a single client, then? And, it's making
a whole bunch of repeated one-shot connections to the server(s)? If so, then you
need to redesign your protocol to add a persistent mode of some kind, so your client
can just reuse a single connection to the server for handling multiple requests, without
needing to open a whole new connection for each one... You'll find your performance
will improve greatly as well, since the set-up/tear-down overhead for TCP is now
adding up to a great deal of your processing, in your current scheme...

However, if you persist in truly wanting to get around TIME_WAIT (and, I think it's a
horribly BAD idea to try to do so, and don't recommend ever doing it), then what you
want is to set "l_linger" to 0... That will force a RST of the TCP connection, thereby
bypassing the normal shutdown procedure, and never entering TIME_WAIT... But,
honestly, DON'T DO THIS! Even if you THINK you know WTF you're doing! It's
just not a good idea, ever... You risk data loss (because your close() of the socket
will now just throw away outstanding data, instead of making sure it's sent), you risk
corruption of future connections (due to reuse of ephemeral ports that would otherwise
be held in TIME_WAIT, if a wandering dup packet happens to show up, or something),
and you break a fundamental feature of TCP that's put there for a very good reason...
All to work around a poorly designed app-level protocol... But, anyway, with that
said, here's the FAQ page on SO_LINGER... 
<<<<<<

So, if this can be taken at face value, it would seem to argue that the massive numbers of
TIME_WAITs are the result of every document post opening and closing a socket connection
to the server, and that the best solution is to keep the socket connection alive across
multiple requests. It's not yet clear whether that's achievable under http and jetty, but
a little research should help.
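
To make this concrete, here's a minimal sketch of what client-side connection reuse could
look like using nothing but the JDK's HttpURLConnection, which quietly keeps the underlying
socket alive across requests as long as each response body is drained to EOF and closed.
The endpoint URL and payload below are placeholders, not our actual load-test client:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class KeepAlivePoster {
    public static void main(String[] args) throws Exception {
        // Hypothetical extracting handler endpoint; adjust to taste.
        URL url = new URL("http://localhost:8983/solr/update/extract");
        for (int i = 0; i < 100; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/plain");
            OutputStream out = conn.getOutputStream();
            out.write(("document body " + i).getBytes("UTF-8"));
            out.close();
            // Draining the response to EOF is what lets the JDK return the
            // socket to its keep-alive cache instead of closing it (and
            // putting it into TIME_WAIT) after every single post.
            InputStream in = conn.getInputStream();
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard
            }
            in.close();
        }
    }
}

In SolrJ terms, the same effect should come from sharing one CommonsHttpSolrServer (and
thus one pooled HttpClient) across all the indexing threads instead of constructing a new
one per request, though I haven't verified that against our load-test client.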

If that doesn't work out, setting SO_LINGER to 0 may well do the trick, but I think that
might require a change to jetty.
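
For reference, the SO_LINGER trick the quoted FAQ warns about amounts to just this at the
Java socket level (a sketch only, and in our situation the equivalent call would presumably
have to be made on the server-side sockets inside jetty's connector, which is why a jetty
change would be needed):

import java.net.Socket;

public class LingerZero {
    public static void main(String[] args) throws Exception {
        // Illustrative target only.
        Socket socket = new Socket("localhost", 8983);
        try {
            // ... exchange data with the server here ...
        } finally {
            // Linger enabled with a zero-second timeout: close() sends a
            // TCP RST instead of the normal FIN handshake, so the socket
            // skips TIME_WAIT entirely. Any unsent buffered data is
            // discarded, which is exactly the data-loss risk quoted above.
            socket.setSoLinger(true, 0);
            socket.close();
        }
    }
}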


> extractingUpdateHandler doesn't close socket handles promptly, and indexing load tests
> eventually run out of resources
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1951
>                 URL: https://issues.apache.org/jira/browse/SOLR-1951
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 1.4.1, 1.5
>         Environment: sun java
> solr 1.5 build based on trunk
> debian linux "lenny"
>            Reporter: Karl Wright
>         Attachments: solr-1951.zip
>
>
> When multiple threads pound on extractingUpdateRequestHandler using multipart form posting
> over an extended period of time, I'm seeing a huge number of sockets piling up in the
> following state:
> tcp6       0      0 127.0.0.1:8983          127.0.0.1:44058         TIME_WAIT
> Despite the fact that the client can only have 10 sockets open at a time, huge numbers
> of sockets accumulate that are in this state:
> root@duck6:~# netstat -an | fgrep :8983 | wc
>   28223  169338 2257840
> root@duck6:~#
> The sheer number of sockets lying around seems to eventually cause commons-fileupload
> to fail (silently - another bug) in creating a temporary file to contain the content data.
> This causes Solr to erroneously return a 400 code with "missing_content_data" or some such
> to the indexing poster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


