incubator-couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Need help debugging mochiweb/Safari HTTP problems
Date Fri, 25 Jul 2008 01:18:48 GMT
Looks like I found a fix for the bug, though I'm not 100% sure what  
the actual bug is. The fix was to change mochiweb to send the HTTP  
chunk in a single gen_tcp:send/2 call. Previously it sent the length  
in one call, then the data in another call.
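
As a rough illustration of the change (in Python rather than mochiweb's Erlang, and not the actual patch), the chunk can be framed into one buffer and handed to the socket in a single call. The helper names `encode_chunk` and `send_chunk` are made up for this sketch. Note that a single send call makes it likely, but does not guarantee, that the frame travels in one TCP segment.

```python
import socket


def encode_chunk(data: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex length, CRLF, payload, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def send_chunk(sock: socket.socket, data: bytes) -> None:
    # Hypothetical helper: one sendall call for the whole frame, instead of
    # one call for the length line and a second call for the payload.
    sock.sendall(encode_chunk(data))
```

With this framing, the end marker is simply `encode_chunk(b"")`, so the zero-length line and its closing CRLF leave the application together.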

My theory of the bug is that the Safari HTTP client is getting the chunked
end marker in 2 packets. It gets the 0 length + CRLF line in one
packet, and when it asks for the final CRLF in the next packet it's not
there yet, so it just skips it (for reasons yet unknown). But the CRLF is
still coming, so when the client goes ahead and makes the next
request and tries to read the next response, it instead gets the
CRLF it had skipped. Because it gets a weird, unexpected
response, it retries the request.
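
The theory can be played out in a toy simulation. This is only a sketch of the hypothesized behavior, not Safari's actual code: `buggy_finish_response`, `read_line`, and `next_status_line` are invented names, and the "skip the CRLF if it has not arrived" rule is the assumption being modeled.

```python
def buggy_finish_response(buf: bytearray) -> None:
    """Consume the end marker the way the theory supposes the client does."""
    assert buf.startswith(b"0\r\n")
    del buf[:3]                  # the zero-length chunk line
    if buf.startswith(b"\r\n"):  # final CRLF, taken only if already here
        del buf[:2]
    # Hypothesized bug: a CRLF that has not arrived yet is skipped, not awaited.


def read_line(buf: bytearray) -> bytes:
    """Pop one CRLF-terminated line from the buffer."""
    end = buf.index(b"\r\n")
    line = bytes(buf[:end])
    del buf[:end + 2]
    return line


def next_status_line(packets: list) -> bytes:
    """Finish the response after the first packet, then read what follows."""
    buf = bytearray(packets[0])
    buggy_finish_response(buf)
    for packet in packets[1:]:
        buf += packet
    return read_line(buf)
```

If the end marker arrives whole (`b"0\r\n\r\n"`), the next line read is the status line of the next response. If it arrives split (`b"0\r\n"` then `b"\r\n"`), the next line read is the stray empty line, which is the unexpected response described above.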

The fix is to put the whole chunk in one gen_tcp:send/2 call, which I
think forces it into a single TCP packet, so the CRLF is always
available immediately. The fix is simple and I think it will also be more
efficient for most use cases. However, I think there might still be a
flaw in Safari here that could bite. I also think the idempotence work  
for document creation is still necessary.
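
On the idempotence point, here is a minimal in-memory sketch (not CouchDB's implementation; `ToyStore` and `Conflict` are invented for illustration) of why a resent request duplicates data when the server generates the ID, but turns into a harmless conflict when the client generates it:

```python
import uuid


class Conflict(Exception):
    """Raised when a document ID already exists in the store."""


class ToyStore:
    def __init__(self):
        self.docs = {}

    def post(self, doc: dict) -> str:
        # Server picks the ID, so a resent request looks like a new one.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = doc
        return doc_id

    def put(self, doc_id: str, doc: dict) -> None:
        # Client picked the ID, so a resend collides with the first attempt.
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = doc
```

Calling `post` twice with the same body leaves two documents in `docs`; calling `put` twice with the same client-chosen ID leaves one document and raises `Conflict` on the second call.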

I want to take this fix along with the recent replication and  
compaction bug fixes and create a 0.8.1.

-Damien


On Jul 23, 2008, at 3:01 PM, Damien Katz wrote:

> Right now we are having a major problem with HTTP requests being
> retried. This problem is responsible for the test suite failures  
> seen constantly in Safari (though others report similar failures in  
> Firefox, I've not seen them myself). And not just test suite  
> failures, some are seeing the same behavior in production.
>
> The major symptoms of this problem:
> 1. Mysterious conflict - You get a conflict error saving a document  
> to the db. When you examine the existing db document, it's already  
> got your changes.
> 2. Duplicate document - When creating a new document via POST, you  
> occasionally get 2 new documents created instead of one.
>
> #1 is annoying but not too serious; no data is lost or corrupted. #2
> is a bit more dangerous, because you could consider the database  
> corrupted by having the duplicate document. (depends on what  
> problems it would cause for your app)
>
> What is happening in both these cases is the HTTP requests are  
> getting sent and processed twice. The first request is given to  
> CouchDB and is handled, but when CouchDB attempts to send the  
> response, the connection is reset (apparently). Then another  
> identical HTTP request comes in and the request is processed again.
>
> I am not a TCP expert, but by viewing the network requests via
> tcpdump, it is obvious the request packets, 1 header and 1 body  
> packet, are getting resent from the client to the server. I do not  
> know if the packets are being resent at the TCP level, or if the  
> HTTP client in Safari is retrying the request after getting a TCP
> error.
>
> I do not know why the network error or subsequent resend is  
> happening. I can only confirm that it *is* happening. If this is at  
> the TCP level, then it means we definitely need to do away with the  
> non-idempotent POST to create new documents.
>
> I think we do anyway though. While this network error should not be  
> happening, it did expose an interesting problem with our use of POST  
> for document creation. The problem is the generated id for the  
> document is a UUID generated server side, so the server has no way  
> to distinguish if a request is a new request or a resend of an  
> already processed request, and so generates another UUID and thus  
> creates another new document. But if the UUID is generated by the  
> client, then the resend will cause a conflict error because that UUID
> already exists in the DB, thus eliminating the duplicate data.
>
> However, we still need to figure out why this is happening in the  
> first place. Why is the connection being reset and why is the  
> request being retried?
>
> If anyone wants to try to debug this, here is what I've been doing:
> 1. Run a packet sniffer for local port 5984 and start couchdb
> 2. Go to http://127.0.0.1:5984/_utils/ and click the "Test Suite" link
> 3. Run the "basics" test manually until you see a "conflict error"
> exception in the test result. (This exception stops the test executing.
> I don't try to debug other test failures, since the test keeps on  
> running after the failure)
> 4. The last few requests will be the duplicated requests. There is  
> information about the packets, but I don't know how to interpret it.
>
> Any help and input appreciated.
>
> -Damien

