couchdb-dev mailing list archives

From "Filipe Manana (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-558) Validate Content-MD5 request headers on uploads
Date Mon, 16 Nov 2009 13:23:39 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778336#action_12778336 ]

Filipe Manana commented on COUCHDB-558:
---------------------------------------

Hum,

The mochiweb_multipart:parse_headers function calls mochiweb_util:parse_header, which
expects header values to be of the form "X; Y=Z", as far as I can tell from the source
and its test method:

test_parse_header() ->
    {"multipart/form-data", [{"boundary", "AaB03x"}]} =
        parse_header("multipart/form-data; boundary=AaB03x"),
    ok.

I've just discovered this: http://www.erlang.org/doc/man/erlang.html#decode_packet-3
Maybe if we pass the full trailer binary, it will be able to decode it as an HTTP header.
To be tested.
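Something along these lines, perhaps (untested sketch; the trailer binary below is made
up, and in httph mode decode_packet/3 parses one header line per call, returning http_eoh
at the end):

```erlang
%% Sketch only: parse a (hypothetical) trailer binary with erlang:decode_packet/3.
%% In httph mode each call decodes one header line; headers the emulator knows
%% about, such as Content-MD5, come back as atoms ('Content-Md5').
Trailer = <<"Content-MD5: XrY7u+Ae7tCTyyK7j1rNww==\r\n\r\n">>,
{ok, {http_header, _, 'Content-Md5', _, Digest}, Rest} =
    erlang:decode_packet(httph, Trailer, []),
%% Rest is <<"\r\n">>; decoding it yields http_eoh, i.e. end of headers.
{ok, http_eoh, <<>>} = erlang:decode_packet(httph, Rest, []).
```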

Regarding the integrity checks of chunked requests, I just had an idea (though a complicated
one, with poor performance, and incomplete):

1) In the ChunksFun, as soon as the amount of data received (the sum of the lengths of the
chunks so far) reaches a certain value X, we start writing the chunks to a temporary file.
The file name is stored in the current state's #httpd{} record.

2) After receiving the whole request, compute the MD5 digest and compare it to the given digest.
If they do not match, remove the tmp file and the file name entry in #httpd{}.

3) The update_req/2 function will no longer replace the chunked http request with a non-chunked
http request.

4) Modify the recv_chunked function (couch_httpd.erl) to check whether the given #httpd record
has a tmp file name in it. If so, it will read each chunk from that file and pass it to the
given ChunksFun callback.
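Roughly like this, for steps 1 and 2 (illustration only; the function names are invented,
not an existing CouchDB API):

```erlang
%% Illustration only; spool_chunk/2 and check_digest/2 are invented names.
%% Step 1: in the ChunksFun, spool each chunk to the already-opened tmp file.
spool_chunk(Chunk, File) ->
    ok = file:write(File, Chunk).

%% Step 2: after the whole request, compare the computed digest against the
%% base64-encoded Content-MD5 value (note this reads the whole file back into
%% memory -- see problem 4 below). Delete the tmp file on mismatch.
check_digest(TmpPath, HeaderValue) ->
    {ok, Bin} = file:read_file(TmpPath),
    case erlang:md5(Bin) =:= base64:decode(HeaderValue) of
        true  -> ok;
        false -> file:delete(TmpPath), {error, md5_mismatch}
    end.
```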

Major obvious problems:

1) too complicated a solution
2) poor disk performance if we have large requests, and many in parallel
3) after crashes, we risk leaving useless tmp files lying around on disk
4) to compute the MD5 digest, we still need to read the whole ("unchunked") content into memory.
Do you know of any "incremental" MD5 digest implementation? I've never heard of one.
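(Partly answering my own question here: unless I'm mistaken, the emulator already ships
incremental MD5 BIFs -- erlang:md5_init/0, erlang:md5_update/2 and erlang:md5_final/1 --
so the digest could be folded over the chunks without ever buffering the whole body:)

```erlang
%% Incremental MD5 over a list of chunks using the md5_* BIFs; the result is
%% the same as erlang:md5/1 over the concatenated data.
incremental_md5(Chunks) ->
    Ctx = lists:foldl(fun(Chunk, C) -> erlang:md5_update(C, Chunk) end,
                      erlang:md5_init(),
                      Chunks),
    erlang:md5_final(Ctx).

%% incremental_md5([<<"hello ">>, <<"world">>]) =:= erlang:md5(<<"hello world">>).
```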

Have you come up with any ideas along these lines?

It might be interesting to check how httpd servers like Apache deal with this situation
(if they do at all, or just buffer all the chunks in memory).

cheers

> Validate Content-MD5 request headers on uploads
> -----------------------------------------------
>
>                 Key: COUCHDB-558
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-558
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, HTTP Interface
>            Reporter: Adam Kocoloski
>             Fix For: 0.11
>
>         Attachments: jira-couchdb-558-for-trunk-2nd-try.patch, jira-couchdb-558-for-trunk-3rd-try.patch, jira-couchdb-558-for-trunk.patch, run.tpl.patch
>
>
> We could detect in-flight data corruption if a client sends a Content-MD5 header along with the data and Couch validates the MD5 on arrival.
> RFC1864 - The Content-MD5 Header Field
> http://www.faqs.org/rfcs/rfc1864.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

