hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp
Date Tue, 14 Mar 2006 20:28:40 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12370407 ] 

Doug Cutting commented on HADOOP-66:

Should we worry about the space this consumes?  For each block a client writes, memory is
allocated to hold the path name of the temporary file, and that memory can never be gc'd.  If
each path is 100 bytes, then 1M blocks would retain 100MB, which would be a big leak.  But writing
1M blocks means writing 32TB from a single JVM, which would take around a month (at current
dfs speeds).  If we increase the block size (as has been discussed) then the leak rate slows
proportionally.  So I guess we don't worry about this "leak"?
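A back-of-the-envelope check of the figures above, assuming a 32MB block size (implied by 1M blocks adding up to 32TB; the exact path length of 100 bytes is the comment's own rough estimate):

```java
// Sanity-check of the leak estimate in the comment above.
// Assumptions (not from the dfs code itself): 100 bytes retained per
// temp-file path, 32MB per block.
public class LeakEstimate {
    public static void main(String[] args) {
        long blocks = 1_000_000L;   // 1M blocks written by one client JVM
        long pathBytes = 100L;      // assumed bytes retained per path string
        long blockMB = 32L;         // assumed dfs block size in MB

        long leakedMB = blocks * pathBytes / 1_000_000L; // retained, never gc'd
        long writtenTB = blocks * blockMB / 1_000_000L;  // data written to get there

        System.out.println(leakedMB + " MB leaked");  // 100 MB leaked
        System.out.println(writtenTB + " TB written"); // 32 TB written
    }
}
```

So the leak only reaches 100MB after roughly a month of continuous writing, which supports the conclusion that it is tolerable.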

> dfs client writes all data for a chunk to /tmp
> ----------------------------------------------
>          Key: HADOOP-66
>          URL: http://issues.apache.org/jira/browse/HADOOP-66
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1
>     Reporter: Sameer Paranjpye
>     Assignee: Doug Cutting
>      Fix For: 0.1
>  Attachments: no-tmp.patch, tmp-delete.patch
> The dfs client writes all the data for the current chunk to a file in /tmp; when the
> chunk is complete, it is shipped out to the Datanodes. This can cause /tmp to fill up fast
> when a lot of files are being written. A potentially better scheme is to buffer the written
> data in RAM (application code can set the buffer size) and flush it to the Datanodes when
> the buffer fills up.
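The buffering scheme described in the issue could be sketched roughly as follows. This is an illustrative stand-alone example, not the actual dfs client code: `BufferedBlockWriter` and the `downstream` sink (a `ByteArrayOutputStream` standing in for the Datanode connection) are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: buffer written data in RAM instead of spooling it to
// /tmp, and flush to the downstream sink whenever the buffer fills up.
public class BufferedBlockWriter extends OutputStream {
    private final byte[] buffer;          // application-chosen buffer size
    private int count = 0;
    private final OutputStream downstream; // would be the Datanode pipeline

    public BufferedBlockWriter(OutputStream downstream, int bufferSize) {
        this.downstream = downstream;
        this.buffer = new byte[bufferSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buffer.length) {
            flush(); // buffer full: ship the data out instead of touching /tmp
        }
        buffer[count++] = (byte) b;
    }

    @Override
    public void flush() throws IOException {
        downstream.write(buffer, 0, count);
        count = 0;
    }

    @Override
    public void close() throws IOException {
        flush(); // push any partial buffer before closing
        downstream.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedBlockWriter w = new BufferedBlockWriter(sink, 4)) {
            for (int i = 0; i < 10; i++) w.write(i);
        }
        System.out.println(sink.size() + " bytes reached the sink"); // 10
    }
}
```

Nothing is ever written to local disk here; the only local state is the fixed-size buffer, which is what makes the /tmp exhaustion problem go away.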

This message is automatically generated by JIRA.
If you think it was sent incorrectly, contact one of the administrators:
For more information on JIRA, see:
