hadoop-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Optimizing Disk I/O - does HDFS do anything ?
Date Tue, 13 Nov 2012 21:10:07 GMT
People are welcome to add to this, but I would say the answer is:
1) Hadoop does not run on Windows. (I am not sure whether Microsoft has made
any statement about the OS used for Hadoop on Azure.)
->
http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/
2) Files are written in one go with big blocks. (And file fragmentation is
not the only issue: the many-small-files 'issue' is, in the end, a data
fragmentation issue too, and it hurts read throughput.)
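
A rough way to see the small-files effect for yourself is to walk a directory
with the standard FileSystem API and count files that sit far below their
block size; each of them still costs a NameNode block entry and a separate,
short read on a datanode. A minimal sketch, assuming a directory path passed
as the first argument (the 25% threshold and the class name SmallFileReport
are arbitrary choices for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileReport {
  public static void main(String[] args) throws Exception {
    // Directory to inspect, e.g. /user/foo/data (hypothetical path).
    Path dir = new Path(args[0]);
    FileSystem fs = FileSystem.get(new Configuration());

    long files = 0, small = 0, bytes = 0;
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDirectory()) continue;   // only look at plain files
      files++;
      bytes += st.getLen();
      // A file well below its block size still occupies its own block,
      // so many such files mean many scattered, short reads.
      if (st.getLen() < st.getBlockSize() / 4) {
        small++;
      }
    }
    System.out.printf("%d files, %d bytes, %d files under 25%% of their block size%n",
        files, bytes, small);
  }
}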

Bertrand Dechoux

On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <jayunit100@gmail.com> wrote:

> How does HDFS deal with optimization of file streaming?  Do data nodes
> have any optimizations at the disk level for dealing with fragmented files?
>  I assume not, but just curious if this is at all in the works, or if there
> are java-y ways of dealing with a long running set of files in an HDFS
> cluster.  Maybe, for example, data nodes could log the amount of time spent
> on I/O for certain files as a way of reporting whether or not
> defragmentation needed to be run on a particular node in a cluster.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
