hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-33) DF enhancement: performance and win XP support
Date Sat, 25 Feb 2006 02:39:39 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-33?page=comments#action_12367755 ] 

Konstantin Shvachko commented on HADOOP-33:

1. Keeping lastDfTime and refreshing DF every time DF_INTERVAL passes is definitely a good idea.
I would go even further and place the DF renew/refresh logic directly into the DF class, so
that functions calling the DF get-methods are free to assume the data is up to date.
I don't know whether we need that, but DF_INTERVAL can be made a configurable parameter.
This will bring in more code, but will make the use of the DF class easier in the end.
Do we want it?
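
To make point 1 concrete, here is a minimal sketch of what "refresh logic inside the DF class" could look like. It is not the code from the attached patch: the class name CachedDf, the Probe interface and the injectable clock are hypothetical, and only lastDfTime and DF_INTERVAL come from the discussion above.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: a DF-like class that refreshes itself at most once per interval. */
class CachedDf {
    /** One measurement of the disk, e.g. one run of the df program (assumed shape). */
    interface Probe {
        long[] read(); // returns {capacityBytes, availableBytes}
    }

    private final Probe probe;
    private final LongSupplier clock;   // current time in milliseconds, injectable for tests
    private final long dfIntervalMs;    // DF_INTERVAL made configurable
    private long lastDfTime;            // time of the last successful probe
    private boolean loaded = false;     // false until the first probe has run
    private long capacity;
    private long available;

    CachedDf(Probe probe, long dfIntervalMs, LongSupplier clock) {
        this.probe = probe;
        this.dfIntervalMs = dfIntervalMs;
        this.clock = clock;
    }

    /** Re-run the probe only if no data is cached yet or the interval has elapsed. */
    private void refreshIfStale() {
        long now = clock.getAsLong();
        if (!loaded || now - lastDfTime >= dfIntervalMs) {
            long[] values = probe.read();
            capacity = values[0];
            available = values[1];
            lastDfTime = now;
            loaded = true;
        }
    }

    /** Callers never refresh explicitly; every getter checks staleness itself. */
    synchronized long getCapacity() {
        refreshIfStale();
        return capacity;
    }

    synchronized long getAvailable() {
        refreshIfStale();
        return available;
    }
}
```

With this shape a heartbeat can call both getters and still trigger only one probe per interval, which is the behaviour the callers would otherwise have to implement themselves.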

2. My patch does not remove the dependency on Cygwin. What it does is remove that
dependency in one particular case, without compromising performance for
the mainstream OS.
After that, the whole file system can run (and actually does run) on Windows without the overhead
of Cygwin.
Additional code is justifiable and inevitable in this case, until Sun implements this functionality
for us using native libraries :-).

3. What do you mean by minimizing the code?
Is it "the minimum of changes to the existing code that solve the problem", or is it
the minimal amount of total code committed to the repository?
Or is it minimizing the code required in the future to use the feature?
This is actually an interesting topic for discussion...

> DF enhancement: performance and win XP support
> ----------------------------------------------
>          Key: HADOOP-33
>          URL: http://issues.apache.org/jira/browse/HADOOP-33
>      Project: Hadoop
>         Type: Improvement
>   Components: fs, dfs
>  Environment: Unix, Cygwin, Win XP
>     Reporter: Konstantin Shvachko
>     Priority: Minor
>  Attachments: DF.patch, DFpatch.txt
> 1. DF is called twice for each heartbeat, which happens every 3 seconds.
> There is a simple fix for that in the attached patch.
> 2. Cygwin is required to run the df program in a Windows environment.
> There is a class org.apache.commons.io.FileSystemUtils, which can return disk free space
> for different OSs, but it does not have means to get disk capacity.
> In general, on Windows there is no efficient and uniform way to calculate disk capacity
> using a shell command.
> The choices are 'chkdsk' and 'defrag -a', but both of them are too slow to be called
> every 3 seconds.
> WinXP and 2003 Server have a new tool called fsutil, which provides all the necessary info.
> I implemented a call to fsutil for the case when df fails and the OS supports it.
> Other Windows versions should still run Cygwin.
> I tested this feature on Linux, WinXP and Cygwin.
> See attached patch.
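
The fsutil fallback described in the issue can be sketched roughly as follows. This is not the attached patch: the class FsutilDiskFree is hypothetical, the command is assumed to be `fsutil volume diskfree <drive>`, its output is assumed to consist of "label : number" lines for free bytes, total bytes and available free bytes, and the os.name values checked are assumptions as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract capacity and available space from fsutil-style output. */
class FsutilDiskFree {
    // Assumed line shape: "<label> : <number>", e.g. "Total # of bytes : 5000".
    private static final Pattern LINE = Pattern.compile("^\\s*(.+?)\\s*:\\s*(\\d+)");

    final long capacity;   // total bytes on the volume
    final long available;  // bytes available to the calling user

    private FsutilDiskFree(long capacity, long available) {
        this.capacity = capacity;
        this.available = available;
    }

    /** Assumption: these are the os.name values on which fsutil exists. */
    static boolean isFsutilOs(String osName) {
        return osName.startsWith("Windows XP") || osName.startsWith("Windows 2003");
    }

    /** Parse the text printed by the (assumed) "fsutil volume diskfree" command. */
    static FsutilDiskFree parse(String output) {
        Long total = null;
        Long avail = null;
        for (String line : output.split("\\r?\\n")) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not a "label : number" line
            }
            String label = m.group(1).toLowerCase();
            long value = Long.parseLong(m.group(2));
            if (label.contains("avail")) {
                avail = value;            // "avail free bytes" line
            } else if (label.contains("free")) {
                // raw free bytes, not needed for this sketch
            } else if (label.contains("bytes")) {
                total = value;            // plain "bytes" line is the capacity
            }
        }
        if (total == null || avail == null) {
            throw new IllegalArgumentException("unrecognized fsutil output");
        }
        return new FsutilDiskFree(total, avail);
    }
}
```

A caller would try df first and, only if that fails and isFsutilOs(System.getProperty("os.name")) is true, run the fsutil command and hand its output to parse.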

This message is automatically generated by JIRA.