hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-33) DF enhancement: performance and win XP support
Date Thu, 23 Feb 2006 17:02:40 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-33?page=comments#action_12367534 ] 

Doug Cutting commented on HADOOP-33:
------------------------------------

Fixing things to not call DF twice per heartbeat would be great.  But why do we need the DiskUsage
class?  Can't we just keep the DF instance and reuse it?  I don't see the advantage of wrapping
the DF inside another class.  It just adds code, and less code is better.  Also, the logic
of getRemaining() is duplicated after your patch.

Perhaps what's needed is a private getDF() method in FSDataset that checks whether a cached
DF instance has been refreshed in the last N milliseconds, refreshes it if it has not, and
then returns it.  Something like:

private synchronized DF getDF() {
  long now = System.currentTimeMillis();
  if ((now - lastDfTime) > DF_INTERVAL) {
    df = new DF();        // refresh the cached instance
    lastDfTime = now;
  }
  return df;
}

Then getRemaining() and getCapacity() can be defined in terms of getDF().  Does this make
sense?
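
For illustration, assuming DF exposes getAvailable() and getCapacity() accessors (the names
here are guesses, not necessarily the real ones), those methods could then be as simple as:

public long getRemaining() {
  return getDF().getAvailable();   // free space reported by the cached DF
}

public long getCapacity() {
  return getDF().getCapacity();    // total space reported by the cached DF
}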

Finally, Hadoop currently requires Cygwin in a number of places, most notably in the startup
scripts.  The current strategy is not to maintain native Windows versions of these, but rather
to rely on Cygwin.  This patch increases the code size without removing the dependency on
Cygwin.  If you like, we could start another bug to entirely remove the dependency on Cygwin,
porting all scripts, DF, etc.  But that is a low-priority item for me, since Cygwin offers
a fine solution with no code duplication.

In summary, I'd love to see a patch that fixes the DF problem with a minimum of code.  Thanks!

> DF enhancement: performance and win XP support
> ----------------------------------------------
>
>          Key: HADOOP-33
>          URL: http://issues.apache.org/jira/browse/HADOOP-33
>      Project: Hadoop
>         Type: Improvement
>   Components: fs, dfs
>  Environment: Unix, Cygwin, Win XP
>     Reporter: Konstantin Shvachko
>     Priority: Minor
>  Attachments: DF.patch, DFpatch.txt
>
> 1. DF is called twice for each heartbeat, which happens every 3 seconds.
> There is a simple fix for that in the attached patch.
> 2. Cygwin is required to run the df program in a Windows environment.
> There is a class org.apache.commons.io.FileSystemUtils, which can return the disk free space
> for different OSs, but it does not provide a way to get the disk capacity.
> In general, on Windows there is no efficient and uniform way to obtain the disk capacity
> using a shell command.
> The choices are 'chkdsk' and 'defrag -a', but both of them are too slow to be called
> every 3 seconds.
> WinXP and 2003 Server have a new tool called fsutil, which provides all the necessary info.
> I implemented a call to fsutil for the case where df fails and the OS is one of these.
> Other Windows versions should still use Cygwin.
> I tested this feature on Linux, WinXP, and Cygwin.
> See attached patch.
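
Regarding the fsutil fallback described in the issue above, here is a minimal sketch (not
the attached patch) of how "fsutil volume diskfree" might be invoked and parsed; the output
labels below are assumptions and may vary by Windows version and locale:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class FsutilDiskFree {
  long capacity;    // parsed from the "Total # of bytes" line (assumed label)
  long available;   // parsed from the "Total # of avail free bytes" line (assumed label)

  FsutilDiskFree(String drive) throws IOException {
    Process p = Runtime.getRuntime().exec(
        new String[] {"fsutil", "volume", "diskfree", drive});
    BufferedReader in =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      int colon = line.indexOf(':');
      if (colon < 0) continue;
      long value = Long.parseLong(line.substring(colon + 1).trim());
      if (line.startsWith("Total # of bytes")) {
        capacity = value;
      } else if (line.startsWith("Total # of avail free bytes")) {
        available = value;
      }
    }
    in.close();
  }
}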


