hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-50) dfs datanode should store blocks in multiple directories
Date Thu, 30 Mar 2006 04:37:04 GMT
I don't see why we should need a test to prove the need for this.  It  
is a clear scaling problem with a simple solution deployed by many,  
many systems.  Time tested too.  Let's just make sure we don't blow  
people's data away on the upgrade!

On Mar 29, 2006, at 1:23 AM, Andrzej Bialecki (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-50?page=comments#action_12372204 ]
> Andrzej Bialecki  commented on HADOOP-50:
> -----------------------------------------
> That would be very useful. I've seen similar solutions in many  
> places (e.g. squid, or Mozilla cache dir).
> Currently, each time a block report is sent we need to list this
> huge directory. That's still OK, since it's infrequent enough. However,
> each time we need to access a block, the correct file needs to be
> opened. In its native code the JVM uses an open(2) call, which causes
> the OS to perform a name-to-inode lookup. Even though the OS caches
> partial results of this lookup (in Linux this is known as the dcache,
> or dentries), depending on the size of this LRU cache and the FS
> implementation details, real lookups for e.g. new blocks or
> newly requested blocks may still take a long time.
> Having said that, I'm not sure what would be the real performance  
> benefit of this change, perhaps you could come up with a simpler  
> test first...?
>> dfs datanode should store blocks in multiple directories
>> --------------------------------------------------------
>>          Key: HADOOP-50
>>          URL: http://issues.apache.org/jira/browse/HADOOP-50
>>      Project: Hadoop
>>         Type: Bug
>>   Components: dfs
>>     Versions: 0.2
>>     Reporter: Doug Cutting
>>     Assignee: Mike Cafarella
>>      Fix For: 0.2
>> The datanode currently stores all file blocks in a single  
>> directory.  With 32MB blocks and terabyte filesystems, this will  
>> create too many files in a single directory for many filesystems.   
>> Thus blocks should be stored in multiple directories, perhaps even  
>> a shallow hierarchy.
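
To make the numbers in the issue description concrete, here is a minimal sketch in Java. It counts how many 32MB blocks fit in one terabyte, then spreads block files over a shallow two-level directory tree. Everything in it is an assumption for illustration: the class name BlockDirLayout, the fan-out of 64, the "subdir" and "blk_" naming, and the modulo-based mapping are hypothetical, and this is not the layout the datanode actually uses or later adopted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlockDirLayout {
    // Hypothetical fan-out: 64 first-level x 64 second-level directories.
    static final int FANOUT = 64;

    /** Number of full blocks that fit in a volume of the given size. */
    static long blocksPerVolume(long volumeBytes, long blockBytes) {
        return volumeBytes / blockBytes;
    }

    /**
     * Maps a block ID to dataDir/subdirA/subdirB/blk_<id>.
     * floorMod keeps both indices in [0, FANOUT) even for negative IDs.
     */
    static Path blockPath(Path dataDir, long blockId) {
        int a = (int) Math.floorMod(blockId, (long) FANOUT);
        int b = (int) Math.floorMod(blockId / FANOUT, (long) FANOUT);
        return dataDir.resolve("subdir" + a)
                      .resolve("subdir" + b)
                      .resolve("blk_" + blockId);
    }

    public static void main(String[] args) {
        long oneTerabyte = 1L << 40;   // 2^40 bytes
        long blockSize = 32L << 20;    // 32MB = 2^25 bytes
        System.out.println(blocksPerVolume(oneTerabyte, blockSize) + " blocks per terabyte");
        // The data directory name here is made up for the example.
        System.out.println(blockPath(Paths.get("data"), 12345L));
    }
}
```

With these assumed values, one terabyte holds 32768 blocks, all of which would sit in a single directory today. Spread over 64 x 64 = 4096 leaf directories, that is about 8 files per directory if block IDs are evenly distributed. Because the directory is derived from the block ID alone, a block can be located without scanning. An upgrade would still have to move existing block files into the new tree without losing any.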
