Mailing-List: contact hadoop-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-user@lucene.apache.org
Received-SPF: neutral (asf.osuosl.org: local policy)
Message-ID: <44B76855.7040406@apache.org>
Date: Fri, 14 Jul 2006 12:48:05 +0300
From: Doug Cutting <cutting@apache.org>
User-Agent: Thunderbird 1.5.0.4 (X11/20060615)
MIME-Version: 1.0
To: hadoop-user@lucene.apache.org
Subject: Re: DFS question: does append-only means faster updates ?
References: <20060713062302.84781.qmail@web34306.mail.mud.yahoo.com>
In-Reply-To: <20060713062302.84781.qmail@web34306.mail.mud.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

drwho wrote:
> I've always wondered if a lack of overwrite / random-write op means that updates are much faster than conventional filesystems.

Not really.  Since DFS is implemented on top of the ordinary filesystem, 
it's never any faster at serial access.  What it adds is scalability 
(petabytes in a single namespace), reliability (continuous access to 
data through disk and host failures), and distributed performance (1000 
hosts reading or writing in parallel to the same logical FS).

> The fact that both (dfs, gfs) support delete op, does it mean that
> fragmentation will still be a big problem ?

Fragmentation should not be a problem, since files are chunked into 
128MB blocks stored in local filesystems.
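
To make the chunking concrete, here is a minimal sketch of the arithmetic 
only.  It is not Hadoop code: split_into_blocks is a hypothetical helper, 
and the 128MB figure is simply the one quoted above.  It shows how a file 
maps onto fixed-size blocks, with only the last block possibly shorter.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128MB, the block size quoted above (assumed here)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes.

    Hypothetical illustration: every block is block_size bytes long
    except possibly the last one, which holds the remainder.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    # A 300MB file is covered by two full blocks and one shorter block.
    for offset, length in split_into_blocks(300 * 1024 * 1024):
        print(offset, length)
```

Under this model a delete releases whole blocks, each an ordinary file 
in some host's local filesystem, so allocation on disk is left to that 
local filesystem.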

> Also, would the lack of overwrite / random-write ops mean that the filesystem is less suitable for apps like an online word processor or even an online spreadsheet / database?

Yes, such applications are probably not appropriate for direct 
implementation on top of DFS.  It would work, but it would not be the 
best utilization of resources.  Google uses BigTable, layered on top of 
GFS, to store small items that may be independently updated.  Hadoop may 
someday incorporate something like BigTable.  Mike Cafarella has 
discussed this a bit on the hadoop-dev list:

http://www.mail-archive.com/hadoop-dev@lucene.apache.org/msg01415.html
http://www.mail-archive.com/hadoop-dev@lucene.apache.org/msg01443.html

Doug