hadoop-common-user mailing list archives

From Ian Holsman <li...@holsman.net>
Subject Re: Announcing release of Kosmos Filesystem (KFS)
Date Mon, 01 Oct 2007 04:50:14 GMT
Hi Sriram,

seeing how this is the hadoop user list, I would be particularly 
interested in knowing what differentiates KFS from Hadoop.

for example, how would I be better off using KFS (as opposed to hadoop) 
for my web search app? what does it do better than hadoop (and what 
does it do worse)?

what is the fundamental difference between KFS and hadoop such that 2 
separate projects are required?


Sriram Rao wrote:
> Greetings!
> We are happy to announce the release of Kosmos Filesystem (KFS) as an
> open source project.  KFS was designed and implemented at Kosmix Corp.
> The initial release of KFS is version 0.1 (alpha).  The source code, as
> well as pre-compiled binaries for the x86-64-Linux-FC5 platform, is
> available at the project page on Sourceforge
> (http://kosmosfs.sourceforge.net).
> KFS is a highly available, distributed filesystem targeted at
> applications that need to handle large amounts of data (such as
> grid computing, web search, and mining applications).  KFS can be
> used to virtualize storage on a cluster of commodity PCs.  A full
> description of the project, as well as the set of features
> implemented, can be found at the following link:
> http://kosmosfs.sourceforge.net
> KFS consists of 3 components:
>  - a metadata server that implements a global namespace
>  - a set of chunkservers that store data.  Blocks of a file, or
> chunks, are stored on individual nodes; the size of each chunk is
> fixed at 64MB
>  - a client library that is linked with applications for accessing KFS.
> KFS is implemented in C++.  It also includes support for Java and Python applications.
> In a nutshell,
>  - KFS supports file replication, which is configurable on a per-file basis
>  - Chunks of a file are replicated, typically 3-way.  This is used to
> provide data availability during chunkserver outages.
>  - Re-replication is used to recover chunks that were lost due to
> extended chunkserver outages.
>  - For data integrity, KFS stores checksums on data blocks, which are
> verified on read; if corruption is detected, re-replication is used
> to recover the corrupted data
>  - KFS supports incremental scalability; new storage nodes can be
> added to the system
>  - To enable better disk utilization, the KFS metaserver may periodically
> rebalance the chunks by migrating chunks from "over-utilized" servers
> to "under-utilized" servers.
>  - KFS exports a standard filesystem API (such as create, read,
> write, etc.).  Files can be written to multiple times; KFS supports
> an append operation on files.
> To enable applications to use KFS, KFS has been integrated with Hadoop
> using Hadoop's filesystem interfaces (see Hadoop-Jira-1963).  This
> enables existing Hadoop applications to use KFS seamlessly.
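As a rough sketch of what that integration looks like on the Hadoop side: plugging a filesystem into Hadoop is a matter of registering an implementation class and pointing the default filesystem URI at it. The property names, class name, and host/port below are assumptions based on the HADOOP-1963-era integration; verify them against your Hadoop release:

```
<!-- hadoop-site.xml fragment; names are illustrative, not definitive -->
<property>
  <name>fs.kfs.impl</name>
  <value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>kfs://metaserver-host:port</value>
</property>
```

With a mapping like this in place, existing Hadoop jobs address KFS paths through the ordinary Hadoop filesystem API rather than KFS-specific calls.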
> We are looking to build a user community for KFS.  If I can help in
> any way as you evaluate KFS, please feel free to get in touch with
> me.
> I'd also be happy to share any level of detail about KFS.
> Thank you.
> Sriram
