hadoop-common-user mailing list archives

From "Sriram Rao" <srirams...@gmail.com>
Subject Announcing release of Kosmos Filesystem (KFS)
Date Fri, 28 Sep 2007 00:56:34 GMT

We are happy to announce the release of Kosmos Filesystem (KFS) as an
open source project.  KFS was designed and implemented at Kosmix Corp.

The initial release of KFS is version 0.1 (alpha).  The source code as
well as pre-compiled binaries for the x86-64-Linux-FC5 platform are
available at the project page on SourceForge

KFS is a highly available, distributed filesystem targeted at
applications that must handle large amounts of data (for example, grid
computing, web search, and data mining).  KFS can be used to
virtualize storage on a cluster of commodity PCs.  A full description
of the project, as well as the set of features implemented, can be
found at the following link:


KFS consists of 3 components:
 - a metadata server that implements a global namespace
 - a set of chunkservers that store data.  Blocks of a file, or
chunks, are stored on individual nodes; the size of each chunk is
fixed at 64 MB
 - a client library that is linked with applications to access KFS.

KFS is implemented in C++.  It also includes support for Java and Python applications.

In a nutshell,
 - KFS supports file replication, configurable on a per-file basis
 - Chunks of a file are typically replicated 3-way; this provides
data availability during chunkserver outages.
 - Re-replication is used to recover chunks that were lost due to
extended chunkserver outages.
 - For data integrity, KFS stores checksums on data blocks, which are
verified on read; if corruption is detected, re-replication is used
to recover the corrupted data
 - KFS supports incremental scalability: new storage nodes can be
added to the system
 - To enable better disk utilization, the KFS metaserver may periodically
rebalance chunks by migrating them from "over-utilized" servers
to "under-utilized" servers.
 - KFS exports a standard filesystem API (create, read, write,
etc.).  Files can be written to multiple times; KFS supports an
append operation on files.

To enable applications to use KFS, KFS has been integrated with Hadoop
via Hadoop's filesystem interfaces (see HADOOP-1963).  This enables
existing Hadoop applications to use KFS seamlessly.
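Pointing Hadoop at KFS is then a matter of Hadoop configuration.  A
sketch of the relevant properties might look like the following; the
property names and the metaserver host/port here are illustrative, so
check the HADOOP-1963 patch for the exact names in your Hadoop release:

```xml
<!-- hadoop-site.xml: route Hadoop's filesystem layer to KFS.
     Property names are illustrative; see HADOOP-1963 for the
     exact names in your Hadoop release. -->
<property>
  <name>fs.kfs.impl</name>
  <value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
</property>
<property>
  <name>fs.kfs.metaServerHost</name>
  <value>metaserver.example.com</value>
</property>
<property>
  <name>fs.kfs.metaServerPort</name>
  <value>20000</value>
</property>
```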

We are looking to build a user community for KFS.  If I can help in
any way with your evaluation of KFS, please feel free to get in touch
with me.
I'd also be happy to share any level of detail about KFS.

Thank you.

