hadoop-common-dev mailing list archives

From "Luca Telloli (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-5189) Integration with BookKeeper logging system
Date Wed, 08 Apr 2009 16:10:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697058#action_12697058 ]

Luca Telloli edited comment on HADOOP-5189 at 4/8/09 9:08 AM:
--------------------------------------------------------------

I'm posting a new preview version that addresses two features: 

- Logging on multiple devices 
- Writing IDs to ZooKeeper (that is, files are no longer used to store this information)

I additionally moved EditLogFileOutputStream and EditLogFileInputStream out of the FSEditLog
class. 

A sample configuration is the following: 
<property>
  <name>dfs.name.dir</name>
  <value>/tmp/localhdfs</value>
</property>
<property>
  <name>dfs.name.edits.dir</name>
  <value>/tmp/hdfsedits</value>
</property>
<property>
  <name>hdfs.editlog</name>
  <value>FILE,BOOKKEEPER</value>
</property>

NOTE: hdfs.editlog is a new property that must be set for this patch to work. Its value lists the logging systems to use, separated by commas.
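A minimal sketch of how a value such as FILE,BOOKKEEPER could be turned into an ordered list of logging systems, rejecting an unset property since the patch requires it. EditLogType and parseEditLogTypes are hypothetical names, not the patch's code, and reading the property through Hadoop's Configuration is left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical enum of the logging systems named in the sample configuration.
enum EditLogType { FILE, BOOKKEEPER }

final class EditLogConfig {
    private EditLogConfig() {}

    // Turns a value such as "FILE,BOOKKEEPER" into an ordered, duplicate-free
    // list. Unknown names make EditLogType.valueOf throw IllegalArgumentException.
    static List<EditLogType> parseEditLogTypes(String value) {
        if (value == null || value.trim().isEmpty()) {
            // The property is mandatory for the patch, so fail fast when unset.
            throw new IllegalArgumentException("hdfs.editlog must be set, e.g. FILE,BOOKKEEPER");
        }
        List<EditLogType> types = new ArrayList<>();
        for (String token : value.split(",")) {
            String name = token.trim().toUpperCase(Locale.ROOT);
            if (name.isEmpty()) {
                continue; // tolerate stray commas such as "FILE,,BOOKKEEPER"
            }
            EditLogType type = EditLogType.valueOf(name);
            if (!types.contains(type)) {
                types.add(type); // keep the first occurrence and the given order
            }
        }
        if (types.isEmpty()) {
            throw new IllegalArgumentException("hdfs.editlog lists no logging system: " + value);
        }
        return types;
    }
}
```

For example, parseEditLogTypes(" bookkeeper , FILE,file ") yields [BOOKKEEPER, FILE].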


RUNNING ZOOKEEPER AND BOOKKEEPER EASILY
To run ZooKeeper and BookKeeper in one shot, the BookKeeper jar includes a class,
org.apache.bookkeeper.util.LocalBookKeeper, that starts a ZooKeeper server along with a
user-specified number of bookies (BookKeeper servers).

An example command is the following: 
java -cp lib/log4j-1.2.15.jar:lib/junit-3.8.1.jar:lib/zookeeper-dev.jar:lib/zookeeper-dev-bookkeeper.jar \
    org.apache.bookkeeper.util.LocalBookKeeper N

where N is the number of bookies to start (for instance, 3).

LOGGING ON MULTIPLE DEVICES
The initial semantics are simple:
- when writing an operation, write it sequentially to every configured logging system
- when reading operations (during startup or a checkpoint), read only from the first logging
system; at the moment this is the first storage directory, so reading is still file-based
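A minimal sketch of the write/read semantics listed above: every operation is written to each logging system in turn, and reads consult only the first one. LogDevice, MemoryDevice and MultiDeviceEditLog are hypothetical stand-ins for the patch's EditLogOutputStream/EditLogInputStream handling, with an in-memory device used only to make the example runnable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one logging system (file, BookKeeper, ...).
interface LogDevice {
    void write(String op) throws IOException;
    List<String> readAll() throws IOException;
}

// In-memory device used only to make the sketch runnable.
final class MemoryDevice implements LogDevice {
    private final List<String> ops = new ArrayList<>();
    @Override public void write(String op) { ops.add(op); }
    @Override public List<String> readAll() { return new ArrayList<>(ops); }
}

final class MultiDeviceEditLog {
    private final List<LogDevice> devices;

    MultiDeviceEditLog(List<? extends LogDevice> devices) {
        if (devices.isEmpty()) {
            throw new IllegalArgumentException("at least one logging system is required");
        }
        this.devices = new ArrayList<>(devices);
    }

    // Write path: the operation goes to every device, one after another.
    // An IOException from any device propagates; there is no recovery here.
    void logOp(String op) throws IOException {
        for (LogDevice device : devices) {
            device.write(op);
        }
    }

    // Read path (startup or checkpoint): only the first device is consulted.
    List<String> loadOps() throws IOException {
        return devices.get(0).readAll();
    }
}
```

Because writes are sequential, a slow or failing device delays or aborts the write for all of them; that is the price of this simple first version.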

No fall-back mechanism is implemented yet for when the first logging system fails; the idea
would be to move on to the next one and remove the failed one from the array of streams.
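The fall-back idea could look roughly like the sketch below: try the first source, and on an IOException drop it from the list so it is not retried, then move on to the next. This is an illustration of the proposed behaviour, not something the patch implements; EditSource, FixedSource and FallbackEditReader are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical read-side abstraction for one logging system.
interface EditSource {
    List<String> readOps() throws IOException;
}

// Source used only to exercise the sketch: returns fixed ops, or fails when ops is null.
final class FixedSource implements EditSource {
    private final String name;
    private final List<String> ops;

    FixedSource(String name, List<String> ops) {
        this.name = name;
        this.ops = ops;
    }

    @Override public List<String> readOps() throws IOException {
        if (ops == null) {
            throw new IOException(name + " is unavailable");
        }
        return ops;
    }
}

final class FallbackEditReader {
    private final List<EditSource> sources;

    FallbackEditReader(List<? extends EditSource> sources) {
        this.sources = new ArrayList<>(sources);
    }

    // Reads from the first source; if it fails, removes it from the list so it is
    // not retried and moves on to the next. Fails only when every source failed.
    List<String> readOps() throws IOException {
        IOException lastFailure = null;
        while (!sources.isEmpty()) {
            try {
                return sources.get(0).readOps();
            } catch (IOException e) {
                lastFailure = e;
                sources.remove(0);
            }
        }
        throw new IOException("no edit log source could be read", lastFailure);
    }

    int remainingSources() {
        return sources.size();
    }
}
```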

The current loadFSEdits(StorageDirectory) should eventually become a loadFSEdits() that needs
no storage directory. A loadFSEdits(EditLogInputStream) might be even better.
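To illustrate the loadFSEdits(EditLogInputStream) direction, here is a minimal sketch of a loader that only sees a stream of operations and never a storage directory, so a file-backed or BookKeeper-backed stream could sit behind it. EditOpStream, EditLoader and loadEdits are hypothetical; the real edit log format and EditLogInputStream API are not modelled.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical source-agnostic stream of edit operations.
interface EditOpStream {
    // Returns the next operation, or null once the log is exhausted.
    String nextOp() throws IOException;
}

final class EditLoader {
    private EditLoader() {}

    // Replays every operation from the stream and returns how many were applied.
    // No StorageDirectory is involved: where the ops come from is the stream's concern.
    static int loadEdits(EditOpStream in, Consumer<String> applyOp) throws IOException {
        int count = 0;
        for (String op = in.nextOp(); op != null; op = in.nextOp()) {
            applyOp.accept(op);
            count++;
        }
        return count;
    }

    // In-memory stream used only to make the sketch runnable.
    static EditOpStream fromList(List<String> ops) {
        Iterator<String> it = ops.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```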

DRAWBACKS
Currently, storage directories can be of three types: IMAGE, EDITS and IMAGE_AND_EDITS, with
the last being the default. This patch excludes the IMAGE_AND_EDITS type, so when using file
logging, users must set dfs.name.dir and dfs.name.edits.dir to separate directories, one for
IMAGE and one for EDITS.
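The restriction can be pictured as a start-up check. The sketch assumes that a directory listed under both dfs.name.dir and dfs.name.edits.dir ends up as IMAGE_AND_EDITS, and simply rejects that case; DirType, classify and requireSeparateDirs are hypothetical names, not the patch's or Hadoop's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mirror of the three storage directory types.
enum DirType { IMAGE, EDITS, IMAGE_AND_EDITS }

final class StorageDirCheck {
    private StorageDirCheck() {}

    // Listed only under dfs.name.dir -> IMAGE, only under dfs.name.edits.dir -> EDITS,
    // under both -> IMAGE_AND_EDITS (an assumption about how overlap is treated).
    static Map<String, DirType> classify(List<String> nameDirs, List<String> editsDirs) {
        Map<String, DirType> types = new LinkedHashMap<>();
        for (String dir : nameDirs) {
            types.put(dir, DirType.IMAGE);
        }
        for (String dir : editsDirs) {
            types.merge(dir, DirType.EDITS,
                (old, added) -> old == DirType.EDITS ? DirType.EDITS : DirType.IMAGE_AND_EDITS);
        }
        return types;
    }

    // With this patch, file logging needs separate image and edits directories,
    // so any IMAGE_AND_EDITS directory is rejected.
    static void requireSeparateDirs(Map<String, DirType> types) {
        List<String> shared = new ArrayList<>();
        for (Map.Entry<String, DirType> e : types.entrySet()) {
            if (e.getValue() == DirType.IMAGE_AND_EDITS) {
                shared.add(e.getKey());
            }
        }
        if (!shared.isEmpty()) {
            throw new IllegalArgumentException(
                "IMAGE_AND_EDITS directories are not supported with this patch: " + shared);
        }
    }
}
```

With the sample configuration, /tmp/localhdfs is classified as IMAGE and /tmp/hdfsedits as EDITS, so the check passes.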

> Integration with BookKeeper logging system
> ------------------------------------------
>
>                 Key: HADOOP-5189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5189
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: create.png, HADOOP-5189-trunk-preview.patch, HADOOP-5189-trunk-preview.patch,
>                      HADOOP-5189.patch, HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records (https://issues.apache.org/jira/browse/ZOOKEEPER-276).
> The NameNode is a natural target for such a system because it is the metadata repository
> for the entire HDFS file system.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

