hadoop-common-dev mailing list archives

From "Luca Telloli (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-5188) Modifications to enable multiple types of logging
Date Fri, 22 May 2009 13:24:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712022#action_12712022 ]

Luca Telloli edited comment on HADOOP-5188 at 5/22/09 6:24 AM:
---------------------------------------------------------------

Konstantin, thanks for your comments! Let me address each of your points separately: 

1. I opened HADOOP-5832 by mistake using the subtask option, without realizing that it would
generate a new JIRA. My bad. 

Personally I think there's no clear cut between the two JIRAs: with the URI processing
I'm introducing LoggingDevice, an abstraction for logging. Without it, HADOOP-5832 would be
trivial: you'd read a string, treat it as a URI, retrieve the path fragment, and create a StorageDirectory.
So, unless you think it's necessary to have separate patches, I'd rather link them. 

2. I think edit streams are opened and closed many times during the life of a server process
(namenode/secondary namenode/backup node), while the LoggingDevices are ideally created only once.
While this might not make a difference for files, which can be opened and closed many times, it
might be a better fit for other types of logging that want to keep persistent state inside
their LoggingDevice implementation. For instance, with BookKeeper in mind, I want
to create a BookKeeper client only once for the life of a NameNode and use it to obtain input/output
streams when needed. Does this sound reasonable? 
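To make the lifecycle argument concrete, here is a minimal sketch of the idea (illustrative names and an in-memory stand-in, not the actual patch code): the device is constructed once and owns the long-lived state, while each call hands out a fresh, short-lived stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the LoggingDevice idea: created once per server
// process, it holds persistent state (here a byte buffer; in the BookKeeper
// case it would be the long-lived client), while streams come and go.
interface LoggingDevice {
    InputStream getNewInputStreamInstance();
    OutputStream getNewOutputStreamInstance();
}

class InMemoryLoggingDevice implements LoggingDevice {
    // Persistent state lives in the device, not in the streams.
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();

    @Override
    public ByteArrayInputStream getNewInputStreamInstance() {
        // Each call yields a new short-lived stream over the device's state.
        return new ByteArrayInputStream(store.toByteArray());
    }

    @Override
    public ByteArrayOutputStream getNewOutputStreamInstance() {
        return store;
    }
}
```

For files this is mostly a matter of taste, since reopening a file is cheap; for a device with an expensive-to-create client, pinning that state to the device rather than the streams is the whole point.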

3. From what I see in FSImage.loadFSEdits(StorageDirectory sd), EditLogFileInputStream needs
access to a StorageDirectory object. Assuming we keep the LoggingDevice abstraction and
both edit streams (input and output) need to access it, it seems better to keep the StorageDirectory
inside FileLoggingDevice and retrieve it through the getNewInputStreamInstance() method. 
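The ownership I have in mind looks roughly like this (simplified stand-ins for the real Hadoop classes; the "current/edits" layout and method names are illustrative): the device holds the StorageDirectory, and each new stream resolves its target through the device instead of receiving the directory directly.

```java
import java.io.File;

// Simplified stand-in for Hadoop's StorageDirectory.
class StorageDirectory {
    private final File root;
    StorageDirectory(File root) { this.root = root; }
    File getRoot() { return root; }
}

// Simplified stand-in for the edit-log input stream.
class EditLogFileInputStream {
    private final File file;
    EditLogFileInputStream(File file) { this.file = file; }
    File getFile() { return file; }
}

// The device owns the StorageDirectory; streams never hold it themselves.
class FileLoggingDevice {
    private final StorageDirectory sd;
    FileLoggingDevice(StorageDirectory sd) { this.sd = sd; }

    // Each new stream instance resolves its target file through the
    // device's StorageDirectory at creation time.
    EditLogFileInputStream getNewInputStreamInstance() {
        return new EditLogFileInputStream(new File(sd.getRoot(), "current/edits"));
    }
}
```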

4. In the current patch you can specify values of dfs.name.dir as URIs, as you can see in the
FSNamesystem.getStorageDirs(...) method, where arguments of type URI are processed as well
as path strings, with both adding the correct value to the dirNames ArrayList. In general, values
of this property map directly to storage directories, since StorageDirectory is
the unit of storage for file system images. The mapping is different for the other property,
dfs.name.edits.dir, where it is not one-to-one with StorageDirectories, since other
types of device can be used. 
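The URI-or-path handling amounts to something like the following sketch (DirNameParser is a made-up name and this is not the actual FSNamesystem.getStorageDirs code): file: URIs contribute their path fragment, while anything else is kept as a plain path string.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of accepting both plain paths and file: URIs in a
// dfs.name.dir-style property, accumulating results into dirNames.
class DirNameParser {
    static List<String> parse(Collection<String> values) {
        List<String> dirNames = new ArrayList<String>();
        for (String v : values) {
            URI u = null;
            try {
                u = new URI(v);
            } catch (URISyntaxException e) {
                // Not a valid URI: fall through and treat it as a plain path.
            }
            if (u != null && "file".equals(u.getScheme())) {
                dirNames.add(u.getPath()); // URI form: keep the path fragment
            } else {
                dirNames.add(v);           // plain path string, kept as-is
            }
        }
        return dirNames;
    }
}
```

For dfs.name.dir every resulting entry becomes a StorageDirectory; for dfs.name.edits.dir a non-file scheme would instead select a different LoggingDevice implementation.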

> Modifications to enable multiple types of logging 
> --------------------------------------------------
>
>                 Key: HADOOP-5188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5188
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.20.0
>            Reporter: Luca Telloli
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5188.patch, HADOOP-5188.patch, HADOOP-5188.patch, HADOOP-5188.patch, HADOOP-5188.pdf
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

