hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
Date Thu, 11 Nov 2010 20:13:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931182#action_12931182

Jakob Homan commented on HDFS-1448:

Patch review for HDFS-1448-0.22-2.patch.

* BinaryTokenizer.java
** Providing a constructor that takes a stream rather than a String could aid in testing (my
goal is for all testing to be able to be done via streams without going to the file system).
* EditsLoader.java
** printStatistics - does this method add value that couldn't otherwise be achieved as a separate
viewer? I'm still not sold on providing this information for every run.  The vast majority
of oev instances won't be interested in it, but will still have to pay the penalty of compiling
the statistics. Conversely, the information could be of interest specifically (ie, tell me
about this edits log), and then the user will have to run some other viewer just get it. 
This same information can be gathered as a separate visitor, as mentioned in the first review.
* EditsLoaderCurrent.java
** Comments on 157-160 should be moved one line down so they apply to the check they're describing.
** The prior review had asked for the various switch statements to be moved into separate
functions to aid in readability, testing and code maintability.  Does the new code, with its
individual functors and the extra code necessary to implement this scheme provide any functionality
not given by the original suggestion?  If not, it seems to be a large amount of extra code
and wiring without any benefit.  The goal of the comment in the original review was to reduce
complexity and improve readability, which I'm not sure this new approach accomplishes.
* EditsOpCode.java
** The content of this file duplicates the constants created in FSEditLog.java.  While it
would be best to avoid all duplication, that may not be possible in this patch, as discussed
above.  However, to minimize duplication, I suggest refactoring out the constants from FSEditsLog
into a separate class and referring to those constants in the enum definition.  Bonus points
would be to have the Enum in the same file as the constants.
** What's the use case for the getTokenizer() method? It doesn't seem to be called anywhere.
 Unused methods should be removed.
* EditsVisitorFactory.java
** Does the three lines of regular expression parsing necessary to determine if a file ends
in .xml provide any extra benefit than simply using filename.endWith(".xml"), as was proposed
in the original review?  If not, we should prefer the shorter, simpler code.
** It may be good to support .XML, .Xml, etc., and therefore call toLower on the string before
checking for the file extension.
** Typo: "different implementatios"
* OfflineEditsViewer.java
** The only method calling public setEditsLoader() is OfflineEditsViewer is go().  As such,
it should be made private.
* Tokenizer.java
** Spacing between fields and methods does not follow our coding standard.
** Consensus on naming convention for tokens of Foo appears to be FooToken, rather than TokenFoo
(see: [http://www.google.com/codesearch?hl=en&sa=N&q=file:^.*Token*.java]), as well
as our own 'Token'y classes.  We should follow that here. 
* TokenizerFactory.java
** Same comments for as for EditsFactory.java
* XmlTokenizer.java
** A quick survey of our exception handling shows that it is preferable to nest exceptions
rather than taking the message from one, swallowing it and throwing a new exception: http://dl.dropbox.com/u/565949/exceptions.txt
 We should do the same here.
** Do we need to handle all the empty cases in the switch? At the very least, there are a
lot of empty comments that should be returned.
** {{public Token read(Token t) throws IOException {}} this method returns the same Token
that it accepts, which has a bit of code smell.  I wonder if there's a way to avoid the confusion
of mutating the parameter and then returning it, or, if not, explicitly documenting this behavior.
** As an aside: Whatever form elements of the edits file eventually take, it would be nice
if they were self-testing and could provide this information automatically, rather than needing
to call each one here, decoupled from the implementation.
** As noted in the first review, it is unnecessary and inefficient to shell out to copy the
edits file.  This file is used a source for the test; you can find where it is (as is done
in order to accomplish the copying) and explicitly clean it up after the test has completed.
I suggest refactoring
  {noformat}public void generateEdits(String dfsDir, String editsFilename){noformat}
  {noformat}public String generateEdits(String dfsDir) // return path to edits{noformat}
and providing a shutdown method on OfflineEditsViewer that cleans up the cluster when the
unit test has completed.
** 137: No need for IOException to be on a separate line
** Nit: 208: Void type is preferable to Object for doAs blocks
** 218: Same comment about nesting exceptions (as well as anywhere else I may have missed).
* TestOfflineEditsViewer.java
** Remove commented-out lines 27,28
** Nit: Line 174, save a few characters with a while loop
** 134: Exception should be converted to an assert since this is a unit test

> Create multi-format parser for edits logs file, support binary and XML formats initially
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-1448
>                 URL: https://issues.apache.org/jira/browse/HDFS-1448
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 0.22.0
>            Reporter: Erik Steffl
>            Assignee: Erik Steffl
>             Fix For: 0.22.0
>         Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22.patch,
Viewer hierarchy.pdf
> Create multi-format parser for edits logs file, support binary and XML formats initially.
> Parsing should work from any supported format to any other supported format (e.g. from
binary to XML and from XML to binary).
> The binary format is the format used by FSEditLog class to read/write edits file.
> Primary reason to develop this tool is to help with troubleshooting, the binary format
is hard to read and edit (for human troubleshooters).
> Longer term it could be used to clean up and minimize parsers for fsimage and edits files.
Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer.
Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in
both FSImage and FSEditLog. This is subject to change, specifically depending on adoption
of avro (which would completely change how objects are serialized as well as provide ways
to convert files to different formats).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message