Return-Path: Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: (qmail 3248 invoked from network); 11 Nov 2010 20:13:05 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 11 Nov 2010 20:13:05 -0000 Received: (qmail 93890 invoked by uid 500); 11 Nov 2010 20:13:36 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 93854 invoked by uid 500); 11 Nov 2010 20:13:36 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 93846 invoked by uid 99); 11 Nov 2010 20:13:36 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Nov 2010 20:13:36 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Nov 2010 20:13:35 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id oABKDFqo023951 for ; Thu, 11 Nov 2010 20:13:15 GMT Message-ID: <17500679.33941289506395648.JavaMail.jira@thor> Date: Thu, 11 Nov 2010 15:13:15 -0500 (EST) From: "Jakob Homan (JIRA)" To: hdfs-issues@hadoop.apache.org Subject: [jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931182#action_12931182 ] Jakob Homan commented on HDFS-1448: ----------------------------------- Patch review for HDFS-1448-0.22-2.patch. * BinaryTokenizer.java ** Providing a constructor that takes a stream rather than a String could aid in testing (my goal is for all testing to be able to be done via streams without going to the file system). * EditsLoader.java ** printStatistics - does this method add value that couldn't otherwise be achieved as a separate viewer? I'm still not sold on providing this information for every run. The vast majority of oev instances won't be interested in it, but will still have to pay the penalty of compiling the statistics. Conversely, the information could be of interest specifically (ie, tell me about this edits log), and then the user will have to run some other viewer just get it. This same information can be gathered as a separate visitor, as mentioned in the first review. * EditsLoaderCurrent.java ** Comments on 157-160 should be moved one line down so they apply to the check they're describing. ** The prior review had asked for the various switch statements to be moved into separate functions to aid in readability, testing and code maintability. Does the new code, with its individual functors and the extra code necessary to implement this scheme provide any functionality not given by the original suggestion? If not, it seems to be a large amount of extra code and wiring without any benefit. The goal of the comment in the original review was to reduce complexity and improve readability, which I'm not sure this new approach accomplishes. * EditsOpCode.java ** The content of this file duplicates the constants created in FSEditLog.java. While it would be best to avoid all duplication, that may not be possible in this patch, as discussed above. However, to minimize duplication, I suggest refactoring out the constants from FSEditsLog into a separate class and referring to those constants in the enum definition. Bonus points would be to have the Enum in the same file as the constants. EditsVisitor.java ** What's the use case for the getTokenizer() method? It doesn't seem to be called anywhere. Unused methods should be removed. * EditsVisitorFactory.java ** Does the three lines of regular expression parsing necessary to determine if a file ends in .xml provide any extra benefit than simply using filename.endWith(".xml"), as was proposed in the original review? If not, we should prefer the shorter, simpler code. ** It may be good to support .XML, .Xml, etc., and therefore call toLower on the string before checking for the file extension. ** Typo: "different implementatios" * OfflineEditsViewer.java ** The only method calling public setEditsLoader() is OfflineEditsViewer is go(). As such, it should be made private. * Tokenizer.java ** Spacing between fields and methods does not follow our coding standard. ** Consensus on naming convention for tokens of Foo appears to be FooToken, rather than TokenFoo (see: [http://www.google.com/codesearch?hl=en&sa=N&q=file:^.*Token*.java]), as well as our own 'Token'y classes. We should follow that here. * TokenizerFactory.java ** Same comments for as for EditsFactory.java * XmlTokenizer.java ** A quick survey of our exception handling shows that it is preferable to nest exceptions rather than taking the message from one, swallowing it and throwing a new exception: http://dl.dropbox.com/u/565949/exceptions.txt We should do the same here. ** Do we need to handle all the empty cases in the switch? At the very least, there are a lot of empty comments that should be returned. ** {{public Token read(Token t) throws IOException {}} this method returns the same Token that it accepts, which has a bit of code smell. I wonder if there's a way to avoid the confusion of mutating the parameter and then returning it, or, if not, explicitly documenting this behavior. OfflineEditsViewerHelper.java ** As an aside: Whatever form elements of the edits file eventually take, it would be nice if they were self-testing and could provide this information automatically, rather than needing to call each one here, decoupled from the implementation. ** As noted in the first review, it is unnecessary and inefficient to shell out to copy the edits file. This file is used a source for the test; you can find where it is (as is done in order to accomplish the copying) and explicitly clean it up after the test has completed. I suggest refactoring {noformat}public void generateEdits(String dfsDir, String editsFilename){noformat} to {noformat}public String generateEdits(String dfsDir) // return path to edits{noformat} and providing a shutdown method on OfflineEditsViewer that cleans up the cluster when the unit test has completed. ** 137: No need for IOException to be on a separate line ** Nit: 208: Void type is preferable to Object for doAs blocks ** 218: Same comment about nesting exceptions (as well as anywhere else I may have missed). * TestOfflineEditsViewer.java ** Remove commented-out lines 27,28 ** Nit: Line 174, save a few characters with a while loop ** 134: Exception should be converted to an assert since this is a unit test > Create multi-format parser for edits logs file, support binary and XML formats initially > ---------------------------------------------------------------------------------------- > > Key: HDFS-1448 > URL: https://issues.apache.org/jira/browse/HDFS-1448 > Project: Hadoop HDFS > Issue Type: New Feature > Components: tools > Affects Versions: 0.22.0 > Reporter: Erik Steffl > Assignee: Erik Steffl > Fix For: 0.22.0 > > Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf > > > Create multi-format parser for edits logs file, support binary and XML formats initially. > Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). > The binary format is the format used by FSEditLog class to read/write edits file. > Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). > Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.