hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-775) Add input/output formatters for Vertica clustered ADBMS.
Date Tue, 08 Sep 2009 23:40:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752827#action_12752827
] 

Aaron Kimball commented on MAPREDUCE-775:
-----------------------------------------

Omer, 

This looks like a great start. I've read through the patch and believe I understand most of
how it works. I don't have any major architectural concerns, but there are a number of style
issues that I think should be addressed before this is committed. All of these are outlined
below. Comments are listed in the sequential order presented by your patch file.

(As for your question about test failures, build #511 has already been deleted by Hudson,
so I can't check that.)

ivy.xml:
Do you actually depend on hsqldb?


Hadoop test classes typically go in the same package as that which they test (e.g., {{o.a.h.vertica}}),
not a separate package like {{o.a.h.vertica.tests}}. This would save you a lot of imports
in tests, and allows package-public things to be used for testing. (This applies to all your
test classes)

AllTests.java
* The method name {{setUp()}} has special meaning in JUnit. Since your {{setup()}} method
isn't a {{setUp()}}, can you change this to something less misleadingly-similar?
* In test description string in suite(), o.a.h.vertica, not o.a.h.sqoop.

* re your "TODO: figure out jdbc jar packaging:" Based on Hadoop source tree style, I recommend
creating a {{src/contrib/vertica/lib}} directory, check any external jars you need in there,
and modify {{src/contrib/vertica/build.xml}} to include the jars from that dir on the classpath
for building, testing, etc. Since patches are text not binary, you should attach the jar to
the JIRA issue separately. Note that adding a jar requires its license be A2-compatible. See

[http://www.apache.org/legal/resolved.html#category-a] for a list of licenses which external
dependencies may have applied to them.

TestExample.Reduce.setup(): I suggest that {{AllTests.setup();}} go in a static initializer
block in {{TestExample}} rather than getting called in every {{Reduce.setup()}} call. Given
that you actually require {{AllTests.setup()}} in virtually all your tests, I would suggest
creating a {{VerticaTestCase}} class that subclasses {{TestCase}}, have this class call {{AllTests.setup()}}
in a static initializer block, and then have all your {{Test\*}} classes subclass {{VerticaTestCase}}
instead of {{TestCase}}. This way you won't worry about missing a call somewhere.

Also in this same method, why catch {{Exception e}} and print its stack trace? If {{Reduce.setup()}}
fails for an exception, why shouldn't the whole test fail?

TestExample.Reduce.reduce(): Style nit: One-line {{if}} statements should still use curly-braces
around the "then" clause. See [http://java.sun.com/docs/codeconv/html/CodeConventions.doc6.html#449].
Hadoop source should follow all Sun Java style conventions except for an indentation width
of two spaces.

I don't think {{TestExample}}, etc, should have a {{run()}} method.

TestVertica.testVerticaRecord(): why are values of {{DATE}}, {{TIME}}, etc, commented out?
Dead code should be removed, not commented out. Also, why catch the IOException and return?
Why doesn't this method just throw IOException (and implicitly fail the test)?

In {{recordTest()}}, don't use "assert values.equals(new_values)", use JUnit: {{assertEquals("failure
message", values, new_values);}}

Same with {{testVerticaSplit()}}, {{validateInput()}}, etc...


VerticaStreamingRecordWriter.java: Please use Java "lowerCamelCase" style for field and variable
names, not "under_scores" (see {{writer_table}}, {{copy_stmt}}, etc. These should be {{writerTable}}
and {{copyStmt}} respectively.)

In the constructor, the RuntimeException description {{"Vertica Formatter requies a the Vertica
jdbc driver"}} contains a bunch of typos.

close() method: if statement should use curly-brace style described above.

write() method: Materializing {{record.toString()}} in {{LOG.debug()}} for every call to write
is expensive. Consider wrapping this statement in a call to {{LOG.isDebugEnabled()}}.

VerticaConfiguration: comment above definition of {{DELIMITER}} has typos.


{{(input_query.charAt(input_query.length() - 1) == ';'}} ... perhaps {{inputQuery.endsWith(';');}}
?

{{getInputParameters()}} has meaningless javadoc attributes. See also getInputDelmiter() (which
is a typo'd method name), setInputDelimiter(), etc... that all have empty {{@return}} attributes.

VerticaInputFormat: {{DateFormat}} is not thread-safe. {{datefmt}} should not be a static
member.

This class also has a lot of {{under_score}} field and parameter names.

Can your javadoc comment for {{optimize()}} suggest when it is appropriate to call this, vs.
when you would be better off not doing so? What's the heuristic a programmer should keep in
mind?

This method also contains a lot of commented-out code. Please remove it entirely.

{{conn.wait(1000);}} should pull out {{1000}} into a static final constant, or even better,
make it configurable.

VerticaRecordWriter.getValue() has hairy braces in an if..else statement. (You do this in
{{write()}} as well.)

Also, what happens in the case where {{writer_table.split()}} returns a 0-length array? In
this same method, can you please add a comment explaining why you're pulling {{rs.getString(4)}}
and {{rs.getInt(5)}}? These seem arbitrary as-written.

VerticaInputSplit.executeQuery() has javadoc typos.

VerticaUtil uses tabs instead of spaces, and includes empty lines with leading whitespace.
Various block statements and curly braces also require reformatting here, as well as {{variable_names}}.

VerticaRecord constructor has meaningless javadoc attributes.
Also, please obey 80-column limit in this class (as well as elsewhere).

in {{objectTypes()}}, why not use {{else if}} statements instead of just a series of {{if}}
statements? You could then drop all the {{continue}} statements which make for awkward flow.
Also, include a case at the end for unknown type where you throw an exception, rather than
misalign the {{types}} ArrayList from the {{values}} ArrayList.

{{toSQLString()}}: Please do not start variable names with underscore. I suggest {{myDelimiter}}
to differentiate it from {{delimiter}}.

Also, are fall-thrus in the case block intentional? If so, please mark this with a comment.



> Add input/output formatters for Vertica clustered ADBMS.
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-775
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-775
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Omer Trajman
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-775.patch
>
>
> Add native support for Vertica as an input or output format taking advantage of parallel
read and write properties of the DBMS.
>  
> On the input side allow for parametrized queries (a la prepared statements) and create
a split for each combination of parameters.  Also support the parameter list to be generated
from a sql statement.  For example - return metrics for all dimensions that meet criteria
X with one input split for each dimension.  Divide the read among any number of hosts in the
Vertica cluster.
>  
> On the output side, support Vertica streaming load to any number of hosts in the Vertica
cluster.  Output may be to a different cluster than input.
>  
> Also includes Input and Output formatters that support streaming interface.
> Code has been tested and run on live systems under 19 and 20.  Patch for 21 with new
API will be ready end of this week.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message