crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-565) CSVInputFormat needs to be more defensive when configuring itself
Date Thu, 01 Oct 2015 12:29:27 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939754#comment-14939754
] 

Micah Whitacre commented on CRUNCH-565:
---------------------------------------

[~champgm], thanks for the pull.  Here's some feedback:

1. Why are you using Mockito for mocking the Configuration object vs just populating it?
2. If you really need to use Mockito you can use the Mock annotation to avoid doing that for
each test.
3. You could simplify the code by using the "get" methods that take in a default value[1].
 This way if the value is not set you can ensure the default and this will eliminate the case
of null.  It won't eliminate the case of someone intentionally setting an empty string but
should handle most of the non string values.

Also once you are done with the changes if we can get a patch vs pull that'd be helpful. 
The github pulls are to a mirror of the actual git repo so we can't really "merge" the pull.

[1] - https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#get(java.lang.String,
java.lang.String)

> CSVInputFormat needs to be more defensive when configuring itself
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-565
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-565
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: mac champion
>            Assignee: mac champion
>            Priority: Minor
>              Labels: csv, csvparser
>
> It seems that some behavior has changed somewhere along the line where hadoop Configuration
is concerned. It is possible that a call to .get(OPTION) will return null. CSVInputFormat
does not handle that case gracefully:
> https://github.com/apache/crunch/blob/apache-crunch-0.10.0/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVInputFormat.java#L178-L183
> Some more relevant details can be found in this JIRA:
> https://issues.apache.org/jira/browse/CRUNCH-564?focusedCommentId=14938186&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14938186



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message