datafu-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Hayes" <matthew.terence.ha...@gmail.com>
Subject Re: Review Request 22351: DataFu MR: New lightweight framework to develop Java/Scala MapReduce jobs
Date Fri, 09 Jan 2015 22:00:13 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22351/#review67525
-----------------------------------------------------------


Sorry for taking so long to look at this.  I've started looking through this and have some
feedback below.  I like the idea of having a lightweight set of abstract jobs that make it
easier to write mapreduce jobs.  What is making this hard to review though is that there are
a lot of files copied from hourglass which are now stale compared to the most recent versions.
 Also seems to be other changes made to these files as well  It's really hard for me to tell
what has changed.  Also I'd like it if there was some unity of base classes between datafu-mr
and hourglass.  This introduces a new AbstractJob that is similar in some ways but different
than one defined in hourglass.  I'll pause on reviewing further until I hear back from you
regarding these issues :)


datafu-mr/README.md
<https://reviews.apache.org/r/22351/#comment111559>

    I don't know if the distinction between init() and configure() is clear.



datafu-mr/README.md
<https://reviews.apache.org/r/22351/#comment111554>

    Since the reducer class may be inferred, if we don't write one then does it make it map-only?
 Or do we need to override getReducerClass?



datafu-mr/README.md
<https://reviews.apache.org/r/22351/#comment111555>

    first folder => last folder
    
    Also how does it decide between lexicographic or date order?



datafu-mr/README.md
<https://reviews.apache.org/r/22351/#comment111557>

    Is this based off the name of the inner class or the parent class?  Doesn't a combiner
also implement the Reducer type so won't this confuse things?



datafu-mr/changes.md
<https://reviews.apache.org/r/22351/#comment111561>

    I think we can get rid of this changed.md file and track everything at the root level.



datafu-mr/src/main/java/datafu/mr/avro/AvroMultipleInputsKeyInputFormat.java
<https://reviews.apache.org/r/22351/#comment111562>

    While I'm thinking of it: all of the files copied from hourglass should be removed from
hourglass and hourglass should be updated to reference datafu-mr.  Also make sure you update
all these copied files.
    
    git log datafu-hourglass/src
    
    This one has a significant number of changes to javadocs:
    
    git show 0f9b853be2efd7176091c433bc08180aa81bc453
    
    Besides the package, are there any other changes to these files?  I'm assuming for now
that you did not make any changes, so I will skip over anything that exists already in hourglass.



datafu-mr/src/main/java/datafu/mr/avro/AvroMultipleInputsUtil.java
<https://reviews.apache.org/r/22351/#comment111563>

    Indicative that this is out of date.  Author tags were removed.  Just update the entire
file to be safe since there were other changes as well to docs.



datafu-mr/src/main/java/datafu/mr/avro/CombinedAvroMultipleInputsKeyInputFormat.java
<https://reviews.apache.org/r/22351/#comment111564>

    Is this a new file?  I can't find the original.  How does this differ from CombinedAvroKeyInputFormat?



datafu-mr/src/main/java/datafu/mr/fs/DatePath.java
<https://reviews.apache.org/r/22351/#comment111565>

    Was this changed? Where is the hashCode and equals method?



datafu-mr/src/main/java/datafu/mr/fs/PathUtils.java
<https://reviews.apache.org/r/22351/#comment111566>

    This seems to have changed from the one in hourglass.  There are method there that are
missing from this one.
    
    I think this would be much easier to review if there is a completely separate change that
moves the common files out of hourglass.  It is very hard to tell what has changed.



datafu-mr/src/main/java/datafu/mr/jobs/AbstractAvroJob.java
<https://reviews.apache.org/r/22351/#comment111567>

    remove author tag


- Matthew Hayes


On Nov. 5, 2014, 6:16 a.m., Mathieu Bastian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22351/
> -----------------------------------------------------------
> 
> (Updated Nov. 5, 2014, 6:16 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/DATAFU-51
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/DATAFU-51
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> New lightweight framework to develop Java/Scala MapReduce jobs
> 
> 
> Diffs
> -----
> 
>   datafu-mr/README.md PRE-CREATION 
>   datafu-mr/build.gradle PRE-CREATION 
>   datafu-mr/changes.md PRE-CREATION 
>   datafu-mr/overview.html PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/avro/AvroMultipleInputsKeyInputFormat.java PRE-CREATION

>   datafu-mr/src/main/java/datafu/mr/avro/AvroMultipleInputsUtil.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/avro/CombinedAvroKeyInputFormat.java PRE-CREATION

>   datafu-mr/src/main/java/datafu/mr/avro/CombinedAvroMultipleInputsKeyInputFormat.java
PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/avro/Schemas.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/avro/package-info.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/fs/DatePath.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/fs/PathUtils.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/fs/package-info.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/jobs/AbstractAvroJob.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/jobs/AbstractJob.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/jobs/StagedOutputJob.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/jobs/package-info.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/AvroHelper.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/DiscoveryHelper.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/DistributedCacheHelper.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/FileCleaner.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/IntermediateTypeHelper.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/LatestExpansionFunction.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/ReduceEstimator.java PRE-CREATION 
>   datafu-mr/src/main/java/datafu/mr/util/package-info.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/TestAbstractJob.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/TestAvroJob.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/TestAvroJoin.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/TestBase.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/TestIntermediateTypeHelper.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/TestLatestExpansionFunction.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/TestPathUtils.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/avro/KeyCount.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/AvroJoin.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroIntermediateObjectJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroIntermediateWritableJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroJob.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroMapOnlyJob.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroMapOnlyOutputObjectJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroMultipleInputsJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroMultipleOutputsJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroOutputObjectJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroOutputSpecificRecordJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroSpecificRecordInputJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroWordCountJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicAvroWordCountOverrideFormatJob.java
PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicConcatMultipleInputsJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicDistributedCacheClasspathJob.java
PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicDistributedCacheJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicMapOnlyJob.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicMultipleOutputsJob.java PRE-CREATION

>   datafu-mr/src/test/java/datafu/mr/test/jobs/BasicWordCountJob.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/util/BasicAvroReader.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/util/BasicAvroWriter.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/util/BasicWritableReader.java PRE-CREATION 
>   datafu-mr/src/test/java/datafu/mr/test/util/BasicWritableWriter.java PRE-CREATION 
>   settings.gradle 3e5d7bf 
> 
> Diff: https://reviews.apache.org/r/22351/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mathieu Bastian
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message