beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1856) HDFSFileSink class do not use the same configuration in master and slave
Date Sat, 01 Apr 2017 09:38:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952145#comment-15952145 ]

ASF GitHub Bot commented on BEAM-1856:
--------------------------------------

GitHub user 397090770 opened a pull request:

    https://github.com/apache/beam/pull/2398

    [BEAM-1856]HDFSFileSink class do not use the same configuration in master and slave

    As described in [BEAM-1856](https://issues.apache.org/jira/browse/BEAM-1856), the `HDFSFileSink` class does not use the same configuration in the master thread and the slave thread.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/beam release-0.6.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2398
    
----
commit d375cfa126fd7be9eeeec34f39c2b9b856f324bf
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-07T21:08:34Z

    Update Beam version in the Maven archetypes

commit e6dd96ed86837bb0997f25d74322c463a72c8d5d
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-07T21:12:21Z

    Update dataflow container version to beam-0.6.0

commit 1c91d0e598c5dd62a9c34cc0bf4bfcc9a6f2faae
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-07T21:44:54Z

    [maven-release-plugin] prepare release v0.6.0-RC1

commit 8ab36fa2d5dfc237c7619c278848e31f3412a0e1
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-08T21:37:57Z

    Revert "[maven-release-plugin] prepare release v0.6.0-RC1"
    
    This reverts commit 1c91d0e598c5dd62a9c34cc0bf4bfcc9a6f2faae.

commit d5257658f094fe8c2a8668027bbdd4a26396ba0b
Author: Ismaël Mejía <iemejia@gmail.com>
Date:   2017-03-06T08:13:31Z

    Change Json parsing from gson to jackson for ElasticsearchIO

commit 730def77ebf216295d953b13722f21185f674ccc
Author: Mark Liu <markliu@google.com>
Date:   2017-03-07T22:26:37Z

    [BEAM-1646] Remove duplicated bigquery dependency

commit 92c5b5bd732d9fc019fa6820afcc31a92a026bbf
Author: Davor Bonaci <davor@google.com>
Date:   2017-03-07T19:57:38Z

    Add ServicesResourceTransformer to all shading configuration
    
    This ensures that files in META-INF/services aren't overwritten. Instead, they are concatenated.
    
    This is critical to ensure PipelineOptionsRegistrar, RunnerRegistrar, IOChannelFactoryRegistrar
    and FileSystemRegistrar work well for users.

commit 5e2afa29a3a0fe93e662b2fe7173c1641c253cd5
Author: Kenneth Knowles <klk@google.com>
Date:   2017-03-02T22:29:56Z

    Explicitly GBK before stateful ParDo in Dataflow batch

commit 3518b12fbfc984fcfe12e12ba06809e57744f006
Author: Tibor Kiss <tibor.kiss@gmail.com>
Date:   2017-03-08T11:18:39Z

    [BEAM-1649] Fix unresolved references in Python SDK

commit f572328ce23e70adee8001e3d10f1479bd9a380d
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-08T21:42:36Z

    Update dataflow container version to beam-0.6.0

commit edaf3ac9f57c208bb7ce444d409b0909ef2d1a67
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-08T21:43:28Z

    Update py-sdk version to release version.

commit b25a0369dc5e5e4eacdc6e40b1e81452f9685579
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-08T22:07:45Z

    This closes #2202

commit 99be42e5e6ea5438d80b66d3345615a2754f2ed9
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-08T23:12:32Z

    [maven-release-plugin] prepare branch release-0.6.0

commit e1ea41274bab5347eac14a299231d97d239c924c
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-08T23:13:40Z

    Revert "[maven-release-plugin] prepare branch release-0.6.0"
    
    This reverts commit 99be42e5e6ea5438d80b66d3345615a2754f2ed9.

commit d2d4f0aad805b4e06371cb1ad6c29a6183236f7b
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-09T00:32:54Z

    [maven-release-plugin] prepare release v0.6.0-RC1

commit dc64c2fc0487e3549f48a710b8a7c4fb7bd0c788
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-09T00:34:41Z

    [maven-release-plugin] rollback changes from release preparation of v0.6.0-RC1

commit a18b5b1648489f14fd7a621f345e4d21c09b437f
Author: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Date:   2017-03-10T07:29:27Z

    Move GC timer checking to StatefulDoFnRunner.CleanupTimer

commit 8fa718db5bc14efd1beefc2c757c331a5bdbf927
Author: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Date:   2017-03-10T10:07:00Z

    Introduce Flink-specific state GC implementations
    
    We now set the GC timer for window.maxTimestamp() + 1 to ensure that a
    user timer set for window.maxTimestamp() still has all state.
    
    This also adds tests for late data dropping and state GC specifically
    for the Flink DoFnOperator.

commit 86522157a79fd9a753436312ff8b746cb5740135
Author: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Date:   2017-03-10T14:25:26Z

    Properly deal with late processing-time timers

commit 2b92b0d851bcc5aedcc40ebf02ad4f39f3d67514
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-10T21:42:17Z

    Add README to python tarball.
    
    And, delete test created files, to avoid them being included in the tarball.

commit c7c4da28b7925de38e7c10fcc4e9ef52a5ea76fc
Author: Kenneth Knowles <klk@google.com>
Date:   2017-03-10T21:14:58Z

    Remove duplicated dependency from Dataflow runner pom.xml

commit ef47c9f511b0f5730b0dc417aefa703fd6f974c5
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-11T00:21:17Z

    Ignore results from the tox clean up phase
    
    Some temporary files are generated only under certain conditions and
    this should not fail tox.

commit bce94c4e06bf2ee2428e7b6fa71d0d9144b7ee61
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-11T00:40:34Z

    Generate zip distribution for pyhthon

commit 7321c9afc5aeb3b786584bfe4b145cc3bf639830
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-11T01:41:21Z

    [maven-release-plugin] prepare release v0.6.0-RC2

commit ebc2ba5bf4cc368b25a9cd6131175bac3afffe13
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-11T01:46:44Z

    Revert "[maven-release-plugin] prepare release v0.6.0-RC2"
    
    This reverts commit 7321c9afc5aeb3b786584bfe4b145cc3bf639830.

commit dc4acfdd1bb30a07a9c48849f88a67f60bc8ff08
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-11T02:09:10Z

    [maven-release-plugin] prepare release v0.6.0-RC2

commit 11d8a67c0348e72eab34616b962e686f61d6a4eb
Author: Ahmet Altay <altay@google.com>
Date:   2017-03-11T02:10:06Z

    [maven-release-plugin] rollback changes from release preparation of v0.6.0-RC2

----


> HDFSFileSink class do not use the same configuration in master and slave
> ------------------------------------------------------------------------
>
>                 Key: BEAM-1856
>                 URL: https://issues.apache.org/jira/browse/BEAM-1856
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>    Affects Versions: 0.6.0
>            Reporter: yangping wu
>            Assignee: Davor Bonaci
>
> I have a code snippet as follows:
> {code}
> Read.Bounded<KV<LongWritable, Text>> from = Read.from(HDFSFileSource.from(options.getInputFile(), TextInputFormat.class, LongWritable.class, Text.class));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> data.apply(MapElements.via(new SimpleFunction<KV<LongWritable, Text>, String>() {
>     @Override
>     public String apply(KV<LongWritable, Text> input) {
>         return input.getValue() + "\t" + input.getValue();
>     }
> })).apply(Write.to(HDFSFileSink.<String>toText(options.getOutputFile())));
> {code}
> and submit the job like this:
> {code}
> spark-submit --class org.apache.beam.examples.WordCountHDFS --master yarn-client   \
>              ./target/word-count-beam-bundled-0.1.jar                              \
>              --runner=SparkRunner                                                  \
>              --inputFile=hdfs://master/tmp/input/                                  \
>              --outputFile=/tmp/output/
> {code}
> The {{HDFSFileSink.validate}} function then checks whether the {{/tmp/output/}} directory exists on the local filesystem (not on HDFS).
> But the final result is stored in the {{hdfs://master/tmp/output/}} directory on HDFS.
> The reason is that the {{HDFSFileSink}} class does not use the same configuration in the master thread and the slave thread.
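
The mismatch described in the issue can be illustrated with a minimal sketch that uses only `java.net.URI`, not the Hadoop or Beam APIs. It assumes (the issue does not state this explicitly) that the master resolves the scheme-less path against a local default filesystem such as `file:///`, while the slaves resolve it against `hdfs://master/`, which is what differing `fs.defaultFS` settings would produce. The class name `DefaultFsResolution` and the helper `qualify` are hypothetical names introduced for illustration only.

```java
import java.net.URI;

// Illustrative sketch only: mimics how a scheme-less path is qualified
// against a default filesystem URI. It does not call Hadoop or Beam.
final class DefaultFsResolution {

    // A path that already carries a scheme is returned unchanged;
    // a scheme-less path inherits scheme and authority from defaultFs.
    public static URI qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri;
        }
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        String outputFile = "/tmp/output/";

        // Assumed default filesystem seen by the master (driver).
        URI masterDefault = URI.create("file:///");
        // Assumed default filesystem seen by the slaves (workers).
        URI slaveDefault = URI.create("hdfs://master/");

        URI onMaster = qualify(outputFile, masterDefault);
        URI onSlave = qualify(outputFile, slaveDefault);

        // The same argument ends up pointing at two different filesystems.
        System.out.println("master scheme: " + onMaster.getScheme());
        System.out.println("slave scheme:  " + onSlave.getScheme());
        System.out.println("same target:   " + onMaster.equals(onSlave));
    }
}
```

Under that assumption, a fully qualified argument such as `--outputFile=hdfs://master/tmp/output/` would resolve identically on both sides, since `qualify` leaves a path with an explicit scheme untouched. Whether this works around the validation problem in practice is not confirmed by the issue.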



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
