beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2708) Decompressing bzip2 files with multiple "streams" only reads the first stream
Date Thu, 03 Aug 2017 06:08:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112243#comment-16112243 ]

ASF GitHub Bot commented on BEAM-2708:
--------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/beam/pull/3678

    [BEAM-2708] Adds support for reading concatenated bzip2 files

    Adds tests for concatenated gzip and bzip2 files.
    
    Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually
    hitting 'DummyReadTransform' and not testing this feature.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/beam pbzip2_test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3678
    
----
commit 40e1fbf1856190418d0c6c25c746037d4c109083
Author: chamikara@google.com <chamikara@google.com>
Date:   2017-08-03T05:49:33Z

    Adds support for reading concatenated bzip2 files.
    
    Adds tests for concatenated gzip and bzip2 files.
    
    Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually
    hitting 'DummyReadTransform' and not testing this feature.

----


> Decompressing bzip2 files with multiple "streams" only reads the first stream
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2708
>                 URL: https://issues.apache.org/jira/browse/BEAM-2708
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions, sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Chamikara Jayalath
>             Fix For: 2.1.0, 2.2.0
>
>
> I'm not sure which components to file this against. A user has observed that pbzip2 files
> are not being properly decompressed:
> https://stackoverflow.com/questions/45439117/google-dataflow-only-partly-uncompressing-files-compressed-with-pbzip2
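A minimal Python sketch of the failure mode described above, using only the standard-library bz2 module. It is not the code from the pull request; `decompress_all_bz2_streams` is a hypothetical helper name. pbzip2 writes a file that is a concatenation of independent bzip2 streams. A single `BZ2Decompressor` stops at the end of the first stream and leaves the remaining bytes in `unused_data`, so a reader that never looks there silently returns only the first stream's contents.

```python
import bz2


def decompress_all_bz2_streams(data: bytes) -> bytes:
    """Hypothetical helper: decompress every bzip2 stream in `data`.

    A BZ2Decompressor handles exactly one stream; once it reaches that
    stream's end it sets `eof` and keeps any trailing bytes in
    `unused_data`. Looping with a fresh decompressor handles
    concatenated (multi-stream) input such as pbzip2 output.
    """
    out = []
    while data:
        decomp = bz2.BZ2Decompressor()
        out.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        # Bytes after the end of this stream belong to the next stream.
        data = decomp.unused_data
    return b"".join(out)


# Build a two-stream payload, similar to pbzip2 output or `cat a.bz2 b.bz2`.
multi = bz2.compress(b"first stream\n") + bz2.compress(b"second stream\n")

# One decompressor object only returns the first stream (the reported bug).
single = bz2.BZ2Decompressor().decompress(multi)

print(single)
print(decompress_all_bz2_streams(multi))
```

For comparison, Python's one-shot `bz2.decompress()` has accepted multi-stream input since Python 3.3; the pitfall shows up in streaming readers that drive a single decompressor object incrementally, which is the usual pattern when reading large files in chunks.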



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
