beam-commits mailing list archives

From "Chamikara Jayalath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-778) Make fileio._CompressedFile seekable.
Date Wed, 29 Mar 2017 19:05:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947718#comment-15947718 ]

Chamikara Jayalath commented on BEAM-778:
-----------------------------------------

Currently this is not an issue, since Beam's FileBasedSource and FileBasedSink are the only users
of CompressedFile/File objects, and they use them in a straightforward way: each
FileBasedSource/FileBasedSink object owns its File/CompressedFile object, and reading is done
on a single thread. A secondary thread that performs dynamic work rebalancing might execute
seek() operations on File objects, but not on CompressedFile objects.

In the future we might access CompressedFile objects from multiple threads in other places,
but I think we should wait until such a need arises. It might also be enough to declare
CompressedFile objects not thread-safe and expect users to handle thread safety themselves,
instead of embedding a lock in CompressedFile objects, which would potentially add a
performance penalty for all users.
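
To make the "users address thread safety" option concrete, here is a minimal sketch of
caller-side locking. LockedReader and read_at are hypothetical names, not part of Beam, and
io.BytesIO merely stands in for a seekable file object that is not thread-safe.

```python
import io
import threading


class LockedReader(object):
    """Hypothetical caller-side wrapper; not part of Beam."""

    def __init__(self, file_obj):
        self._file = file_obj
        self._lock = threading.Lock()

    def read_at(self, offset, size):
        # Hold the lock across seek() and read() so that no other thread
        # can move the file position between the two calls.
        with self._lock:
            self._file.seek(offset)
            return self._file.read(size)


# io.BytesIO is only a stand-in for a non-thread-safe, seekable file object.
shared = LockedReader(io.BytesIO(b"0123456789"))
results = {}


def worker(offset):
    results[offset] = shared.read_at(offset, 2)


threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 4, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this approach only callers that actually share an object across threads pay for the
lock; single-threaded users are unaffected.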

WDYT?

> Make fileio._CompressedFile seekable.
> -------------------------------------
>
>                 Key: BEAM-778
>                 URL: https://issues.apache.org/jira/browse/BEAM-778
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Tibor Kiss
>             Fix For: Not applicable
>
>
> We have a TODO to make fileio._CompressedFile seekable.
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L692
> Without this, compressed file objects produced for FileBasedSource implementations may
> not be usable with libraries that rely on the seek() and tell() methods.
> For example tarfile.open().
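
As a rough illustration of the tarfile case, the sketch below uses a hypothetical ForwardOnly
class that exposes only read(), standing in for a file object without seek()/tell(). It is
not Beam's _CompressedFile. Under that assumption, tarfile's random-access mode ("r:") fails
on such an object, while its streaming mode ("r|") still works.

```python
import io
import tarfile

# Build a small tar archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="a.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
raw = buf.getvalue()


class ForwardOnly(object):
    """Hypothetical stand-in for a file object without seek()/tell()."""

    def __init__(self, payload):
        self._inner = io.BytesIO(payload)

    def read(self, size=-1):
        return self._inner.read(size)


# Random-access mode needs tell()/seek() on the underlying object.
try:
    tarfile.open(fileobj=ForwardOnly(raw), mode="r:")
    random_access_ok = True
except Exception:
    random_access_ok = False

# Streaming mode only calls read(), so it works on a forward-only object.
with tarfile.open(fileobj=ForwardOnly(raw), mode="r|") as tar:
    names = tar.getnames()
```

Streaming mode is a workaround only for sequential consumers; libraries that need true
random access still require seek() and tell().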



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
