beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)
Date Sat, 07 Jan 2017 02:21:58 GMT


ASF GitHub Bot commented on BEAM-1233:

GitHub user yk5 opened a pull request:

    [BEAM-1233] Create TFRecordIO, providing source/sink for TFRecords,
    which is the dedicated record format for Tensorflow.
    For more about TFRecords, refer to
    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](

You can merge this pull request into a Git repository by running:

    $ git pull tfrecord

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1749

commit 3bbd2c1c208860c48c7a4c1909e3936a1fab4faa
Author: Younghee Kwon <>
Date:   2017-01-07T02:05:56Z

    Create TFRecordIO, which provides source/sink for TFRecords, the dedicated record format
    for Tensorflow.
    For more about TFRecords, refer to


> Implement TFRecordIO (Reading/writing Tensorflow Standard format)
> -----------------------------------------------------------------
>                 Key: BEAM-1233
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Younghee Kwon
>            Assignee: Ahmet Altay
> Tensorflow is an open source machine learning project that is getting a lot of attention
> these days. Apache Beam can serve as a good preprocessing tool for it; however, Tensorflow
> supports only a limited number of input file formats: CSV and its own record format
> (the so-called TFRecord).
> Apache Beam, on the other hand, does not support reading or writing the TFRecord format.
> Beam would be more useful as a preprocessing tool if it supported TFRecord I/O natively.
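
For readers unfamiliar with the format, here is a minimal Python sketch of TFRecord framing as TensorFlow documents it: each record is a little-endian uint64 length, a masked CRC-32C of the length bytes, the payload, and a masked CRC-32C of the payload. This is not the code from the pull request. The function names (`crc32c`, `masked_crc32c`, `write_record`, `read_records`) are hypothetical and chosen only for illustration, and the pure-Python checksum is written for clarity rather than speed.

```python
import io
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78.
_CRC32C_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _CRC32C_TABLE.append(_crc)


def crc32c(data):
    """Return the CRC-32C checksum of a bytes object."""
    crc = 0xFFFFFFFF
    for byte in bytearray(data):
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    """Rotate the CRC right by 15 bits and add a constant, modulo 2**32."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, data):
    """Append one framed record to a binary stream."""
    header = struct.pack("<Q", len(data))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc32c(header)))
    stream.write(data)
    stream.write(struct.pack("<I", masked_crc32c(data)))


def _read_exact(stream, size):
    """Read exactly `size` bytes or raise ValueError on a truncated stream."""
    chunk = stream.read(size)
    if len(chunk) != size:
        raise ValueError("truncated record")
    return chunk


def read_records(stream):
    """Yield record payloads from a binary stream, verifying both checksums."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) != 8:
            raise ValueError("truncated record")
        (length_crc,) = struct.unpack("<I", _read_exact(stream, 4))
        if length_crc != masked_crc32c(header):
            raise ValueError("corrupt length checksum")
        (length,) = struct.unpack("<Q", header)
        data = _read_exact(stream, length)
        (data_crc,) = struct.unpack("<I", _read_exact(stream, 4))
        if data_crc != masked_crc32c(data):
            raise ValueError("corrupt data checksum")
        yield data
```

Writing two payloads to an `io.BytesIO` buffer with `write_record`, rewinding it, and iterating `read_records` returns the same payloads in order. Each record adds 16 bytes of framing to its payload.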

This message was sent by Atlassian JIRA
