Date: Thu, 7 Jul 2016 04:49:10 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Subject: [jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types


    [ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365604#comment-15365604 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/incubator-beam/pull/599

    [BEAM-360] Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing results of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources.

    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

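For readers unfamiliar with the last change, here is a minimal standalone sketch of why an integer result matters when splitting an offset range. It is not the code from this pull request: the free functions `position_at_fraction` and `split_at_fraction` below, their signatures, and the choice to round up are assumptions made for illustration only. It is written for Python 3, where `int` plays the role that `long` played in Python 2.

```python
import math


def position_at_fraction(start, stop, fraction):
    """Return the integer offset located at `fraction` of [start, stop).

    Hypothetical helper for illustration; not the Beam API.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1], got %r" % (fraction,))
    if stop < start:
        raise ValueError("stop must not be smaller than start")
    # Rounding up is an assumption of this sketch. The point is only that
    # the result is an integer offset, never a float such as 12.5.
    return int(math.ceil(start + fraction * (stop - start)))


def split_at_fraction(start, stop, fraction):
    """Split [start, stop) into a primary range and a residual range.

    Hypothetical helper for illustration; not the Beam API.
    """
    split = position_at_fraction(start, stop, fraction)
    return (start, split), (split, stop)


if __name__ == "__main__":
    primary, residual = split_at_fraction(10, 20, 0.25)
    print(primary, residual)
```

Byte offsets into a file are whole numbers, so a fractional split point would have to be rounded by every caller. Returning an integer from one place keeps the primary and residual ranges contiguous and non-overlapping.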
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599

----
commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath
Date:   2016-06-21T01:09:50Z

    Implements a framework for developing sources for new file types.

    Module 'filebasedsource' provides a framework for creating sources for new file types. This framework readily implements several features common to many sources based on files.

    Additionally, module 'avroio' contains a new source, 'AvroSource', that is implemented using the framework described above. 'AvroSource' is a source for reading Avro files.

    Adds many unit tests for 'filebasedsource' and 'avroio' modules.

commit cacb613448b47592f8415570f7b64bc6de797f91
Author: Chamikara Jayalath
Date:   2016-07-07T03:25:04Z

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources.

    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

commit 264b4afc17c255e568a490e02ce47e9fb4b1e17a
Author: Chamikara Jayalath
Date:   2016-07-07T03:34:21Z

    Adds more comments.

commit 49e097f9c5c3d8c2bca48d3416b4934a4d86ed34
Author: Chamikara Jayalath
Date:   2016-07-07T04:41:06Z

    Some updates related to dynamic work rebalancing custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources.

    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

commit c9696c9e17c9c7a6fc13d53d4da21ac9b325c73c
Author: Chamikara Jayalath
Date:   2016-07-07T04:41:20Z

    Some updates related to dynamic work rebalancing custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources.

    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

----


> Add a framework for creating Python-SDK sources for new file types
> ------------------------------------------------------------------
>
>                 Key: BEAM-360
>                 URL: https://issues.apache.org/jira/browse/BEAM-360
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We already have a framework for creating new sources for the Beam Python SDK: https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
> It would be great if we could add a framework on top of this that encapsulates logic common to file-based sources. This framework can include the following features, which are common to such sources:
> (1) glob expansion
> (2) support for new file-systems
> (3) dynamic work rebalancing based on byte offsets
> (4) support for reading compressed files
> The Java SDK has a similar framework, available at: https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)