beam-commits mailing list archives

From "Chamikara Jayalath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
Date Thu, 16 Feb 2017 18:01:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870400#comment-15870400 ]

Chamikara Jayalath commented on BEAM-1440:
------------------------------------------

Hi Ibrahim,

Great to hear that you are interested in working on this issue.

A good place to start is to read the documentation on the following.
(1) Python BoundedSource API
- Currently, most of the documentation is in the form of comments within the code. These
comments are fairly detailed, so they should give you a good idea. Read the documentation
in the iobase.BoundedSource and iobase.RangeTracker classes.
- We should have a Beam guide on developing sources soon.

(2) Google BigQuery
- Documentation: https://cloud.google.com/bigquery/docs/
- Also try the Quickstart using the web UI.
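
To give a rough feel for the shape of the API in (1) before reading the code comments, here is a minimal sketch. It does not import Beam. InMemoryRowSource, SimpleOffsetRangeTracker and read_all are hypothetical names made up for this illustration. Only the method names (estimate_size, split, get_range_tracker, read, try_claim) mirror those documented on iobase.BoundedSource and iobase.RangeTracker. The real split() yields source bundles rather than plain (start, stop) tuples, so treat this as an approximation and check the code comments for the actual contract.

```python
# Simplified stand-ins that mirror the method names documented on
# iobase.BoundedSource and iobase.RangeTracker. These are NOT the Beam classes.


class SimpleOffsetRangeTracker:
    """Hypothetical tracker for the half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self._start = start
        self._stop = stop
        self._last_claimed = None

    def start_position(self):
        return self._start

    def stop_position(self):
        return self._stop

    def try_claim(self, position):
        # A reader claims each position before emitting the record there.
        if position >= self._stop:
            return False
        self._last_claimed = position
        return True


class InMemoryRowSource:
    """Hypothetical bounded source over an in-memory list of rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def estimate_size(self):
        # Assumption: "size" here is just the row count, not bytes.
        return len(self._rows)

    def split(self, desired_bundle_size, start_position=None, stop_position=None):
        # Simplification: yields (start, stop) tuples instead of bundle objects.
        start = 0 if start_position is None else start_position
        stop = len(self._rows) if stop_position is None else stop_position
        while start < stop:
            end = min(start + desired_bundle_size, stop)
            yield (start, end)
            start = end

    def get_range_tracker(self, start_position, stop_position):
        start = 0 if start_position is None else start_position
        stop = len(self._rows) if stop_position is None else stop_position
        return SimpleOffsetRangeTracker(start, stop)

    def read(self, range_tracker):
        # Emit rows only for positions the tracker lets us claim.
        position = range_tracker.start_position()
        while range_tracker.try_claim(position):
            yield self._rows[position]
            position += 1


def read_all(source, desired_bundle_size):
    """Split the source, then read every bundle in order."""
    rows = []
    for start, stop in source.split(desired_bundle_size):
        tracker = source.get_range_tracker(start, stop)
        rows.extend(source.read(tracker))
    return rows
```

Calling read_all(InMemoryRowSource(["a", "b", "c"]), 2) splits the three rows into two bundles and reads them back in order. A BigQuery source would follow the same split-then-read pattern, but how it maps positions onto BigQuery data is a design decision this sketch does not address.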

BTW, quick question: are you hoping to do this as a Summer of Code project?

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-1440
>                 URL: https://issues.apache.org/jira/browse/BEAM-1440
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>              Labels: gsoc2017, mentor, python
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should implement a Beam BigQuery source that implements iobase.BoundedSource [2]
> interface so that other runners that try to use Python SDK can read from BigQuery as well.
> Java SDK already has a Beam BigQuery source [3].
> [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py
> [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
