beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types
Date Sat, 02 Dec 2017 08:35:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275479#comment-16275479 ]

ASF GitHub Bot commented on BEAM-3157:
--------------------------------------

akedin opened a new pull request #4204: [BEAM-3157] Generate BeamRecord types from Pojos
URL: https://github.com/apache/beam/pull/4204
 
 
   This implements automatic generation of BeamRecordTypes and BeamRecordSqlTypes from POJO
types. Work is being done as part of [BEAM-3157](https://issues.apache.org/jira/browse/BEAM-3157).
   
   The main piece is [RecordFactory](https://github.com/apache/beam/compare/master...akedin:generate-record-types?expand=1#diff-55e6442c81f404c1004a445b550f03c9),
which exposes a method to generate BeamRecords from POJOs. See [RecordFactoryTest](https://github.com/apache/beam/compare/master...akedin:generate-record-types?expand=1#diff-869b654afa6699d55098a8fc3f2e5740)
for usage examples.
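   
   As an illustration of the general idea only (not the actual RecordFactory API in this PR), a minimal
reflection-based sketch that derives field names/types from a POJO class and packs values into a
BeamRecord could look like the following. It assumes the Beam 2.x-era `BeamRecordSqlType.create(...)`
and varargs `BeamRecord(...)` signatures, and the method names are hypothetical:
   
   ```java
   import java.lang.reflect.Field;
   import java.sql.Types;
   import java.util.ArrayList;
   import java.util.List;
   
   import org.apache.beam.sdk.extensions.sql.BeamRecordSqlType;
   import org.apache.beam.sdk.values.BeamRecord;
   
   // Illustrative sketch only; assumes POJO fields are int/long/String and that
   // getDeclaredFields() is iterated in the same order for type and record creation.
   public class PojoRecordSketch {
   
     /** Derives a BeamRecordSqlType from the declared fields of a POJO class. */
     public static BeamRecordSqlType pojoToType(Class<?> pojoClass) {
       List<String> names = new ArrayList<>();
       List<Integer> types = new ArrayList<>();
       for (Field f : pojoClass.getDeclaredFields()) {
         names.add(f.getName());
         // Minimal mapping; a real implementation would cover many more types.
         if (f.getType() == int.class || f.getType() == Integer.class) {
           types.add(Types.INTEGER);
         } else if (f.getType() == long.class || f.getType() == Long.class) {
           types.add(Types.BIGINT);
         } else {
           types.add(Types.VARCHAR);
         }
       }
       return BeamRecordSqlType.create(names, types);
     }
   
     /** Reads field values reflectively and packs them into a BeamRecord. */
     public static BeamRecord pojoToRecord(Object pojo, BeamRecordSqlType type)
         throws IllegalAccessException {
       List<Object> values = new ArrayList<>();
       for (Field f : pojo.getClass().getDeclaredFields()) {
         f.setAccessible(true);
         values.add(f.get(pojo));
       }
       return new BeamRecord(type, values.toArray());
     }
   }
   ```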
   
   The plan is to integrate this into the Beam SQL framework; that integration will be
done in future PRs.
   
   Record generation is a major step toward simplifying the conversion of POJO models to BeamRecords.
The immediate use case is implementing the Nexmark queries in Beam SQL using the existing POJO models.

   This can also be used as a starting point for code generation for schema-aware collections.
   
   
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/)
filed for the change (usually before you start working on it).  Trivial changes like typos
do not require a JIRA issue.  Your pull request should address just this issue, without pulling
in other changes.
    - [ ] Each commit in the pull request should have a meaningful subject line and body.
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`,
where you replace `BEAM-XXX` with the appropriate JIRA issue.
    - [ ] Write a pull request description that is detailed enough to understand what the
pull request does, how, and why.
    - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will
be performed on your pull request automatically.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> BeamSql transform should support other PCollection types
> --------------------------------------------------------
>
>                 Key: BEAM-3157
>                 URL: https://issues.apache.org/jira/browse/BEAM-3157
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Ismaël Mejía
>            Assignee: Anton Kedin
>
> Currently the Beam SQL transform only supports input and output data represented as a
BeamRecord. This seems to me like a usability limitation (even if we can do a ParDo to prepare
objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map name/type/value from
an object field into Calcite, so it is convenient to have a specific data type (BeamRecord)
for this. However, we can accomplish the same thing by using a PCollection of JavaBeans (where we
know the same information via the field names/types/values) or by using Avro records, where
we also have the schema information. For the output PCollection we can map the object via
a reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming simple mappings for the moment, since Beam SQL does not yet support composite
types.
> A simple API idea would be something like this:
> A simple filter:
> PCollection<MyPojo> col = BeamSql.query("SELECT * FROM .... WHERE ...").from(MyPojo.class);
> A projection:
> PCollection<MyNewPojo> newCol = BeamSql.query("SELECT id, name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + conversion DoFns; however, I suppose
that for memory-use reasons mapping directly into Calcite might make sense.
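
A minimal sketch of that ParDo-based conversion, assuming the Beam 2.x-era
`BeamRecordSqlType`/`BeamRecord` API of the time (`MyPojo` and its fields are
illustrative only, not part of the actual proposal):

```java
import java.sql.Types;
import java.util.Arrays;

import org.apache.beam.sdk.extensions.sql.BeamRecordSqlType;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.BeamRecord;
import org.apache.beam.sdk.values.PCollection;

public class PojoToRecordExample {

  // Hypothetical POJO used only for this sketch.
  public static class MyPojo {
    public int id;
    public String name;
  }

  // Record type matching the POJO fields; uses java.sql.Types constants.
  static final BeamRecordSqlType TYPE =
      BeamRecordSqlType.create(
          Arrays.asList("id", "name"),
          Arrays.asList(Types.INTEGER, Types.VARCHAR));

  /** Converts a PCollection of POJOs into the PCollection<BeamRecord> that BeamSql expects today. */
  public static PCollection<BeamRecord> toRecords(PCollection<MyPojo> pojos) {
    return pojos
        .apply("PojoToBeamRecord", ParDo.of(new DoFn<MyPojo, BeamRecord>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            MyPojo p = c.element();
            c.output(new BeamRecord(TYPE, p.id, p.name));
          }
        }))
        .setCoder(TYPE.getRecordCoder());
  }
}
```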



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
