beam-commits mailing list archives

From "James Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3171) convert a join into lookup
Date Sat, 11 Nov 2017 03:14:01 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248295#comment-16248295 ]

James Xu commented on BEAM-3171:
--------------------------------

I like the ability to join against an external dimension table.

After reading the external join link you referenced, it seems [~kenn] objects to implementing
it as a new PTransform in the Beam model. Instead, we should optimize SideInput so that a
Beam pipeline can get values by key, and let the runner decide whether to load all the data
in or just query the remote KV store on every lookup. I like this solution; it's cleaner.

But until we have the optimized SideInput, there's no harm in having an improved external join
first.

> convert a join into lookup
> --------------------------
>
>                 Key: BEAM-3171
>                 URL: https://issues.apache.org/jira/browse/BEAM-3171
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Xu Mingmin
>            Assignee: Xu Mingmin
>              Labels: experimental
>
> We mostly use BeamSQL to run streaming jobs, and have added a join_as_lookup improvement (internal
> branch) to cover the streaming-to-batch case (similar to [1]). I could submit a PR as experimental
> if people are interested.
> The rough solution is: if one source of a join node implements {{BeamSeekableTable}} and
> the other does not, the join node is converted into a fact-lookup operation.
> Ref:
> [1] https://docs.google.com/document/d/1B-XnUwXh64lbswRieckU0BxtygSV58hysqZbpZmk03A/edit?usp=sharing

> [~xumingming] [~takidau], any comments?
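The conversion rule in the issue description (if one join input implements {{BeamSeekableTable}} and the other does not, turn the join into a fact-lookup) can be sketched in plain Java as below. This is not the Beam SQL planner: `TableSource`, `SeekableTable`, `FactLookup`, `RegularJoin`, `MapTable` and `plan` are hypothetical stand-ins, rows are reduced to string keys, and only inner-join semantics are modeled (outer joins and column order after swapping sides would need extra care).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinAsLookupSketch {

    // Hypothetical stand-in for any table source feeding a join.
    interface TableSource {
        String name();
    }

    // Hypothetical marker playing the role of BeamSeekableTable: supports lookup by join key.
    interface SeekableTable extends TableSource {
        List<String> seek(String key);
    }

    interface PlanNode {}

    // Fallback: an ordinary join of two inputs.
    static final class RegularJoin implements PlanNode {
        final TableSource left, right;
        RegularJoin(TableSource left, TableSource right) { this.left = left; this.right = right; }
    }

    // Fact-lookup: each fact-side key is looked up in the seekable table (inner-join semantics).
    static final class FactLookup implements PlanNode {
        final TableSource fact;
        final SeekableTable lookup;
        FactLookup(TableSource fact, SeekableTable lookup) { this.fact = fact; this.lookup = lookup; }

        // factKeys stands in for the join keys of rows arriving on the fact (streaming) side.
        List<String> apply(List<String> factKeys) {
            List<String> out = new ArrayList<>();
            for (String key : factKeys) {
                for (String match : lookup.seek(key)) {
                    out.add(key + "->" + match);
                }
            }
            return out;
        }
    }

    // Rewrite rule: convert only when exactly one side is seekable.
    static PlanNode plan(TableSource left, TableSource right) {
        boolean leftSeekable = left instanceof SeekableTable;
        boolean rightSeekable = right instanceof SeekableTable;
        if (rightSeekable && !leftSeekable) {
            return new FactLookup(left, (SeekableTable) right);
        }
        if (leftSeekable && !rightSeekable) {
            return new FactLookup(right, (SeekableTable) left);
        }
        return new RegularJoin(left, right);
    }

    // Example non-seekable source (e.g. an unbounded stream).
    static final class StreamSource implements TableSource {
        public String name() { return "orders_stream"; }
    }

    // Example seekable source backed by an in-memory map (e.g. a dimension table).
    static final class MapTable implements SeekableTable {
        private final Map<String, List<String>> rows = new HashMap<>();

        MapTable put(String key, String value) {
            rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            return this;
        }

        public String name() { return "users_dim"; }

        public List<String> seek(String key) {
            return rows.getOrDefault(key, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        TableSource orders = new StreamSource();
        MapTable users = new MapTable().put("u1", "US").put("u2", "CN");

        PlanNode rewritten = plan(orders, users);
        System.out.println(rewritten.getClass().getSimpleName()); // FactLookup
        System.out.println(((FactLookup) rewritten).apply(Arrays.asList("u1", "u3", "u2")));
        // [u1->US, u2->CN]
        System.out.println(plan(orders, orders).getClass().getSimpleName()); // RegularJoin
    }
}
```

When both inputs are seekable, or neither is, the sketch keeps the regular join, since there is no single fact side to drive the lookups.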



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
