beam-commits mailing list archives

From "James Xu (JIRA)" <>
Subject [jira] [Commented] (BEAM-995) Apache Pig DSL
Date Wed, 27 Sep 2017 12:50:00 GMT


James Xu commented on BEAM-995:

[~nielsbasjes] Pig is actually doing something similar to Beam:

# They both define a unified data processing API.
# They both support several backends.

So doing pig-on-beam on either side does not make much difference. I prefer to do it on the Beam side, for two reasons:

# Beam is already doing the `support several backends` thing, so let's just let Beam do it and let
Pig focus on its primary advantage: the friendly API.
# It aligns with other extensions such as SQL.

As for the pros of doing it on the Pig side that you mentioned:

1. Builtin facilities for loading UDFs and UDAFs

> Yes, I agree, the existing UDFs and UDAFs are very important. If we do pig-on-beam on the
Beam side, we will need something like a `UDFAdapter` that adapts all existing UDFs, so
that we can use them in the new pig-on-beam.
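
To make the `UDFAdapter` idea a bit more concrete, here is a minimal sketch of the adapter pattern only. It is not Pig's or Beam's real API: `UpperUdf`, its `exec` method, the shape of `UDFAdapter` and the `apply_udf` helper are all hypothetical stand-ins, and a real adapter would have to wrap Pig's UDF classes and plug into a Beam transform.

```python
from typing import Any, Iterable, List, Tuple


class UpperUdf:
    """Hypothetical stand-in for an existing Pig-style UDF.

    Assumption: the UDF takes one tuple of fields and returns one value.
    """

    def exec(self, fields: Tuple[Any, ...]) -> Any:
        if not fields or fields[0] is None:
            return None
        return str(fields[0]).upper()


class UDFAdapter:
    """Hypothetical adapter exposing a Pig-style UDF as a per-element function."""

    def __init__(self, udf: Any) -> None:
        self._udf = udf

    def __call__(self, element: Any) -> Any:
        # The wrapped UDF expects a tuple of fields, so wrap bare values in a 1-tuple.
        fields = element if isinstance(element, tuple) else (element,)
        return self._udf.exec(fields)


def apply_udf(adapter: UDFAdapter, elements: Iterable[Any]) -> List[Any]:
    """Stand-in for a per-element map step in a pipeline (hypothetical helper)."""
    return [adapter(element) for element in elements]


to_upper = UDFAdapter(UpperUdf())
result = apply_udf(to_upper, ["pig", ("beam", 1), None])
```

The point of the sketch is that the existing UDF is left untouched; only the thin adapter knows how to translate between the pipeline's elements and the tuple the UDF expects.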

2. Execution flow optimizer(s)

> There is a pipeline optimizer in Beam, and also an optimizer in the underlying engine (Spark,
MapReduce). Will the Pig optimizer matter that much in this context? (I am not familiar with Pig,
so correct me if I am wrong.)

3. A selection of execution backends.

> Beam itself supports all the different backends.

> Apache Pig DSL
> --------------
>                 Key: BEAM-995
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
> Apache Pig is still popular and the language is not so large.
> Providing a DSL using the Pig language would potentially allow more people to use Beam
(at least during a transition period).

This message was sent by Atlassian JIRA
