beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-1637) Create Elasticsearch IO compatible with ES 5.x
Date Tue, 08 Aug 2017 10:08:00 GMT


ASF GitHub Bot commented on BEAM-1637:

GitHub user echauchot opened a pull request:

    [BEAM-1637] Create Elasticsearch IO compatible with ES 5.x

    Follow this checklist to help us incorporate your contribution quickly and easily:
     - [X] Make sure there is a [JIRA issue](
filed for the change (usually before you start working on it).  Trivial changes like typos
do not require a JIRA issue.  Your pull request should address just this issue, without pulling
in other changes.
     - [X] Each commit in the pull request should have a meaningful subject line and body.
     - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`,
where you replace `BEAM-XXX` with the appropriate JIRA issue.
     - [X] Write a pull request description that is detailed enough to understand what the
pull request does, how, and why.
     - [X] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will
be performed on your pull request automatically.
     - [X] If this contribution is large, please file an Apache [Individual Contributor License
    R: @jkff 
    CC: @jbonofre 
    Some comments about this pull request:
    1. As discussed on the ML, the architecture consists of a common module plus one module per ES version
(the version modules differ in features and also in unit tests). The version modules use the same package
name for backward compatibility (pipeline code stays exactly the same). Classes in the common package shall
not be used directly by users. In a previous design, the common module had the same Java package name as the
version modules, so that common classes could be package-private. But I abandoned this design because of
javadoc generation problems: the common module then had no public classes, and package exclusion was not
possible without losing the ES javadoc entirely. So in the end, I put the common classes in a separate common
package with public visibility and a javadoc warning stating that they shall not be used by pipeline authors.
If you have a better suggestion, I'm all ears :)
    2. I could not use inheritance because of static methods (which cannot be overridden), so I used
composition instead. If you have a better design, feel free to comment.
    3. There is a very hacky thing in the JarHell class. The problem was that surefire dependencies
brought a duplicate class onto the classpath, which caused the jar hell detection to fail the
build. Please read the javadoc of this class. If you have any other suggestion to avoid the jar hell
problem, feel free to comment.
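Points 1 and 2 above (a public "common" class that pipeline authors should not touch, reused by a version module through composition) can be sketched as follows. This is a minimal, self-contained illustration, not the code in the pull request: `ElasticsearchCommonHelper`, `ElasticsearchIO5` and `describe` are hypothetical names. The key Java fact is that static methods are not polymorphic (a subclass can only hide them, not override them), so the version-specific entry point holds a helper instance and delegates to it instead of extending a common base class.

```java
import java.util.Arrays;
import java.util.List;

public class CompositionSketch {

  /**
   * Hypothetical stand-in for a class in the shared "common" package.
   * Public visibility is only needed so version modules can reach it;
   * pipeline authors shall not use it directly.
   */
  static final class ElasticsearchCommonHelper {
    private final int majorVersion;

    ElasticsearchCommonHelper(int majorVersion) {
      this.majorVersion = majorVersion;
    }

    /** Shared, version-independent logic. */
    String describe(List<String> addresses, String index) {
      return "ES " + majorVersion + ".x index '" + index + "' on " + addresses;
    }
  }

  /**
   * Hypothetical version-specific entry point exposing a static factory.
   * Because statics cannot be overridden, it delegates to a helper instance
   * (composition) rather than subclassing a common base class.
   */
  static final class ElasticsearchIO5 {
    // Composition: the version module owns a helper instead of extending it.
    private static final ElasticsearchCommonHelper HELPER = new ElasticsearchCommonHelper(5);

    private ElasticsearchIO5() {}

    static String read(String index, String... addresses) {
      return HELPER.describe(Arrays.asList(addresses), index);
    }
  }

  public static void main(String[] args) {
    // Prints: ES 5.x index 'my-index' on [http://localhost:9200]
    System.out.println(ElasticsearchIO5.read("my-index", "http://localhost:9200"));
  }
}
```

A second version module would create its own helper (for example with major version 2) behind an identically named entry point in the same package, which is what keeps pipeline code unchanged across versions.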
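To make point 3 concrete, the idea behind a jar hell check is to refuse to run when the same fully qualified class name is found in more than one classpath entry, because the JVM would otherwise silently load whichever copy comes first. The sketch below is a simplified, hypothetical illustration of that idea over an in-memory map of jar names to class names; it is not Elasticsearch's implementation, which inspects the real JVM classpath, and the jar and class names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class JarHellSketch {

  /**
   * Returns every class name that appears in more than one classpath entry,
   * mapped to the entries (in classpath order) that contain it.
   */
  static Map<String, List<String>> findDuplicateClasses(Map<String, Set<String>> classpath) {
    Map<String, List<String>> owners = new TreeMap<>();
    for (Map.Entry<String, Set<String>> entry : classpath.entrySet()) {
      for (String className : entry.getValue()) {
        owners.computeIfAbsent(className, k -> new ArrayList<>()).add(entry.getKey());
      }
    }
    // Keep only the classes provided by two or more entries.
    owners.values().removeIf(jars -> jars.size() < 2);
    return owners;
  }

  public static void main(String[] args) {
    // Hypothetical classpath: insertion order stands for classpath order.
    Map<String, Set<String>> classpath = new LinkedHashMap<>();
    classpath.put("lib-a.jar", new TreeSet<>(Arrays.asList("com.example.Foo", "com.example.Bar")));
    classpath.put("lib-b.jar", new TreeSet<>(Arrays.asList("com.example.Foo")));

    Map<String, List<String>> duplicates = findDuplicateClasses(classpath);
    if (!duplicates.isEmpty()) {
      // A jar hell style check fails here instead of picking one copy silently.
      System.out.println("jar hell: " + duplicates);
    }
  }
}
```

In the situation described above, the duplicate came from test-time (surefire) dependencies rather than from the IO itself, which is why the workaround lives in the test setup.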

You can merge this pull request into a Git repository by running:

    $ git pull BEAM-1637-ELASTICSEARCH-5

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3703
commit e623e8a529ab687b267685ef040e6059f61caa08
Author: Etienne Chauchot <>
Date:   2017-06-26T08:58:21Z

    [BEAM-1637] Create Elasticsearch IO compatible with ES 5.x


> Create Elasticsearch IO compatible with ES 5.x
> ----------------------------------------------
>                 Key: BEAM-1637
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Minor
> The current Elasticsearch IO (see is
only compatible with Elasticsearch v2.x. The aim is to have an IO compatible with ES v5.x.
Beyond being able to address v5.x Elasticsearch instances, we could also leverage the
Elasticsearch pipeline API and split the dataset better (as close as possible to
desiredBundleSize) thanks to the new ES split API, which allows splitting ES shards.
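The benefit of splitting inside shards can be seen with simple arithmetic: if the smallest unit of work is a whole shard, a bundle can never be smaller than a shard, whereas cutting the data into slices lets the source aim at `desiredBundleSize` directly. The sketch below shows one plausible way to pick the slice count (ceiling division, at least one slice). It is an assumption for illustration only; `numberOfSlices` is a hypothetical helper, not the formula used by the actual IO.

```java
public class BundleSplitSketch {

  /**
   * Hypothetical helper: number of slices needed so that each slice is at most
   * desiredBundleSizeBytes (ceiling division), never fewer than one slice.
   */
  static long numberOfSlices(long estimatedSizeBytes, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    // Ceiling division written without (a + b - 1) to avoid overflow on large sizes.
    long slices = estimatedSizeBytes / desiredBundleSizeBytes
        + (estimatedSizeBytes % desiredBundleSizeBytes == 0 ? 0 : 1);
    return Math.max(1, slices);
  }

  public static void main(String[] args) {
    long oneGiB = 1L << 30;      // estimated index size
    long bundle = 64L << 20;     // desiredBundleSize of 64 MiB
    System.out.println(numberOfSlices(oneGiB, bundle));      // 16
    System.out.println(numberOfSlices(oneGiB + 1, bundle));  // 17
  }
}
```

With one bundle per shard, a 1 GiB index stored in a single shard would yield one 1 GiB bundle regardless of `desiredBundleSize`; slicing brings each bundle down to about 64 MiB in this example.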

This message was sent by Atlassian JIRA
