beam-commits mailing list archives

From mizitch <...@git.apache.org>
Subject [GitHub] incubator-beam pull request #1327: [BEAM-840] Some minor changes and fixes f...
Date Wed, 09 Nov 2016 22:27:23 GMT
GitHub user mizitch opened a pull request:

    https://github.com/apache/incubator-beam/pull/1327

    [BEAM-840] Some minor changes and fixes for sorter module. 

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [x] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [x] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
    
    ---
    Includes:
    * Limit max memory for ExternalSorter and BufferedExternalSorter to 2047 MB to prevent integer overflow within Hadoop's sorting library
    * Fix int overflow for large memory values in InMemorySorter
    * Add a note about estimated disk usage to README.md
    * Make Hadoop's sorting library put all temporary files under the specified directory
    * Have Hadoop clean up the temp directory on exit
    * Stop shading Hadoop dependencies. Some context:
    ** The existing shading is broken (modules that depend on this one cannot use it successfully).
    ** Hadoop's use of reflection in several places makes shading the dependency "in a good way" nearly impossible. It requires a couple of rather brittle hacks, and for clients that depend on certain conflicting versions of Hadoop, these hacks may fail to prevent conflicts, which is the whole point of shading.
    ** From what I can tell, there is no good way to shade this so that it is universally usable, so leaving it unshaded seems like a reasonable default.
    ** Without shading Hadoop, this module can be used successfully from Beam's WordCount example (which already has Hadoop dependencies).
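The arithmetic behind the first two items can be sketched in Java. The PR does not show where the overflow occurs, so this assumes, as the 2047 MB cap suggests, that a megabyte budget is converted to a byte count stored in an `int`: 2048 MB is exactly 2^31 bytes, one more than `Integer.MAX_VALUE`. The class and method names are hypothetical, not the sorter module's actual API.

```java
// Hypothetical sketch: names are illustrative, not the sorter module's API.
class SorterMemoryLimits {
  // Largest whole-MB budget whose byte count still fits in a Java int.
  static final int MAX_MEMORY_MB = 2047;

  /** Validates the budget, then converts it to bytes using int arithmetic. */
  static int checkedMemoryBytes(int memoryMB) {
    if (memoryMB <= 0 || memoryMB > MAX_MEMORY_MB) {
      throw new IllegalArgumentException(
          "memoryMB must be in [1, " + MAX_MEMORY_MB + "], got " + memoryMB);
    }
    // 2047 * 1024 * 1024 = 2,146,435,072 <= Integer.MAX_VALUE (2,147,483,647).
    return memoryMB * 1024 * 1024;
  }

  /** Overflow-prone pattern: the product is computed as an int before widening. */
  static long bufferBytesIntMath(int memoryMB) {
    return memoryMB * 1024 * 1024;
  }

  /** Fixed pattern: the 1024L literal promotes the whole product to long. */
  static long bufferBytesLongMath(int memoryMB) {
    return memoryMB * 1024L * 1024L;
  }

  public static void main(String[] args) {
    System.out.println(checkedMemoryBytes(2047));  // 2146435072
    System.out.println(bufferBytesIntMath(2048));  // -2147483648 (wrapped)
    System.out.println(bufferBytesLongMath(2048)); // 2147483648
    System.out.println(bufferBytesIntMath(4096));  // 0 (wrapped)
  }
}
```

Returning `long` does not help by itself: in `bufferBytesIntMath` the multiplication has already wrapped before the result is widened, which is why the fix has to introduce a `long` operand inside the expression.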
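The two temporary-directory items amount to keeping every temporary file under one caller-chosen directory and removing that directory when the JVM exits. The PR achieves this through Hadoop itself; the sketch here shows the same pattern using only the JDK, as an analogy rather than the module's implementation. `ScopedTempDir`, the `sorter-` prefix, and the `spill-0` file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical JDK-only sketch; the actual change configures Hadoop instead.
class ScopedTempDir {
  /** Creates a fresh directory under baseDir and registers recursive deletion at JVM exit. */
  static Path create(Path baseDir) {
    try {
      Files.createDirectories(baseDir);
      Path dir = Files.createTempDirectory(baseDir, "sorter-");
      Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        try {
          deleteRecursively(dir);
        } catch (UncheckedIOException e) {
          System.err.println("Could not clean up " + dir + ": " + e.getCause());
        }
      }));
      return dir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Deletes root and everything below it; does nothing if root is already gone. */
  static void deleteRecursively(Path root) {
    if (!Files.exists(root)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse lexical order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.deleteIfExists(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Paths.get(System.getProperty("java.io.tmpdir"), "sorter-base");
    Path dir = create(base);
    Files.createFile(dir.resolve("spill-0")); // stands in for a sort spill file
    System.out.println(dir.startsWith(base)); // true
  }
}
```

Like any exit-time cleanup, the shutdown hook is skipped if the JVM is killed abruptly, so a hard kill can still leave files behind under the base directory.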

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mizitch/incubator-beam sorter-gcs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1327
    
----
commit d07c4ce9349abac4d0c53223072f1c84a1dc98c6
Author: Mitch Shanklin <mshanklin@google.com>
Date:   2016-11-09T22:09:49Z

    Some minor changes and fixes for sorter module. Includes the same changes listed in the pull request description above.

----


