spark-reviews mailing list archives

From taylor-brown <...@git.apache.org>
Subject [GitHub] spark pull request: Only use subset of features at each split point...
Date Tue, 22 Jul 2014 18:39:24 GMT
GitHub user taylor-brown opened a pull request:

    https://github.com/apache/spark/pull/1534

    Only use subset of features at each split point in decision tree

    The decision tree is modified to work in a random forest (random subspace addition).
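
For context, the random subspace idea works like this: at each candidate split, the tree evaluates only a randomly chosen subset of the feature indices instead of all of them. The sketch below is a minimal, illustrative Python version of that idea, not the Scala code in this pull request; the sqrt(numFeatures) subset size and the function name are assumptions.

```
import math
import random

def candidate_features(num_features, subset_size=None, rng=random):
    """Pick the feature indices to evaluate at one split point.

    subset_size defaults to sqrt(num_features), a common random forest
    heuristic; the pull request's actual parameterization may differ.
    """
    if subset_size is None:
        subset_size = max(1, int(math.sqrt(num_features)))
    return sorted(rng.sample(range(num_features), subset_size))

# With 100 features, each split evaluates roughly 10 randomly chosen ones.
print(candidate_features(100))
```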

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ccri/spark random_subspace

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1534
    
----
commit a16a19fbd382e1d39cdf403246ad215666f1f402
Author: Michael Armbrust <michael@databricks.com>
Date:   2014-05-17T03:25:10Z

    SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #808 from marmbrus/confClasspath and squashes the following commits:
    
    4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.
    (cherry picked from commit a80a6a139e729ee3f81ec4f0028e084d2d9f7e82)
    
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>

commit 9cd12f33df6e56d34ff3019c714bddfe298fe5c7
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-17T04:42:14Z

    Version bump of spark-ec2 scripts
    
    This will allow us to change things in spark-ec2 related to the 1.0 release.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #809 from pwendell/spark-ec2 and squashes the following commits:
    
    59117fb [Patrick Wendell] Version bump of spark-ec2 scripts
    (cherry picked from commit c0ab85d7320cea90e6331fb03a70349bc804c1b1)
    
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>

commit 318739a0794c9d2994901a5d3b16c4c133d293c6
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-17T05:34:38Z

    [SPARK-1808] Route bin/pyspark through Spark submit
    
    **Problem.** For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

    **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user-facing Spark scripts consistent.

    **Details.** `bin/pyspark` inherently handles two cases: (1) running Python applications and (2) running the Python shell. For (1), Spark submit already handles running Python applications; when `bin/pyspark` is given a Python file, we can simply pass the file directly to Spark submit and let it handle the rest.

    For case (2), `bin/pyspark` starts a Python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this; all we needed to change was to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.

    This has been tested locally (OS X and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #799 from andrewor14/pyspark-submit and squashes the following commits:
    
    bf37e36 [Andrew Or] Minor changes
    01066fa [Andrew Or] bin/pyspark for Windows
    c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
    1866f85 [Andrew Or] Windows is not cooperating
    456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
    7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
    b7ba0d8 [Andrew Or] Address a few comments (minor)
    06eb138 [Andrew Or] Use shlex instead of writing our own parser
    05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
    a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
    6fba412 [Andrew Or] Deal with quotes + address various comments
    fe4c8a7 [Andrew Or] Update --help for bin/pyspark
    afe47bf [Andrew Or] Fix spark shell
    f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
    a371d26 [Andrew Or] Route bin/pyspark through Spark submit
    (cherry picked from commit 4b8ec6fcfd7a7ef0857d5b21917183c181301c95)
    
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
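
The PYSPARK_SUBMIT_ARGS handling mentioned in the squashed commits above ("Use shlex instead of writing our own parser", "Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set") can be pictured roughly as in the sketch below. It is illustrative only, assuming the variable carries spark-submit flags as a single shell-quoted string; it is not the launcher code from the PR.

```
import os
import shlex

# Read spark-submit arguments that the pyspark launcher passes through the
# environment. The guard keeps shlex from ever receiving None (which could
# make it read from stdin and hang) when PYSPARK_SUBMIT_ARGS is unset.
submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "").strip()
parsed = shlex.split(submit_args) if submit_args else []
print(parsed)  # e.g. ['--master', 'yarn-client'] when the variable is set
```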

commit 03b4242630600f010bf9ddada0e6008ba9141d6b
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-17T05:36:23Z

    [SPARK-1824] Remove <master> from Python examples
    
    A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for python too.

    Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,
    
    ```
    bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
    ```
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #802 from andrewor14/python-examples and squashes the following commits:
    
    cf50b9f [Andrew Or] De-indent python comments (minor)
    50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
    c362f69 [Andrew Or] Update docs to use spark-submit for python applications
    7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
    427a5f0 [Andrew Or] Update docs
    d32072c [Andrew Or] Remove <master> from examples + update usages
    (cherry picked from commit cf6cbe9f76c3b322a968c836d039fc5b70d4ce43)
    
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>

commit 3b3d7c8ec4d2ddf632d9fd46a45c87586a8db174
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-17T05:58:47Z

    Make deprecation warning less severe
    
    Just a small change. I think it's good not to scare people who are using the old options.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #810 from pwendell/warnings and squashes the following commits:
    
    cb8a311 [Patrick Wendell] Make deprecation warning less severe
    (cherry picked from commit 442808a7482b81c8de887c901b424683da62022e)
    
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>

commit e98bc194bd694e81d7403d011bcbe2b623cb30e4
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-17T06:10:46Z

    Revert "[maven-release-plugin] prepare for next development iteration"
    
    This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.

commit 80889110aad54866f113b18f206694148f715a05
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-17T06:10:53Z

    Revert "[maven-release-plugin] prepare release v1.0.0-rc8"
    
    This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.

commit 920f947eb5a22a679c0c3186cf69ee75f6041c75
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-17T06:37:50Z

    [maven-release-plugin] prepare release v1.0.0-rc9

commit f8e611955096c5c1c7db5764b9d2851b1d295f0d
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-17T06:37:58Z

    [maven-release-plugin] prepare for next development iteration

commit e06e4b0affc00bc15498313a36edbc9b7e2aaae2
Author: Neville Li <neville@spotify.com>
Date:   2014-05-18T20:31:23Z

    Fix spark-submit path in spark-shell & pyspark
    
    Author: Neville Li <neville@spotify.com>
    
    Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:
    
    0dc33ed [Neville Li] Fix spark-submit path in pyspark
    becec64 [Neville Li] Fix spark-submit path in spark-shell

commit 8e8b351cfd7dddac8dc0a3faf8639f11398f8807
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-18T23:51:53Z

    SPARK-1873: Add README.md file when making distributions
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #818 from pwendell/reamde and squashes the following commits:
    
    4020b11 [Patrick Wendell] SPARK-1873: Add README.md file when making distributions
    
    (cherry picked from commit 4ce479324bdcf603806fc90b5b0f4968c6de690e)
    Signed-off-by: Matei Zaharia <matei@databricks.com>

commit ecab8a239dcbb889181c572317581d1c8b627201
Author: Xiangrui Meng <meng@databricks.com>
Date:   2014-05-19T00:00:57Z

    [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0
    
    Some improvements to MLlib guide:
    
    1. [SPARK-1872] Update API links for unidoc.
    2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
    3. Add more Java/Python examples.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #816 from mengxr/mllib-doc and squashes the following commits:
    
    ec2e407 [Xiangrui Meng] format scala example for ALS
    cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
    4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
    d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
    561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
    195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
    9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
    7dad18e [Xiangrui Meng] update java api links in NB
    3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
    35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
    e4afaa8 [Xiangrui Meng] explicitly state what might change
    
    (cherry picked from commit df0aa8353ab6d3b19d838c6fa95a93a64948309f)
    Signed-off-by: Matei Zaharia <matei@databricks.com>
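
The "add python example to loadLibSVMFile" item above refers to the MLlib guide; a minimal PySpark call along those lines might look like the sketch below. The master URL, app name, and file path are illustrative, and this is not the exact snippet added to the docs.

```
from pyspark import SparkContext
from pyspark.mllib.util import MLUtils

sc = SparkContext("local[2]", "libsvm-example")  # illustrative master/app name
# Load labeled points stored in LIBSVM format; the path is illustrative.
examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
print(examples.first())
```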

commit 111c121ae97730fa8d87db7f0d17e10879fa76ab
Author: Matei Zaharia <matei@databricks.com>
Date:   2014-05-19T22:02:35Z

    [SPARK-1876] Windows fixes to deal with latest distribution layout changes
    
    - Look for JARs in the right place
    - Launch examples the same way as on Unix
    - Load datanucleus JARs if they exist
    - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
    - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)
    
    Author: Matei Zaharia <matei@databricks.com>
    
    Closes #819 from mateiz/win-fixes and squashes the following commits:
    
    d558f96 [Matei Zaharia] Fix comment
    228577b [Matei Zaharia] Review comments
    d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
    144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
    
    (cherry picked from commit 7b70a7071894dd90ea1d0091542b3e13e7ef8d3a)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 901102c1ba5f800705819916f2b7a38b6750cffb
Author: zsxwing <zsxwing@gmail.com>
Date:   2014-05-19T23:41:31Z

    SPARK-1878: Fix the incorrect initialization order
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-1878
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #822 from zsxwing/SPARK-1878 and squashes the following commits:
    
    4a47e27 [zsxwing] SPARK-1878: Fix the incorrect initialization order
    
    (cherry picked from commit 1811ba8ccb580979aa2e12019e6a82805f09ab53)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 00563e15e3281e0efa67c0ed3c62f77c5ee66f94
Author: Matei Zaharia <matei@databricks.com>
Date:   2014-05-20T01:42:28Z

    SPARK-1879. Increase MaxPermSize since some of our builds have many classes
    
    See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop 2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler.

    Note that with this change, users can still override it by setting their own Java options. Their options will come later in the command string than the -XX:MaxPermSize=128m setting.
    
    Author: Matei Zaharia <matei@databricks.com>
    
    Closes #823 from mateiz/spark-1879 and squashes the following commits:
    
    6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
    
    (cherry picked from commit 5af99d7617ba3b9fbfdb345ef9571b7dd41f45a1)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 875c54fb3e90b2aa597366a98c83015c156948f2
Author: witgo <witgo@qq.com>
Date:   2014-05-20T02:40:29Z

    [SPARK-1875] NoClassDefFoundError: StringUtils when building with Hadoop 1.x and Hive
    
    Author: witgo <witgo@qq.com>
    
    Closes #824 from witgo/SPARK-1875_commons-lang-2.6 and squashes the following commits:
    
    ef7231d [witgo] review commit
    ead3c3b [witgo] SPARK-1875: NoClassDefFoundError: StringUtils when building against Hadoop 1
    
    (cherry picked from commit 6a2c5c610c259f62cb12d8cfc18bf59cdb334bb2)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 78b6e6f1e8ee6a27ef4eed93aac6eba716b5ffce
Author: Aaron Davidson <aaron@databricks.com>
Date:   2014-05-20T03:55:26Z

    SPARK-1689: Spark application should die when removed by Master
    
    scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such.
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #832 from aarondav/i-love-u and squashes the following commits:
    
    9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master
    
    (cherry picked from commit b0ce22e071da4cc62ec5e29abf7b1299b8e4a6b0)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 1c6c8b5bd2bdecfc5fdabd33ee8762fe25b0e69a
Author: Xiangrui Meng <meng@databricks.com>
Date:   2014-05-20T04:29:33Z

    [SPARK-1874][MLLIB] Clean up MLlib sample data
    
    1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
    2. Embedded instructions in the help message of those example apps.
    
    Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
    
    59f0a18 [Xiangrui Meng] add sample binary classification data
    3c2f92f [Xiangrui Meng] add linear regression data
    050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
    
    (cherry picked from commit bcb9dce6f444a977c714117811bce0c54b417650)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 6cbe2a37ccb14f65b6d6b813a585adbbc43684c4
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T05:36:24Z

    [SPARK-1877] ClassNotFoundException when loading RDD with serialized objects
    
    Updated version of #821
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com>
    
    Closes #835 from tdas/SPARK-1877 and squashes the following commits:
    
    f346f71 [Tathagata Das] Addressed Patrick's comments.
    fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects
    
    (cherry picked from commit 52eb54d02403a3c37d84b9da7cc1cdb261048cf8)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit 1c00f2a251b8a39fb3f5f72b0f654cfc5ec66338
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T06:12:24Z

    Updated CHANGES.txt

commit 3f3e988cab4ac350f79ae3e2aadbfd0b5e6938e9
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T06:13:45Z

    Revert "[maven-release-plugin] prepare for next development iteration"
    
    This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.

commit 0d988421742bf43fbd13531fa7ede8d93e59a19b
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T06:15:20Z

    Revert "[maven-release-plugin] prepare release v1.0.0-rc9"
    
    This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.

commit b4d93d38d9da61721e64919f95447fafe87bf4d1
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T17:27:12Z

    [Hotfix] Blacklisted flaky HiveCompatibility test
    
    `lateral_view_outer` query sometimes returns a different set of 10 rows.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #838 from tdas/hive-test-fix2 and squashes the following commits:
    
    9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test.
    
    (cherry picked from commit 7f0cfe47f4709843d70ceccc25dee7551206ce0d)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

commit d807023479ce10aec28ef3c1ab646ddefc2e663c
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T18:03:34Z

    [maven-release-plugin] prepare release v1.0.0-rc10

commit 67dd53d2556f03ce292e6889128cf441f1aa48f8
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-20T18:03:42Z

    [maven-release-plugin] prepare for next development iteration

commit 364c14af9a339ac36b0fc54d1559e260e1550ab0
Author: Sumedh Mungee <smungee@gmail.com>
Date:   2014-05-21T08:22:25Z

    [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class
    
    Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the scripts are looking for the Scala installation when in fact they are looking for Spark.
    
    Author: Sumedh Mungee <smungee@gmail.com>
    
    Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
    
    26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
    
    (cherry picked from commit 6e337380fc47071fc7fb28d744e8209c729fe1e9)
    Signed-off-by: Reynold Xin <rxin@apache.org>

commit 7295dd94b53487fce984da7f44d41ec3468bae88
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-21T08:23:34Z

    [Docs] Correct example of creating a new SparkConf
    
    The example code on the configuration page currently does not compile.
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #842 from andrewor14/conf-docs and squashes the following commits:
    
    aabff57 [Andrew Or] Correct example of creating a new SparkConf
    
    (cherry picked from commit 1014668f2727863fe46f9c75201ee459d093bf0c)
    Signed-off-by: Reynold Xin <rxin@apache.org>
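
The corrected docs example is Scala; the same pattern in PySpark, shown below as a minimal sketch with illustrative master, app name, and setting, is to build the SparkConf first and then pass it to the context.

```
from pyspark import SparkConf, SparkContext

# Build the configuration object first, then hand it to SparkContext.
conf = (SparkConf()
        .setMaster("local[2]")                 # illustrative master URL
        .setAppName("MyApp")                   # illustrative application name
        .set("spark.executor.memory", "1g"))   # illustrative setting
sc = SparkContext(conf=conf)
```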

commit bc6bbfa65b110016c833689f5e84aab8ba5a575a
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-21T08:25:10Z

    [Minor] Move JdbcRDDSuite to the correct package
    
    It was in the wrong package
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #839 from andrewor14/jdbc-suite and squashes the following commits:
    
    f948c5a [Andrew Or] cache -> cache()
    b215279 [Andrew Or] Move JdbcRDDSuite to the correct package
    
    (cherry picked from commit 7c79ef7d43de258ad9a5de15c590132bd78ce8dd)
    Signed-off-by: Reynold Xin <rxin@apache.org>

commit 9b8f7725145728d2d3c97acd8b515484cb98d9c0
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-21T18:59:05Z

    [Typo] Stoped -> Stopped
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #847 from andrewor14/yarn-typo and squashes the following commits:
    
    c1906af [Andrew Or] Stoped -> Stopped
    
    (cherry picked from commit ba5d4a99425a2083fea2a9759050c5e770197e23)
    Signed-off-by: Reynold Xin <rxin@apache.org>

commit 30d1df5e00de36951c2bce619fc5a934184164b7
Author: Kan Zhang <kzhang@apache.org>
Date:   2014-05-21T20:26:53Z

    [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark
    
    Author: Kan Zhang <kzhang@apache.org>
    
    Closes #697 from kanzhang/SPARK-1519 and squashes the following commits:
    
    4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark
    
    (cherry picked from commit f18fd05b513b136363c94adb3e5b841f8bf48134)
    Signed-off-by: Reynold Xin <rxin@apache.org>
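
A usage sketch of the new parameter (directory path, partition count, master, and app name are illustrative): wholeTextFiles returns (file path, file contents) pairs, and minPartitions hints at the minimum number of partitions to create.

```
from pyspark import SparkContext

sc = SparkContext("local[2]", "whole-text-files")  # illustrative master/app name
# Each element is a (file path, file contents) pair for one whole small file.
pairs = sc.wholeTextFiles("data/mllib", minPartitions=4)
print(pairs.keys().take(3))
```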

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
