spark-reviews mailing list archives

From advaitraut <...@git.apache.org>
Subject [GitHub] spark pull request #15753: Dev advait
Date Thu, 03 Nov 2016 13:36:59 GMT
GitHub user advaitraut opened a pull request:

    https://github.com/apache/spark/pull/15753

    Dev advait

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual
tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/advaitraut/spark dev-advait

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15753
    
----
commit 8950482ee5e9132d11dc5b5d41132bb1fe1e7ba2
Author: felixcheung <felixcheung_m@hotmail.com>
Date:   2016-01-05T03:09:58Z

    [SPARKR][DOC] minor doc update for version in migration guide
    
    checked that the change is in Spark 1.6.0.
    shivaram
    
    Author: felixcheung <felixcheung_m@hotmail.com>
    
    Closes #10574 from felixcheung/rwritemodedoc.
    
    (cherry picked from commit 8896ec9f02a6747917f3ae42a517ff0e3742eaf6)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

commit d9e4438b5c7b3569662a50973164955332463d05
Author: Michael Armbrust <michael@databricks.com>
Date:   2016-01-05T07:23:41Z

    [SPARK-12568][SQL] Add BINARY to Encoders
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #10516 from marmbrus/datasetCleanup.
    
    (cherry picked from commit 53beddc5bf04a35ab73de99158919c2fdd5d4508)
    Signed-off-by: Michael Armbrust <michael@databricks.com>

commit 5afa62b20090e763ba10d9939ec214a11466087b
Author: Pete Robbins <robbinspg@gmail.com>
Date:   2016-01-05T21:10:21Z

    [SPARK-12647][SQL] Fix o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator
    
    change expected partition sizes
    
    Author: Pete Robbins <robbinspg@gmail.com>
    
    Closes #10599 from robbinspg/branch-1.6.

commit f31d0fd9ea12bfe94434671fbcfe3d0e06a4a97d
Author: Shixiong Zhu <shixiong@databricks.com>
Date:   2016-01-05T21:10:46Z

    [SPARK-12617] [PYSPARK] Clean up the leak sockets of Py4J
    
    This patch adds Py4jCallbackConnectionCleaner to clean up Py4J's leaked sockets every 30 seconds. It is a workaround until Py4J fixes the leak issue: https://github.com/bartdag/py4j/issues/187
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10579 from zsxwing/SPARK-12617.
    
    (cherry picked from commit 047a31bb1042867b20132b347b1e08feab4562eb)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
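
The shape of that workaround, as a minimal Python sketch: a daemon thread that periodically invokes a cleanup callable. `close_leaked_sockets` here is a hypothetical stand-in for the real cleaner's walk over Py4J's internal connection list, not an actual Py4J API.

```python
import threading
import time

def start_periodic_cleaner(close_leaked_sockets, interval=30.0):
    """Invoke close_leaked_sockets() every `interval` seconds on a daemon thread."""
    def loop():
        while True:
            time.sleep(interval)
            close_leaked_sockets()  # hypothetical cleanup callable
    t = threading.Thread(target=loop, name="py4j-socket-cleaner")
    t.daemon = True  # don't keep the process alive just for cleanup
    t.start()
```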

commit 83fe5cf9a2621d7e53b5792a7c7549c9da7f130a
Author: Shixiong Zhu <shixiong@databricks.com>
Date:   2016-01-05T21:48:47Z

    [SPARK-12511] [PYSPARK] [STREAMING] Make sure PythonDStream.registerSerializer is called only once
    
    There is an issue that Py4J's PythonProxyHandler.finalize blocks forever. (https://github.com/bartdag/py4j/pull/184)
    
    Py4J creates a PythonProxyHandler in Java for "transformer_serializer" when "registerSerializer" is called. If we call "registerSerializer" twice, the second PythonProxyHandler overrides the first one; the first one is then GCed and triggers "PythonProxyHandler.finalize". To avoid that, we should not call "registerSerializer" more than once, so that the Java-side "PythonProxyHandler" won't be GCed.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10514 from zsxwing/SPARK-12511.
    
    (cherry picked from commit 6cfe341ee89baa952929e91d33b9ecbca73a3ea0)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
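
The essence of the fix is a call-once guard on the Python side. A minimal sketch, with `do_register` standing in (hypothetically) for the real Py4J registration call:

```python
import threading

_registered = False
_lock = threading.Lock()

def register_serializer_once(do_register):
    """Invoke do_register() at most once per process.

    Registering twice would create a second Java-side PythonProxyHandler,
    orphaning the first; its finalize() can then block forever.
    """
    global _registered
    with _lock:
        if not _registered:
            do_register()
            _registered = True
```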

commit 0afad6678431846a6eebda8d5891da9115884915
Author: RJ Nowling <rnowling@gmail.com>
Date:   2016-01-05T23:05:04Z

    [SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans
    
    SPARK-12450. Un-persist broadcasted variables in KMeans.
    
    Author: RJ Nowling <rnowling@gmail.com>
    
    Closes #10415 from rnowling/spark-12450.
    
    (cherry picked from commit 78015a8b7cc316343e302eeed6fe30af9f2961e8)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
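
The pattern the patch applies inside KMeans, sketched at user level in PySpark (the data and distance logic are illustrative, not the patched code):

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
centers = [(0.0, 0.0), (5.0, 5.0)]
bc = sc.broadcast(centers)  # ship the current centers to executors once

points = sc.parallelize([(0.1, 0.2), (4.9, 5.1)])
assigned = points.map(
    lambda p: min(range(len(bc.value)),
                  key=lambda i: (p[0] - bc.value[i][0]) ** 2
                              + (p[1] - bc.value[i][1]) ** 2))
assigned.collect()
bc.unpersist()  # the missing step: free executor-side copies after each iteration
```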

commit bf3dca2df4dd3be264691be1321e0c700d4f4e32
Author: BrianLondon <brian@seatgeek.com>
Date:   2016-01-05T23:15:07Z

    [SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk
    
    Successfully ran the kinesis demo on a live, AWS-hosted kinesis stream against the master and 1.6 branches. For reasons I don't entirely understand, it required a manual merge to 1.5, which I did as shown here: https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2
    
    The demo ran successfully on the 1.5 branch as well.
    
    According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the kinesis regression in 1.5.2.
    
    Author: BrianLondon <brian@seatgeek.com>
    
    Closes #10492 from BrianLondon/remove-only.
    
    (cherry picked from commit ff89975543b153d0d235c0cac615d45b34aa8fe7)
    Signed-off-by: Sean Owen <sowen@cloudera.com>

commit c3135d02176cdd679b4a0e4883895b9e9f001a55
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   2016-01-06T06:35:41Z

    [SPARK-12393][SPARKR] Add read.text and write.text for SparkR
    
    Add ```read.text``` and ```write.text``` for SparkR.
    cc sun-rui felixcheung shivaram
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10348 from yanboliang/spark-12393.
    
    (cherry picked from commit d1fea41363c175a67b97cb7b3fe89f9043708739)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
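
For readers following along in Python, the counterpart reader/writer that the new SparkR functions mirror looks roughly like this (paths are illustrative):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

df = sqlContext.read.text("/tmp/input.txt")  # one string column named "value"
df.write.text("/tmp/output-dir")             # writes each row as a line of text
```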

commit 175681914af953b7ce1b2971fef83a2445de1f94
Author: zero323 <matthew.szymkiewicz@gmail.com>
Date:   2016-01-06T19:58:33Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
    
    If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`.
    
    Author: zero323 <matthew.szymkiewicz@gmail.com>
    
    Closes #9986 from zero323/SPARK-12006.
    
    (cherry picked from commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
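
A sketch of where the conversion matters, assuming the `initialModel` keyword of `GaussianMixture.train` referenced by the commit (data and parameters are illustrative):

```python
from numpy import array
from pyspark import SparkContext
from pyspark.mllib.clustering import GaussianMixture

sc = SparkContext.getOrCreate()
data = sc.parallelize([array([0.1]), array([0.2]), array([4.9]), array([5.1])])

initial = GaussianMixture.train(data, k=2, seed=10)
# initial.weights is numpy-backed; the fix converts it with list(...) before
# pickling it for the JVM, avoiding net.razorvine.pickle.PickleException.
refined = GaussianMixture.train(data, k=2, initialModel=initial, seed=10)
```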

commit d821fae0ecca6393d3632977797d72ba594d26a9
Author: Shixiong Zhu <shixiong@databricks.com>
Date:   2016-01-06T20:03:01Z

    [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming
    
    Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10621 from zsxwing/SPARK-12617-2.
    
    (cherry picked from commit 1e6648d62fb82b708ea54c51cd23bfe4f542856e)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386
Author: huangzhaowei <carlmartinmax@gmail.com>
Date:   2016-01-06T20:48:57Z

    [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes #10617 from SaintBacchus/SPARK-12672.

commit 39b0a348008b6ab532768b90fd578b77711af98c
Author: Shixiong Zhu <shixiong@databricks.com>
Date:   2016-01-06T21:53:25Z

    Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path
to gain the streaming batch url."
    
    This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge #10618 instead.

commit 11b901b22b1cdaa6d19b1b73885627ac601be275
Author: Liang-Chi Hsieh <viirya@appier.com>
Date:   2015-12-14T17:59:42Z

    [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12016
    
    We should not use Word2VecModel directly in pyspark; we need to wrap it in a Word2VecModelWrapper when loading it in pyspark.
    
    Author: Liang-Chi Hsieh <viirya@appier.com>
    
    Closes #10100 from viirya/fix-load-py-wordvecmodel.
    
    (cherry picked from commit b51a4cdff3a7e640a8a66f7a9c17021f3056fd34)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
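
A sketch of the save/load round-trip the wrapper fixes (the path is illustrative):

```python
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec, Word2VecModel

sc = SparkContext.getOrCreate()
corpus = sc.parallelize([["hello", "world"], ["hello", "spark"]] * 50)

model = Word2Vec().setVectorSize(10).setSeed(42).fit(corpus)
model.save(sc, "/tmp/w2v-model")  # illustrative path
loaded = Word2VecModel.load(sc, "/tmp/w2v-model")
# Before the fix, load() handed back the raw JVM model instead of the
# Word2VecModelWrapper, so calls like this failed from Python:
loaded.findSynonyms("hello", 1)
```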

commit 94af69c9be70b9d2cd95c26288e2af9599d61e5c
Author: jerryshao <sshao@hortonworks.com>
Date:   2016-01-07T05:28:29Z

    [SPARK-12673][UI] Add missing uri prepending for job description
    
    Otherwise the URL will fail to proxy to the right one in YARN mode. Here is the screenshot:
    
    ![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png)
    
    Author: jerryshao <sshao@hortonworks.com>
    
    Closes #10618 from jerryshao/SPARK-12673.
    
    (cherry picked from commit 174e72ceca41a6ac17ad05d50832ee9c561918c0)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

commit d061b852274c12784f3feb96c0cdcab39989f8e7
Author: Guillaume Poulin <poulin.guillaume@gmail.com>
Date:   2016-01-07T05:34:46Z

    [SPARK-12678][CORE] MapPartitionsRDD clearDependencies
    
    MapPartitionsRDD was keeping a reference to `prev` after a call to
    `clearDependencies`, which could lead to a memory leak.
    
    Author: Guillaume Poulin <poulin.guillaume@gmail.com>
    
    Closes #10623 from gpoulin/map_partition_deps.
    
    (cherry picked from commit b6738520374637347ab5ae6c801730cdb6b35daa)
    Signed-off-by: Reynold Xin <rxin@databricks.com>

commit 34effc46cd54735cc660d8b43f0a190e91747a06
Author: Yin Huai <yhuai@databricks.com>
Date:   2016-01-07T06:03:31Z

    Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None"
    
    This reverts commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #10632 from yhuai/pythonStyle.
    
    (cherry picked from commit e5cde7ab11a43334fa01b1bb8904da5c0774bc62)
    Signed-off-by: Yin Huai <yhuai@databricks.com>

commit 47a58c799206d011587e03178a259974be47d3bc
Author: zzcclp <xm_zzc@sina.com>
Date:   2016-01-07T07:06:21Z

    [DOC] fix 'spark.memory.offHeap.enabled' default value to false
    
    Correct the documented default value of 'spark.memory.offHeap.enabled' to false.
    
    Author: zzcclp <xm_zzc@sina.com>
    
    Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.
    
    (cherry picked from commit 84e77a15df18ba3f1cc871a3c52c783b46e52369)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
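
For reference, enabling the option explicitly might look like this; off-heap memory also requires a positive `spark.memory.offHeap.size`:

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("offheap-demo")
        .set("spark.memory.offHeap.enabled", "true")  # documented default: false
        .set("spark.memory.offHeap.size", str(256 * 1024 * 1024)))  # bytes; must be > 0 when enabled
sc = SparkContext(conf=conf)
```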

commit 69a885a71cfe7c62179e784e7d9eee023d3bb6eb
Author: zero323 <matthew.szymkiewicz@gmail.com>
Date:   2016-01-07T18:32:56Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
    
    If the initial model passed to GMM is not empty, it causes net.razorvine.pickle.PickleException. It can be fixed by converting initialModel.weights to list.
    
    Author: zero323 <matthew.szymkiewicz@gmail.com>
    
    Closes #10644 from zero323/SPARK-12006.
    
    (cherry picked from commit 592f64985d0d58b4f6a0366bf975e04ca496bdbe)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

commit 017b73e69693cd151516f92640a95a4a66e02dff
Author: Sameer Agarwal <sameer@databricks.com>
Date:   2016-01-07T18:37:15Z

    [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping splits
    
    https://issues.apache.org/jira/browse/SPARK-12662
    
    cc yhuai
    
    Author: Sameer Agarwal <sameer@databricks.com>
    
    Closes #10626 from sameeragarwal/randomsplit.
    
    (cherry picked from commit f194d9911a93fc3a78be820096d4836f22d09976)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
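
A quick way to see the fixed contract, sketched in PySpark (the seed and sizes are illustrative):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
df = SQLContext(sc).range(0, 100)

train, test = df.randomSplit([0.8, 0.2], seed=17)
# After the fix the splits are disjoint and exhaustive: each row lands in
# exactly one split, even though df may be recomputed between samplings.
assert train.intersect(test).count() == 0
assert train.count() + test.count() == 100
```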

commit 6ef823544dfbc8c9843bdedccfda06147a1a74fe
Author: Darek Blasiak <darek.blasiak@640labs.com>
Date:   2016-01-07T21:15:40Z

    [SPARK-12598][CORE] bug in setMinPartitions
    
    There is a bug in the calculation of ```maxSplitSize```.  The ```totalLen``` should be divided by ```minPartitions``` and not by ```files.size```.
    
    Author: Darek Blasiak <darek.blasiak@640labs.com>
    
    Closes #10546 from datafarmer/setminpartitionsbug.
    
    (cherry picked from commit 8346518357f4a3565ae41e9a5ccd7e2c3ed6c468)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
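
A worked instance of the corrected arithmetic (the real code is Scala; the numbers here are illustrative):

```python
total_len = 1 << 30   # 1 GiB across all input files
num_files = 2
min_partitions = 64

buggy_max_split = total_len // num_files        # 512 MiB -> at most ~2 splits
fixed_max_split = total_len // min_partitions   # 16 MiB  -> ~64 splits, as requested
```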

commit a7c36362fb9532183b7b6a0ad5020f02b816a9b3
Author: Shixiong Zhu <shixiong@databricks.com>
Date:   2016-01-08T01:37:46Z

    [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming
    
    /cc tdas brkyvz
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10453 from zsxwing/streaming-conf.
    
    (cherry picked from commit c94199e977279d9b4658297e8108b46bdf30157b)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
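
Assuming the write-ahead-log configuration keys as documented for Spark Streaming, setting them explicitly might look like:

```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
        .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
        # Batches WAL writes on the driver for throughput:
        .set("spark.streaming.driver.writeAheadLog.allowBatching", "true"))
```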

commit 0d96c54534d8bfca191c892b98397a176bc46152
Author: Shixiong Zhu <shixiong@databricks.com>
Date:   2016-01-08T10:02:06Z

    [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6)
    
    backport #10609 to branch 1.6
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10656 from zsxwing/SPARK-12591-branch-1.6.

commit fe2cf342e2eddd7414bacf9f5702042a20c6d50f
Author: Jeff Zhang <zjffdu@apache.org>
Date:   2016-01-08T19:38:46Z

    [DOCUMENTATION] doc fix of job scheduling
    
    spark.shuffle.service.enabled is a Spark application-level configuration; it is not necessary to set it in yarn-site.xml.
    
    Author: Jeff Zhang <zjffdu@apache.org>
    
    Closes #10657 from zjffdu/doc-fix.
    
    (cherry picked from commit 00d9261724feb48d358679efbae6889833e893e0)
    Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
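
In other words, the flag belongs in the application's Spark configuration (yarn-site.xml only needs the NodeManager aux-service entries for the external shuffle service), e.g.:

```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.enabled", "true"))  # the usual reason to enable it
```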

commit e4227cb3e19afafe3a7b5a2847478681db2f2044
Author: Udo Klein <git@blinkenlight.net>
Date:   2016-01-08T20:32:37Z

    fixed numVertices in transitive closure example
    
    Author: Udo Klein <git@blinkenlight.net>
    
    Closes #10642 from udoklein/patch-2.
    
    (cherry picked from commit 8c70cb4c62a353bea99f37965dfc829c4accc391)
    Signed-off-by: Sean Owen <sowen@cloudera.com>

commit faf094c7c35baf0e73290596d4ca66b7d083ed5b
Author: Thomas Graves <tgraves@apache.org>
Date:   2016-01-08T20:38:19Z

    [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
    
    https://issues.apache.org/jira/browse/SPARK-12654
    
    The bug here is that WholeTextFileRDD.getPartitions has:
    val conf = getConf
    In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when it clones the hadoop configuration, it changes it from a JobConf to a Configuration and drops the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works.
    
    Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
    
    Closes #10651 from tgravescs/SPARK-12654.
    
    (cherry picked from commit 553fd7b912a32476b481fd3f80c1d0664b6c6484)
    Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>

commit a6190508b20673952303eff32b3a559f0a264d03
Author: Michael Armbrust <michael@databricks.com>
Date:   2016-01-08T23:43:11Z

    [SPARK-12696] Backport Dataset Bug fixes to 1.6
    
    We've fixed a lot of bugs in master, and since this is experimental in 1.6 we should consider backporting the fixes. The only thing that is obviously risky to me is 0e07ed3; we might try to remove that.
    
    Author: Wenchen Fan <wenchen@databricks.com>
    Author: gatorsmile <gatorsmile@gmail.com>
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Author: Cheng Lian <lian@databricks.com>
    Author: Nong Li <nong@databricks.com>
    
    Closes #10650 from marmbrus/dataset-backports.

commit 8b5f23043322254c725c703c618ba3d3cc4a4240
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   2016-01-09T06:59:51Z

    [SPARK-12645][SPARKR] SparkR support hash function
    
    Add ```hash``` function for SparkR ```DataFrame```.
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10597 from yanboliang/spark-12645.
    
    (cherry picked from commit 3d77cffec093bed4d330969f1a996f3358b9a772)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

commit 7903b0610283a91c47f5df1aab069cf8930b4f27
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-01-10T22:49:45Z

    [SPARK-10359][PROJECT-INFRA] Backport dev/test-dependencies script to branch-1.6
    
    This patch backports the `dev/test-dependencies` script (from #10461) to branch-1.6.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10680 from JoshRosen/test-deps-16-backport.

commit 43b72d83e1d0c426d00d29e54ab7d14579700330
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-01-11T08:36:52Z

    [SPARK-12734][BUILD] Backport Netty exclusion + Maven enforcer fixes to branch-1.6
    
    This patch backports the Netty exclusion fixes from #10672 to branch-1.6.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10691 from JoshRosen/netty-exclude-16-backport.

commit d4cfd2acd62f2b0638a12bbbb48a38263c04eaf8
Author: Udo Klein <git@blinkenlight.net>
Date:   2016-01-11T09:30:08Z

    removed lambda from sortByKey()
    
    According to the documentation, the sortByKey method does not take a lambda as an argument, so the example was flawed. Removed the argument completely, as this will default to an ascending sort.
    
    Author: Udo Klein <git@blinkenlight.net>
    
    Closes #10640 from udoklein/patch-1.
    
    (cherry picked from commit bd723bd53d9a28239b60939a248a4ea13340aad8)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
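
What the corrected example reduces to: `sortByKey` takes `ascending`, `numPartitions`, and `keyfunc` keyword arguments rather than a comparator, so a bare call sorts ascending by key. A minimal sketch:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
pairs = sc.parallelize([("b", 2), ("c", 3), ("a", 1)])
print(pairs.sortByKey().collect())  # [('a', 1), ('b', 2), ('c', 3)]
```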

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

