spark-reviews mailing list archives

From kanzhang <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-1817] RDD.zip() should verify partition...
Date Mon, 02 Jun 2014 23:45:05 GMT
GitHub user kanzhang reopened a pull request:

    https://github.com/apache/spark/pull/760

    [SPARK-1817] RDD.zip() should verify partition sizes for each partition

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kanzhang/spark SPARK-1817

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #760
    
----
commit 16ffadcc4af21430b5079dc555bcd9d8cf1fa1fa
Author: William Benton <willb@redhat.com>
Date:   2014-05-13T20:45:23Z

    SPARK-571: forbid return statements in cleaned closures
    
    This patch checks top-level closure arguments to `ClosureCleaner.clean` for `return` statements and raises an exception if it finds any. This is mainly a user-friendliness addition, since programs with return statements in closure arguments will currently fail upon RDD actions with a less-than-intuitive error message.
    
    Author: William Benton <willb@redhat.com>
    
    Closes #717 from willb/spark-571 and squashes the following commits:
    
    c41eb7d [William Benton] Another test case for SPARK-571
    30c42f4 [William Benton] Stylistic cleanups
    559b16b [William Benton] Stylistic cleanups from review
    de13b79 [William Benton] Style fixes
    295b6a5 [William Benton] Forbid return statements in closure arguments.
    b017c47 [William Benton] Added a test for SPARK-571
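
    For illustration, a minimal sketch (hypothetical user code; `sc` is assumed to be an existing SparkContext) of the pattern this check rejects:

    ```scala
    // A non-local `return` inside a closure passed to an RDD operation.
    // With this patch, ClosureCleaner.clean raises an exception here instead of
    // the job failing later with an opaque NonLocalReturnControl error.
    def findFirstNegative(sc: org.apache.spark.SparkContext): Int = {
      val rdd = sc.parallelize(Seq(3, -1, 4))
      rdd.foreach { x =>
        if (x < 0) return x   // `return` escapes the closure argument
      }
      0
    }
    ```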

commit d1e487473fd509f28daf28dcda856f3c2f1194ec
Author: Andrew Tulloch <andrew@tullo.ch>
Date:   2014-05-14T00:31:27Z

    SPARK-1791 - SVM implementation does not use threshold parameter
    
    Summary:
    https://issues.apache.org/jira/browse/SPARK-1791
    
    Simple fix, and backward compatible, since
    
    - anyone who set the threshold was getting completely wrong answers.
    - anyone who did not set the threshold had the default 0.0 value for the threshold anyway.
    
    Test Plan:
    Unit test added that is verified to fail under the old implementation,
    and pass under the new implementation.
    
    Reviewers:
    
    CC:
    
    Author: Andrew Tulloch <andrew@tullo.ch>
    
    Closes #725 from ajtulloch/SPARK-1791-SVM and squashes the following commits:
    
    770f55d [Andrew Tulloch] SPARK-1791 - SVM implementation does not use threshold parameter
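
    A minimal usage sketch (illustrative; assumes `training` is an existing RDD[LabeledPoint]) showing where the threshold now takes effect:

    ```scala
    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def predictWithThreshold(training: RDD[LabeledPoint], features: Vector): Double = {
      val model = SVMWithSGD.train(training, 100)  // 100 iterations, illustrative
      model.setThreshold(0.5)    // after this fix, predict() actually honors the threshold
      model.predict(features)    // 0.0 or 1.0, depending on margin vs. threshold
    }
    ```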

commit 5c0dafc2c8734a421206a808b73be67b66264dd7
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-14T01:32:32Z

    [SPARK-1816] LiveListenerBus dies if a listener throws an exception
    
    The solution is to wrap a try / catch / log around the posting of each event to each listener.
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #759 from andrewor14/listener-die and squashes the following commits:
    
    aee5107 [Andrew Or] Merge branch 'master' of github.com:apache/spark into listener-die
    370939f [Andrew Or] Remove two layers of indirection
    422d278 [Andrew Or] Explicitly throw an exception instead of 1 / 0
    0df0e2a [Andrew Or] Try/catch and log exceptions when posting events
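
    A minimal sketch of the guarding pattern described (names and types are illustrative, not the actual LiveListenerBus internals):

    ```scala
    // Post an event to every listener, isolating failures so that one
    // misbehaving listener cannot take down the whole bus.
    def postToAll[L, E](listeners: Seq[L], event: E)(deliver: (L, E) => Unit): Unit = {
      listeners.foreach { listener =>
        try {
          deliver(listener, event)
        } catch {
          case e: Exception =>
            System.err.println(s"Listener ${listener.getClass.getName} threw an exception: $e")
        }
      }
    }
    ```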

commit 753b04dea4b04ba9d0dd0011f00e9d70367e76fc
Author: Ye Xianjin <advancedxy@gmail.com>
Date:   2014-05-14T02:03:51Z

    [SPARK-1527] change rootDir*.getName to rootDir*.getAbsolutePath
    
    JIRA issue: [SPARK-1527](https://issues.apache.org/jira/browse/SPARK-1527)
    
    getName() only gets the last component of the file path. When deleting test-generated directories, we should pass the generated directory's absolute path to DiskBlockManager.
    
    Author: Ye Xianjin <advancedxy@gmail.com>
    
    This patch had conflicts when merged, resolved by
    Committer: Patrick Wendell <pwendell@gmail.com>
    
    Closes #436 from advancedxy/SPARK-1527 and squashes the following commits:
    
    4678bab [Ye Xianjin] change rootDir*.getName to rootDir*.getAbsolutePath so the temporary directories are deleted when the test is finished.
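
    A small sketch of the java.io.File distinction being fixed (paths are illustrative):

    ```scala
    import java.io.File

    val dir = new File("/tmp/spark-local/blockmgr-123")
    dir.getName          // "blockmgr-123" -- only the last path component
    dir.getAbsolutePath  // "/tmp/spark-local/blockmgr-123" -- what cleanup code needs
    ```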

commit 44233865cf8020741d862d33cc660c88e9315dea
Author: Michael Armbrust <michael@databricks.com>
Date:   2014-05-14T04:23:51Z

    [SQL] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #761 from marmbrus/existingContext and squashes the following commits:
    
    4651051 [Michael Armbrust] Make it possible to create Java/Python SQLContexts from an
existing Scala SQLContext.

commit 92cebada09a7e5a00ab48bcb350a9462949c33eb
Author: Syed Hashmi <shashmi@cloudera.com>
Date:   2014-05-14T04:24:23Z

    [SPARK-1784] Add a new partitioner to allow specifying # of keys per partition
    
    This change adds a new partitioner which allows users
    to specify # of keys per partition.
    
    Author: Syed Hashmi <shashmi@cloudera.com>
    
    Closes #721 from syedhashmi/master and squashes the following commits:
    
    4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
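
    A hedged sketch of what such a partitioner could look like (illustrative shape only, assuming non-negative Int keys; not necessarily the partitioner added here):

    ```scala
    import org.apache.spark.Partitioner

    // Assign a fixed number of consecutive integer keys to each partition.
    class FixedKeysPerPartitionPartitioner(keysPerPartition: Int, totalKeys: Int) extends Partitioner {
      require(keysPerPartition > 0 && totalKeys > 0)

      override def numPartitions: Int = (totalKeys + keysPerPartition - 1) / keysPerPartition

      override def getPartition(key: Any): Int = key match {
        case k: Int => math.min(k / keysPerPartition, numPartitions - 1)
        case _      => 0   // fallback for unexpected key types
      }
    }
    ```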

commit c33b8dcbf65a3a0c5ee5e65cd1dcdbc7da36aa5f
Author: larvaboy <larvaboy@gmail.com>
Date:   2014-05-14T04:26:08Z

    Implement ApproximateCountDistinct for SparkSql
    
    Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib and do the count in two phases: 1) count the number of distinct elements in each partition, and 2) merge the HyperLogLog results from the different partitions.
    
    A simple serializer and test cases are added as well.
    
    Author: larvaboy <larvaboy@gmail.com>
    
    Closes #737 from larvaboy/master and squashes the following commits:
    
    bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct.
    9ba8360 [larvaboy] Fix alignment and null handling issues.
    95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct.
    f57917d [larvaboy] Add the parser for the approximate count.
    a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions.
    7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog.
    1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class.
    653542b [larvaboy] Fix a couple of minor typos.
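
    The two-phase structure can be sketched as follows, using an exact Set as a stand-in for the HyperLogLog sketch purely to show the per-partition / merge split (illustrative helper, not the SparkSql code):

    ```scala
    import org.apache.spark.rdd.RDD

    def approxDistinctShape[T](rdd: RDD[T]): Long = {
      // Phase 1: build one summary per partition.
      val perPartition: RDD[Set[T]] = rdd.mapPartitions(it => Iterator(it.toSet))
      // Phase 2: merge the per-partition summaries and read off the count.
      val merged: Set[T] = perPartition.reduce(_ union _)
      merged.size.toLong
    }
    ```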

commit 7bb9a521f35eb19576c6cc2da3fd385910270e46
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-14T06:24:51Z

    Revert "[SPARK-1784] Add a new partitioner to allow specifying # of keys per partition"
    
    This reverts commit 92cebada09a7e5a00ab48bcb350a9462949c33eb.

commit 6ce0884446d3571fd6e9d967a080a59c657543b1
Author: Michael Armbrust <michael@databricks.com>
Date:   2014-05-14T06:27:22Z

    [SQL] Improve column pruning.
    
    Fixed a bug that was preventing us from ever pruning beneath Joins.
    
    ## TPC-DS Q3
    ### Before:
    ```
    Aggregate false, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand_id#64 AS brand_id#0,i_brand#65
AS brand#1,SUM(PartialSum#79) AS sum_agg#2]
     Exchange (HashPartitioning [d_year#12:0,i_brand#65:1,i_brand_id#64:2], 150)
      Aggregate true, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand#65,i_brand_id#64,SUM(CAST(ss_ext_sales_price#49,
DoubleType)) AS PartialSum#79]
       Project [d_year#12:6,i_brand#65:59,i_brand_id#64:58,ss_ext_sales_price#49:43]
        HashJoin [ss_item_sk#36], [i_item_sk#57], BuildRight
         Exchange (HashPartitioning [ss_item_sk#36:30], 150)
          HashJoin [d_date_sk#6], [ss_sold_date_sk#34], BuildRight
           Exchange (HashPartitioning [d_date_sk#6:0], 150)
            Filter (d_moy#14:8 = 12)
             HiveTableScan [d_date_sk#6,d_date_id#7,d_date#8,d_month_seq#9,d_week_seq#10,d_quarter_seq#11,d_year#12,d_dow#13,d_moy#14,d_dom#15,d_qoy#16,d_fy_year#17,d_fy_quarter_seq#18,d_fy_week_seq#19,d_day_name#20,d_quarter_name#21,d_holiday#22,d_weekend#23,d_following_holiday#24,d_first_dom#25,d_last_dom#26,d_same_day_ly#27,d_same_day_lq#28,d_current_day#29,d_current_week#30,d_current_month#31,d_current_quarter#32,d_current_year#33],
(MetastoreRelation default, date_dim, Some(dt)), None
           Exchange (HashPartitioning [ss_sold_date_sk#34:0], 150)
            HiveTableScan [ss_sold_date_sk#34,ss_sold_time_sk#35,ss_item_sk#36,ss_customer_sk#37,ss_cdemo_sk#38,ss_hdemo_sk#39,ss_addr_sk#40,ss_store_sk#41,ss_promo_sk#42,ss_ticket_number#43,ss_quantity#44,ss_wholesale_cost#45,ss_list_price#46,ss_sales_price#47,ss_ext_discount_amt#48,ss_ext_sales_price#49,ss_ext_wholesale_cost#50,ss_ext_list_price#51,ss_ext_tax#52,ss_coupon_amt#53,ss_net_paid#54,ss_net_paid_inc_tax#55,ss_net_profit#56],
(MetastoreRelation default, store_sales, None), None
         Exchange (HashPartitioning [i_item_sk#57:0], 150)
          Filter (i_manufact_id#70:13 = 436)
           HiveTableScan [i_item_sk#57,i_item_id#58,i_rec_start_date#59,i_rec_end_date#60,i_item_desc#61,i_current_price#62,i_wholesale_cost#63,i_brand_id#64,i_brand#65,i_class_id#66,i_class#67,i_category_id#68,i_category#69,i_manufact_id#70,i_manufact#71,i_size#72,i_formulation#73,i_color#74,i_units#75,i_container#76,i_manager_id#77,i_product_name#78],
(MetastoreRelation default, item, None), None
    ```
    ### After
    ```
    Aggregate false, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand_id#224 AS
brand_id#160,i_brand#225 AS brand#161,SUM(PartialSum#239) AS sum_agg#162]
     Exchange (HashPartitioning [d_year#172:0,i_brand#225:1,i_brand_id#224:2], 150)
      Aggregate true, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand#225,i_brand_id#224,SUM(CAST(ss_ext_sales_price#209,
DoubleType)) AS PartialSum#239]
       Project [d_year#172:1,i_brand#225:5,i_brand_id#224:3,ss_ext_sales_price#209:0]
        HashJoin [ss_item_sk#196], [i_item_sk#217], BuildRight
         Exchange (HashPartitioning [ss_item_sk#196:2], 150)
          Project [ss_ext_sales_price#209:2,d_year#172:1,ss_item_sk#196:3]
           HashJoin [d_date_sk#166], [ss_sold_date_sk#194], BuildRight
            Exchange (HashPartitioning [d_date_sk#166:0], 150)
             Project [d_date_sk#166:0,d_year#172:1]
              Filter (d_moy#174:2 = 12)
               HiveTableScan [d_date_sk#166,d_year#172,d_moy#174], (MetastoreRelation default,
date_dim, Some(dt)), None
            Exchange (HashPartitioning [ss_sold_date_sk#194:2], 150)
             HiveTableScan [ss_ext_sales_price#209,ss_item_sk#196,ss_sold_date_sk#194], (MetastoreRelation
default, store_sales, None), None
         Exchange (HashPartitioning [i_item_sk#217:1], 150)
          Project [i_brand_id#224:0,i_item_sk#217:1,i_brand#225:2]
           Filter (i_manufact_id#230:3 = 436)
            HiveTableScan [i_brand_id#224,i_item_sk#217,i_brand#225,i_manufact_id#230], (MetastoreRelation
default, item, None), None
    ```
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #729 from marmbrus/fixPruning and squashes the following commits:
    
    5feeff0 [Michael Armbrust] Improve column pruning.

commit b22952fa1f21c0b93208846b5e1941a9d2578c6f
Author: Koert Kuipers <koert@tresata.com>
Date:   2014-05-14T07:10:12Z

    SPARK-1801. expose InterruptibleIterator and TaskKilledException in developer api
    
    Author: Koert Kuipers <koert@tresata.com>
    
    Closes #764 from koertkuipers/feat-rdd-developerapi and squashes the following commits:
    
    8516dd2 [Koert Kuipers] SPARK-1801. expose InterruptibleIterator and TaskKilledException
in developer api

commit 54ae8328bd7d052ba347768cfb02cb5dfdd8045e
Author: Marcelo Vanzin <vanzin@cloudera.com>
Date:   2014-05-14T07:37:57Z

    Fix dep exclusion: avro-ipc, not avro, depends on netty.
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes #763 from vanzin/netty-dep-hell and squashes the following commits:
    
    dfb6ce2 [Marcelo Vanzin] Fix dep exclusion: avro-ipc, not avro, depends on netty.

commit 69f750228f3ec8537a93da08e712596fa8004143
Author: Andrew Or <andrewor14@gmail.com>
Date:   2014-05-14T07:54:33Z

    [SPARK-1769] Executor loss causes NPE race condition
    
    This PR replaces the Schedulable data structures in Pool.scala with thread-safe ones from Java. Note that Scala's `with SynchronizedBuffer` trait is soon to be deprecated in 2.11 because it is ["inherently unreliable"](http://www.scala-lang.org/api/2.11.0/index.html#scala.collection.mutable.SynchronizedBuffer). We should slowly drift away from `SynchronizedBuffer` in other places too.
    
    Note that this PR introduces an API-breaking change: `sc.getAllPools` now returns an Array rather than an ArrayBuffer. This is because we want the method to return an immutable copy rather than a mutable one that might confuse users who try to modify it, since such modifications have no effect on the original data structure.
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #762 from andrewor14/pool-npe and squashes the following commits:
    
    383e739 [Andrew Or] JavaConverters -> JavaConversions
    3f32981 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
    769be19 [Andrew Or] Assorted minor changes
    2189247 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
    05ad9e9 [Andrew Or] Fix test - contains is not the same as containsKey
    0921ea0 [Andrew Or] var -> val
    07d720c [Andrew Or] Synchronize Schedulable data structures
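
    A minimal sketch of the kind of substitution involved (containers and names are illustrative, not the exact Pool.scala code):

    ```scala
    import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}

    // Thread-safe java.util.concurrent containers in place of
    // `ArrayBuffer with SynchronizedBuffer` / `HashMap with SynchronizedMap`.
    val schedulableQueue = new ConcurrentLinkedQueue[String]()
    val schedulableNameToObj = new ConcurrentHashMap[String, AnyRef]()

    schedulableQueue.add("pool-1")
    schedulableNameToObj.put("pool-1", new Object)

    // Return an immutable snapshot (Array) instead of the live mutable structure.
    def getAllPools: Array[String] = schedulableQueue.toArray(Array.empty[String])
    ```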

commit 68f28dabe9c7679be82e684385be216319beb610
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-14T11:17:32Z

    Fixed streaming examples docs to use run-example instead of spark-submit
    
    Pretty self-explanatory
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #722 from tdas/example-fix and squashes the following commits:
    
    7839979 [Tathagata Das] Minor changes.
    0673441 [Tathagata Das] Fixed java docs of java streaming example
    e687123 [Tathagata Das] Fixed scala style errors.
    9b8d112 [Tathagata Das] Fixed streaming examples docs to use run-example instead of spark-submit.

commit 2e5a7cde223c8bf6d34e46b27ac94a965441584d
Author: Sean Owen <sowen@cloudera.com>
Date:   2014-05-14T16:38:33Z

    SPARK-1827. LICENSE and NOTICE files need a refresh to contain transitive dependency info
    
    LICENSE and NOTICE policy is explained here:
    
    http://www.apache.org/dev/licensing-howto.html
    http://www.apache.org/legal/3party.html
    
    This leads to the following changes.
    
    First, this change enables two extensions to maven-shade-plugin in assembly/ that will try to include and merge all NOTICE and LICENSE files. This can't hurt.
    
    This generates a consolidated NOTICE file that I manually added to NOTICE.
    
    Next, a list of all dependencies and their licenses was generated:
    `mvn ... license:aggregate-add-third-party`
    to create: `target/generated-sources/license/THIRD-PARTY.txt`
    
    Each dependency is listed with one or more licenses. Where there was more than one, I determined the most compatible license.
    
    For "unknown" license dependencies, I manually evaluateD their license. Many are actually
Apache projects or components of projects covered already. The only non-trivial one was Colt,
which has its own (compatible) license.
    
    I ignored Apache-licensed and public-domain dependencies, as these require no further action (beyond NOTICE above).
    
    BSD and MIT licenses (permissive Category A licenses) are evidently supposed to be mentioned in LICENSE, so I added a section with the relevant output from the THIRD-PARTY.txt file.
    
    Everything else (Category B licenses) is evidently supposed to be mentioned in NOTICE; same treatment there.
    
    LICENSE contained some license statements for source code that is redistributed. I left this as is, since I think that is the right place to put it.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #770 from srowen/SPARK-1827 and squashes the following commits:
    
    a764504 [Sean Owen] Add LICENSE and NOTICE info for all transitive dependencies as of
1.0

commit d1d41ccee49a5c093cb61c791c01f64f2076b83e
Author: Andrew Ash <andrew@andrewash.com>
Date:   2014-05-14T16:45:33Z

    SPARK-1818 Freshen Mesos documentation
    
    Place more emphasis on using precompiled binary versions of Spark and Mesos
    instead of encouraging the reader to compile from source.
    
    Author: Andrew Ash <andrew@andrewash.com>
    
    Closes #756 from ash211/spark-1818 and squashes the following commits:
    
    7ef3b33 [Andrew Ash] Brief explanation of the interactions between Spark and Mesos
    e7dea8e [Andrew Ash] Add troubleshooting and debugging section
    956362d [Andrew Ash] Don't need to pass spark.executor.uri into the spark shell
    de3353b [Andrew Ash] Wrap to 100char
    7ebf6ef [Andrew Ash] Polish on the section on Mesos Master URLs
    3dcc2c1 [Andrew Ash] Use --tgz parameter of make-distribution
    41b68ed [Andrew Ash] Period at end of sentence; formatting on :5050
    8bf2c53 [Andrew Ash] Update site.MESOS_VERSIOn to match /pom.xml
    74f2040 [Andrew Ash] SPARK-1818 Freshen Mesos documentation

commit d58cb33ffa9e98a64cecea7b40ce7bfbed145079
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-14T16:51:01Z

    SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies
    
    See https://issues.apache.org/jira/browse/SPARK-1828 for more information.
    
    This is being submitted to Jenkins for testing. The dependency won't fully
    propagate in Maven central for a few more hours.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #767 from pwendell/hive-shaded and squashes the following commits:
    
    ea10ac5 [Patrick Wendell] SPARK-1828: Created forked version of hive-exec that doesn't
bundle other dependencies

commit 17f3075bc4aa8cbed165f7b367f70e84b1bc8db9
Author: Mark Hamstra <markhamstra@gmail.com>
Date:   2014-05-14T17:07:25Z

    [SPARK-1620] Handle uncaught exceptions in function run by Akka scheduler
    
    If the intended behavior was that uncaught exceptions thrown in functions being run by the Akka scheduler would end up being handled by the default uncaught exception handler set in Executor, and if that behavior is, in fact, correct, then this is a way to accomplish that. I'm not certain, though, that we shouldn't be doing something different to handle uncaught exceptions from some of these scheduled functions.
    
    In any event, this PR covers all of the cases I comment on in [SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620).
    
    Author: Mark Hamstra <markhamstra@gmail.com>
    
    Closes #622 from markhamstra/SPARK-1620 and squashes the following commits:
    
    071d193 [Mark Hamstra] refactored post-SPARK-1772
    1a6a35e [Mark Hamstra] another style fix
    d30eb94 [Mark Hamstra] scalastyle
    3573ecd [Mark Hamstra] Use wrapped try/catch in Utils.tryOrExit
    8fc0439 [Mark Hamstra] Make functions run by the Akka scheduler use Executor's UncaughtExceptionHandler
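
    A hedged sketch of the wrapping pattern (illustrative; not the exact Utils.tryOrExit implementation): run the scheduled block, and hand any uncaught throwable to the thread's uncaught-exception handler instead of letting the scheduler swallow it.

    ```scala
    def tryOrExit(block: => Unit): Unit = {
      try {
        block
      } catch {
        case t: Throwable =>
          // In an executor, the uncaught-exception handler is set up to exit the
          // JVM; at minimum the failure is no longer silently dropped.
          Thread.currentThread().getUncaughtExceptionHandler
            .uncaughtException(Thread.currentThread(), t)
      }
    }
    ```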

commit fde82c1549c78f1eebbb21ec34e60befbbff65f5
Author: witgo <witgo@qq.com>
Date:   2014-05-14T18:19:26Z

    Fix: sbt test throw an java.lang.OutOfMemoryError: PermGen space
    
    Author: witgo <witgo@qq.com>
    
    Closes #773 from witgo/sbt_javaOptions and squashes the following commits:
    
    26c7d38 [witgo] Improve sbt configuration
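
    A minimal build-configuration sketch of the kind of change involved (values illustrative):

    ```scala
    // In the sbt build definition: fork test JVMs so the options apply,
    // and give the test JVM a larger PermGen.
    fork in Test := true
    javaOptions in Test += "-XX:MaxPermSize=512m"
    ```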

commit a3315d7f4c7584dae2ee0aa33c6ec9e97b229b48
Author: Andrew Ash <andrew@andrewash.com>
Date:   2014-05-14T19:01:14Z

    SPARK-1829 Sub-second durations shouldn't round to "0 s"
    
    As "99 ms" up to 99 ms
    As "0.1 s" from 0.1 s up to 0.9 s
    
    https://issues.apache.org/jira/browse/SPARK-1829
    
    Compare the first image to the second here: http://imgur.com/RaLEsSZ,7VTlgfo#0
    
    Author: Andrew Ash <andrew@andrewash.com>
    
    Closes #768 from ash211/spark-1829 and squashes the following commits:
    
    1c15b8e [Andrew Ash] SPARK-1829 Format sub-second durations more appropriately
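
    A small sketch of the formatting rule described above (illustrative helper, not the exact UI code):

    ```scala
    // Format a duration given in milliseconds.
    def formatDuration(ms: Long): String = {
      if (ms < 100) s"$ms ms"                            // "99 ms" up to 99 ms
      else if (ms < 1000) "%.1f s".format(ms / 1000.0)   // tenths of a second below 1 s, never "0 s"
      else s"${ms / 1000} s"
    }

    formatDuration(99)    // "99 ms"
    formatDuration(450)   // "0.5 s"
    formatDuration(7000)  // "7 s"
    ```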

commit 65533c7ec03e7eedf5cd9756822863ab6f034ec9
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-14T19:53:30Z

    SPARK-1833 - Have an empty SparkContext constructor.
    
    This is nicer than relying on new SparkContext(new SparkConf())
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #774 from pwendell/spark-context and squashes the following commits:
    
    ef9f12f [Patrick Wendell] SPARK-1833 - Have an empty SparkContext constructor.
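
    Usage-wise, the difference is simply the following (the two forms are equivalent; only one SparkContext per JVM):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Before: configuration had to be passed explicitly.
    val sc1 = new SparkContext(new SparkConf())

    // With this change: an empty constructor that does the same thing.
    val sc2 = new SparkContext()
    ```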

commit 94c6c06ea13032b80610b3f54401d2ef2aa4874a
Author: Xiangrui Meng <meng@databricks.com>
Date:   2014-05-14T21:57:17Z

    [FIX] do not load defaults when testing SparkConf in pyspark
    
    The default constructor loads default properties, which can fail the test.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #775 from mengxr/pyspark-conf-fix and squashes the following commits:
    
    83ef6c4 [Xiangrui Meng] do not load defaults when testing SparkConf in pyspark

commit 601e37198b97ba52e72ac13213c391c932e97b67
Author: Jacek Laskowski <jacek@japila.pl>
Date:   2014-05-14T22:45:52Z

    String interpolation + some other small changes
    
    After having been invited to make the change in https://github.com/apache/spark/commit/6bee01dd04ef73c6b829110ebcdd622d521ea8ff#commitcomment-6284165 by @witgo.
    
    Author: Jacek Laskowski <jacek@japila.pl>
    
    Closes #748 from jaceklaskowski/sparkenv-string-interpolation and squashes the following
commits:
    
    be6ebac [Jacek Laskowski] String interpolation + some other small changes
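
    For reference, the kind of rewrite involved (illustrative values):

    ```scala
    val executorId = "driver"

    // Before: explicit string concatenation.
    val before = "Registering block manager for executor " + executorId

    // After: Scala string interpolation.
    val after = s"Registering block manager for executor $executorId"
    ```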

commit e3d72a74ad007c2bf279d6a74cdaca948bdf0ddd
Author: Xiangrui Meng <meng@databricks.com>
Date:   2014-05-15T00:18:30Z

    [SPARK-1696][MLLIB] use alpha in dense dspr
    
    It doesn't affect existing code because only `alpha = 1.0` is used in the code.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #778 from mengxr/mllib-dspr-fix and squashes the following commits:
    
    a37402e [Xiangrui Meng] use alpha in dense dspr

commit 9ad096d55a3d8410f04056ebc87dbd8cba391870
Author: andrewor14 <andrewor14@gmail.com>
Date:   2014-05-15T00:54:53Z

    [Typo] propertes -> properties
    
    Author: andrewor14 <andrewor14@gmail.com>
    
    Closes #780 from andrewor14/submit-typo and squashes the following commits:
    
    e70e057 [andrewor14] propertes -> properties

commit 44165fc91a31e6293a79031c89571e139d2c5356
Author: wangfei <scnbwf@yeah.net>
Date:   2014-05-15T00:59:11Z

    [SPARK-1826] fix the head notation of package object dsl
    
    Author: wangfei <scnbwf@yeah.net>
    
    Closes #765 from scwf/dslfix and squashes the following commits:
    
    d2d1a9d [wangfei] Update package.scala
    66ff53b [wangfei] fix the head notation of package object dsl

commit 2f639957f0bf70dddf1e698aa9e26007fb58bc67
Author: Chen Chao <crazyjvm@gmail.com>
Date:   2014-05-15T01:20:20Z

    default task number misleading in several places
    
      private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
        new HashPartitioner(numPartitions)
      }

    This shows that the default task number in Spark Streaming relies on the defaultParallelism variable in SparkContext, which is determined by the config property spark.default.parallelism.

    The property "spark.default.parallelism" refers to https://github.com/apache/spark/pull/389
    
    Author: Chen Chao <crazyjvm@gmail.com>
    
    Closes #766 from CrazyJvm/patch-7 and squashes the following commits:
    
    0b7efba [Chen Chao] Update streaming-programming-guide.md
    cc5b66c [Chen Chao] default task number misleading in several places

commit ad4e60ee7e2c49c24a9972312915f7f7253c7679
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-05-15T04:13:41Z

    [SPARK-1840] SparkListenerBus prints out scary error message when terminated normally
    
    Running the SparkPi example gave this error.
    ```
    Pi is roughly 3.14374
    14/05/14 18:16:19 ERROR Utils: Uncaught exception in thread SparkListenerBus
    scala.runtime.NonLocalReturnControl$mcV$sp
    ```
    This is due to the catch-all in SparkListenerBus, which logged the control-flow throwables used internally by Scala.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #783 from tdas/controlexception-fix and squashes the following commits:
    
    a466c8d [Tathagata Das] Ignored control exceptions when logging all exceptions.
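
    A hedged sketch of the logging guard after the fix (illustrative, not the exact Utils code): Scala's control-flow throwables are rethrown untouched instead of being logged as errors.

    ```scala
    import scala.util.control.ControlThrowable

    def logUncaughtExceptions(body: => Unit): Unit = {
      try {
        body
      } catch {
        case ct: ControlThrowable =>
          throw ct   // e.g. NonLocalReturnControl: not an error, let it propagate silently
        case t: Throwable =>
          System.err.println(s"Uncaught exception in thread ${Thread.currentThread().getName}: $t")
          throw t
      }
    }
    ```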

commit f10de042b8e86adf51b70bae2d8589a5cbf02935
Author: Matei Zaharia <matei@databricks.com>
Date:   2014-05-15T04:45:20Z

    Add language tabs and Python version to interactive part of quick-start
    
    This adds some material that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit.
    
    Author: Matei Zaharia <matei@databricks.com>
    
    Closes #782 from mateiz/spark-1567-extra and squashes the following commits:
    
    6f8f2aa [Matei Zaharia] tweaks
    9ed9874 [Matei Zaharia] tweaks
    ae67c3e [Matei Zaharia] tweak
    b303ba3 [Matei Zaharia] tweak
    1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start
guide

commit 21570b463388194877003318317aafd842800cac
Author: Patrick Wendell <pwendell@gmail.com>
Date:   2014-05-15T05:24:04Z

    Documentation: Encourage use of reduceByKey instead of groupByKey.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #784 from pwendell/group-by-key and squashes the following commits:
    
    9b4505f [Patrick Wendell] Small fix
    6347924 [Patrick Wendell] Documentation: Encourage use of reduceByKey instead of groupByKey.
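
    The documentation point, as a minimal word-count sketch (assumes an existing RDD[String] of words):

    ```scala
    import org.apache.spark.SparkContext._   // pair-RDD implicits in Spark 1.x
    import org.apache.spark.rdd.RDD

    def countWords(words: RDD[String]): RDD[(String, Int)] = {
      val pairs = words.map(w => (w, 1))

      // Preferred: combines values map-side before shuffling.
      val viaReduce = pairs.reduceByKey(_ + _)

      // Discouraged for aggregations: ships every value across the shuffle first.
      val viaGroup = pairs.groupByKey().mapValues(_.sum)

      viaReduce
    }
    ```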

commit 46324279dae2fa803267d788f7c56b0ed643b4c8
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   2014-05-15T05:24:41Z

    Package docs
    
    These are a few changes based on the original patch by @scrapcodes.
    
    Author: Prashant Sharma <prashant.s@imaginea.com>
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #785 from pwendell/package-docs and squashes the following commits:
    
    c32b731 [Patrick Wendell] Changes based on Prashant's patch
    c0463d3 [Prashant Sharma] added eof new line
    ce8bf73 [Prashant Sharma] Added eof new line to all files.
    4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for
all packages that appear in docs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
