spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22969) aggregateByKey with aggregator compression
Date Fri, 05 Jan 2018 14:06:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313160#comment-16313160 ]

Sean Owen commented on SPARK-22969:
-----------------------------------

Should this start as a discussion on the mailing list? It doesn't seem clear whether
there's a change being proposed here.

> aggregateByKey with aggregator compression
> ------------------------------------------
>
>                 Key: SPARK-22969
>                 URL: https://issues.apache.org/jira/browse/SPARK-22969
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: zhengruifeng
>            Priority: Minor
>
> I have encountered a special case in which the aggregator can be represented as two types:
> a) a high memory-footprint type with a fast {{update}}
> b) a compact type that must be converted to type a) before calling {{update}} and {{merge}}.
> I wonder whether it is possible to compress the fat aggregators in {{aggregateByKey}}
> before the shuffle. How can I implement it?  [~cloud_fan]
> One similar case may be:
> using {{aggregateByKey}}/{{reduceByKey}} to compute the nnz vector (number of non-zero
> values) for different keys on a large sparse dataset.
> We can use {{DenseVector}} as the aggregator to count the nnz, and then compress it
> by calling {{Vector#compressed}} before sending it over the network.
> Another similar case may be calling {{QuantileSummaries#compress}} before communication.
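
The nnz case described above can be sketched in plain Python, without Spark, to
show where the compression step would sit. This is only an illustration under
stated assumptions: the names DIM, compress, aggregate_partition and
merge_partitions are hypothetical, a list stands in for the dense aggregator,
and a dict stands in for the compact one. In Spark, the first function would
roughly correspond to a mapPartitions step and the second to the reduce side of
reduceByKey; that mapping is an assumption of this sketch, not something
confirmed in the issue.

```python
from collections import defaultdict

DIM = 6  # hypothetical vector size, chosen only for this example


def compress(dense):
    """Compact form: keep only the non-zero counts as {index: count}."""
    return {i: c for i, c in enumerate(dense) if c != 0}


def aggregate_partition(records, dim=DIM):
    """Map side: count non-zeros per key using a dense (fast update) list.

    Each record is (key, sparse_row), where sparse_row is {index: value}.
    """
    acc = defaultdict(lambda: [0] * dim)
    for key, row in records:
        dense = acc[key]
        for i, v in row.items():
            if v != 0:
                dense[i] += 1
    # Compress once per key, just before the data would leave the partition.
    return [(key, compress(dense)) for key, dense in acc.items()]


def merge_partitions(partition_outputs, dim=DIM):
    """Reduce side: fold the compact aggregators back into dense ones."""
    merged = {}
    for output in partition_outputs:
        for key, sparse in output:
            dense = merged.setdefault(key, [0] * dim)
            for i, c in sparse.items():
                dense[i] += c
    return {key: compress(dense) for key, dense in merged.items()}


# Two simulated partitions of (key, sparse_row) records.
partitions = [
    [("a", {0: 1.5, 3: 2.0}), ("b", {1: 4.0}), ("a", {3: 0.5})],
    [("a", {0: 2.0, 5: 0.0}), ("b", {1: 1.0, 2: 3.0})],
]
nnz = merge_partitions([aggregate_partition(p) for p in partitions])
print(nnz)
```

The point of the design is that only one compact aggregator per key leaves each
partition, while every update inside the partition still runs against the dense
form.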



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

