spark-issues mailing list archives

From "Imran Rashid (JIRA)" <>
Subject [jira] [Commented] (SPARK-6235) Address various 2G limits
Date Mon, 21 May 2018 18:51:01 GMT


Imran Rashid commented on SPARK-6235:

WAL -- write-ahead-log for receiver-based streaming. This wouldn't affect a streaming source
like the KafkaDirectDstream, which isn't receiver-based. It might not be that hard to fix
this, but I don't know this code that well, and I don't think it's nearly as important.

I've also seen records larger than 2 GB. Actually, this would probably be a good thing to
support eventually as well. But I don't think it's as important; I just want to put it out
of scope here.

For task results, I mean the results sent back to the driver in an action, from each partition.
It would be hard to imagine that working if RDD records couldn't be greater than 2 GB in general;
I just thought it was worth calling out as something else I've seen: users trying to send back
large results. A compelling use case might be if you're updating a statistical model in memory
in your RDD action, and you want to send back the updates in a reduce to merge the updates.
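
As a rough illustration of that use case, here is a minimal sketch in plain Java, not Spark's API. The class and method names (MergeUpdates, merge, reduceAll) are hypothetical, and it assumes each partition produces an update vector of the same length. Each array stands in for one partition's task result, and the reduce merges them into one.

```java
import java.util.Arrays;
import java.util.List;

public class MergeUpdates {
    // Element-wise sum of two update vectors (assumed to have equal length).
    public static double[] merge(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // Combine all per-partition updates with a reduce; empty input yields an empty vector.
    public static double[] reduceAll(List<double[]> perPartition) {
        return perPartition.stream().reduce(MergeUpdates::merge).orElse(new double[0]);
    }

    public static void main(String[] args) {
        // Two hypothetical partitions, each contributing a small update vector.
        List<double[]> updates = Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0});
        System.out.println(Arrays.toString(reduceAll(updates)));
    }
}
```

In a real job each update could be as large as the model itself, which is why the size of a single task result matters here.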

> Address various 2G limits
> -------------------------
>                 Key: SPARK-6235
>                 URL:
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-6235_Design_V0.02.pdf
> An umbrella ticket to track the various 2G limits we have in Spark, due to the use of
> byte arrays and ByteBuffers.
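
For context on where the 2G figure comes from: Java arrays and java.nio.ByteBuffer are indexed by int, so a single one can address at most Integer.MAX_VALUE bytes. The sketch below only illustrates that arithmetic. The class TwoGigLimit and the helper chunksNeeded are hypothetical, and splitting data into several chunks is shown as one possible workaround, not as the approach taken in Spark.

```java
import java.nio.ByteBuffer;

public class TwoGigLimit {
    // Upper bound on the length an int-indexed byte[] or ByteBuffer can express.
    public static long maxIntIndexedBytes() {
        return Integer.MAX_VALUE;
    }

    // Hypothetical helper: number of chunks of chunkBytes needed to hold totalBytes (ceiling division).
    public static long chunksNeeded(long totalBytes, int chunkBytes) {
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }

    public static void main(String[] args) {
        // ByteBuffer.allocate takes an int, and capacity() returns an int.
        ByteBuffer buf = ByteBuffer.allocate(16);
        int capacity = buf.capacity();
        System.out.println("example buffer capacity: " + capacity);
        System.out.println("max int-indexed bytes: " + maxIntIndexedBytes());

        // A 3 GiB payload cannot fit in one buffer, but fits in three 1 GiB chunks.
        long threeGiB = 3L << 30;
        System.out.println("1 GiB chunks for 3 GiB: " + chunksNeeded(threeGiB, 1 << 30));
    }
}
```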
