spark-user mailing list archives

From Michael Armbrust <>
Subject Re: Default size of a datatype in SparkSQL
Date Thu, 08 Oct 2015 17:19:27 GMT
It's purely for estimation, used when guessing whether it's safe to do a
broadcast join.  We picked a random number that we thought was larger than the
common case (it's better to overestimate to avoid OOM).
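
To make the estimation idea concrete, here is a rough sketch in plain Python. It is a simplified model and not Spark's actual code: the per-type sizes other than the 4096 for strings, the 10 MB threshold (meant to mirror the default of spark.sql.autoBroadcastJoinThreshold), and all function names are assumptions made for illustration.

```python
# Assumed per-type default sizes in bytes. Only the 4096 for strings comes
# from this thread; the other values are illustrative.
DEFAULT_SIZE = {"int": 4, "long": 8, "double": 8, "string": 4096}

# Assumed broadcast threshold of 10 MB, mirroring the default of
# spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD = 10 * 1024 * 1024


def estimated_row_size(schema):
    """Sum the default size of every column type in the schema."""
    return sum(DEFAULT_SIZE[column_type] for column_type in schema)


def estimated_table_size(schema, row_count):
    """Estimate the table size as row size times row count."""
    return estimated_row_size(schema) * row_count


def can_broadcast(schema, row_count, threshold=BROADCAST_THRESHOLD):
    """A table is a broadcast candidate only if its estimate fits the threshold."""
    return estimated_table_size(schema, row_count) <= threshold


if __name__ == "__main__":
    schema = ["int", "string"]
    for rows in (1000, 3000):
        size = estimated_table_size(schema, rows)
        print(rows, size, can_broadcast(schema, rows))
```

In this model a generous string estimate only makes a table look bigger, so the planner is less likely to broadcast something that would not fit in memory. It never limits how long an actual string value can be.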

On Wed, Oct 7, 2015 at 10:11 PM, vivek bhaskar <> wrote:

> I want to understand what the default size for a given datatype is used for.
> The following link mentions that it's for internal size estimation.
> That behavior is also reflected in the code, where the default value seems
> to be used for statistics only.
> But then the default size of the String datatype is 4096; why did we go for
> this random number? Or does it also restrict the size of the data? Any
> further elaboration on how the string datatype works would also help.
> Regards,
> Vivek
