spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Default size of a datatype in SparkSQL
Date Thu, 08 Oct 2015 17:19:27 GMT
It's purely for estimation, used when deciding whether it's safe to do a broadcast
join. We picked a number that we thought was larger than the common
case (it's better to overestimate to avoid an OOM).

On Wed, Oct 7, 2015 at 10:11 PM, vivek bhaskar <vivekwiz3@gmail.com> wrote:

> I want to understand what the use of the default size for a given datatype is.
>
> The following link mentions that it's for internal size estimation:
>
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DataType.html
>
> The above behavior is also reflected in the code, where the default value
> seems to be used for stats purposes only.
>
> But then the default size of the String datatype is 4096; why did we go with
> this random number? Or will it also restrict the size of the data? Any further
> elaboration on how the String datatype works would also help.
>
> Regards,
> Vivek
>
>
>
