From: Michael Armbrust
Date: Thu, 8 Oct 2015 10:19:27 -0700
Subject: Re: Default size of a datatype in SparkSQL
To: vivek bhaskar
Cc: user

It's purely for estimation, when guessing whether it's safe to do a broadcast join. We picked a random number that we thought was larger than the common case (it's better to overestimate to avoid OOM).

On Wed, Oct 7, 2015 at 10:11 PM, vivek bhaskar <vivekwiz3@gmail.com> wrote:

> I want to understand what the default size for a given datatype is used for.
>
> The following link mentions that it is for internal size estimation.
>
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DataType.html
>
> The above behavior is also reflected in the code, where the default value
> seems to be used for stats purposes only.
>
> But then we have the default size of the String datatype as 4096; why did we
> go for this random number? Or will it also restrict the size of the data? Any
> further elaboration on how the string datatype works would also help.
>
> Regards,
> Vivek
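
The estimation described in the reply above can be illustrated with a minimal sketch. This is not Spark's actual implementation: the function names are hypothetical, the per-type sizes other than the 4096 bytes for strings quoted in this thread are assumptions, and the 10 MB threshold is assumed to mirror the documented default of `spark.sql.autoBroadcastJoinThreshold`.

```python
# Illustrative sketch only: not Spark's actual code path.
# Per-type default sizes in bytes. The 4096 for strings is the value quoted
# in this thread; the other entries are assumptions for illustration.
DEFAULT_SIZE = {"string": 4096, "int": 4, "long": 8, "double": 8}

# Assumed threshold of 10 MB (hypothetical constant name).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def estimated_row_size(column_types):
    """Sum the default sizes of all columns in one row."""
    return sum(DEFAULT_SIZE[t] for t in column_types)


def estimated_table_size(column_types, row_count):
    """Estimate table size as row size times row count."""
    return estimated_row_size(column_types) * row_count


def can_broadcast(column_types, row_count, threshold=BROADCAST_THRESHOLD_BYTES):
    """Treat a table as broadcastable when its estimate fits the threshold."""
    return estimated_table_size(column_types, row_count) <= threshold


schema = ["int", "string"]
print(estimated_row_size(schema))     # 4100
print(can_broadcast(schema, 1_000))   # 4,100,000 bytes, prints True
print(can_broadcast(schema, 10_000))  # 41,000,000 bytes, prints False
```

Because the number only feeds an estimate, it places no limit on the stored values. A deliberately large default for strings makes the estimate err on the high side, so a table is less likely to be broadcast when it would not fit in memory.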