spark-user mailing list archives

From Buntu Dev <buntu...@gmail.com>
Subject Re: How to estimate the size of dataframe using pyspark?
Date Sat, 09 Apr 2016 23:33:51 GMT
I've allocated about 4 GB to the driver. For the count stage, I see a
Shuffle Write of 13.9 GB.

On Sat, Apr 9, 2016 at 11:43 AM, Ndjido Ardo BAR <ndjido@gmail.com> wrote:

> What's the size of your driver?
> On Sat, 9 Apr 2016 at 20:33, Buntu Dev <buntudev@gmail.com> wrote:
>
>> Actually, df.show() works and displays 20 rows; it is df.count() that
>> causes the driver to run out of memory. There are just 3 INT
>> columns.
>>
>> Any idea what could be the reason?
>>
>> On Sat, Apr 9, 2016 at 10:47 AM, <ndjido@gmail.com> wrote:
>>
>>> You seem to have a lot of columns :-) !
>>> df.count() returns the number of rows in your data frame.
>>> len(df.columns) gives the number of columns.
>>>
>>> Finally, I suggest you check the size of your driver memory and adjust it
>>> accordingly.
>>>
>>> Cheers,
>>>
>>> Ardo
>>>
>>> > On 09 Apr 2016, at 19:37, bdev <buntudev@gmail.com> wrote:
>>> >
>>> > I keep running out of memory on the driver when I attempt to do df.show().
>>> > Can anyone let me know how to estimate the size of the dataframe?
>>> >
>>> > Thanks!
>>> >
>>> > --
>>> > View this message in context:
>>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-estimate-the-size-of-dataframe-using-pyspark-tp26729.html
>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> >
>>>
>>
>>
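
For the original question in this thread (how to estimate the size of a dataframe), one rough approach is to multiply the row count by the summed width of the column types. The sketch below is plain Python and makes several assumptions not stated in the thread: the function name, the width table, and the default_width guess for variable-width types such as strings are all illustrative. It takes the (name, type) pairs in the shape that PySpark's df.dtypes returns, and it estimates raw data size only, not the JVM in-memory footprint or shuffle size.

```python
# Fixed widths in bytes for some Spark SQL primitive types, keyed by the
# type strings that DataFrame.dtypes reports (e.g. "int", "bigint").
FIXED_WIDTH_BYTES = {
    "tinyint": 1,
    "smallint": 2,
    "int": 4,
    "bigint": 8,
    "float": 4,
    "double": 8,
    "boolean": 1,
}


def estimate_raw_size_bytes(row_count, dtypes, default_width=16):
    """Rough estimate of the raw data size of a DataFrame.

    row_count:     number of rows (for example the result of df.count(),
                   or a count known from the data source)
    dtypes:        list of (column_name, type_string) pairs
    default_width: assumed average width in bytes for types that are not
                   in the table above (a guess, tune it for your data)
    """
    row_width = sum(FIXED_WIDTH_BYTES.get(t, default_width) for _, t in dtypes)
    return row_count * row_width


# Hypothetical example: one billion rows of three INT columns.
dtypes = [("a", "int"), ("b", "int"), ("c", "int")]
size = estimate_raw_size_bytes(1_000_000_000, dtypes)
print(f"{size / 1024**3:.1f} GiB")
```

With a live DataFrame, the second argument could be df.dtypes directly. Since df.count() is itself the failing step here, the row count may need to come from another source, such as file metadata.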
