spark-user mailing list archives

From swetha kasireddy <swethakasire...@gmail.com>
Subject Re: Spark streaming job filling a lot of data in local spark nodes
Date Fri, 02 Oct 2015 00:59:46 GMT
We have limited disk space, so can we use spark.cleaner.ttl to clean up
these files? Or is there another setting that cleans up old temp files?
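
Before settling on a cleanup setting, it can help to see how much space old
shuffle files actually occupy. The following is a minimal sketch in plain
Python, not a Spark feature: the function name `stale_shuffle_files` and the
one-hour threshold are assumptions for illustration, and the file name
pattern is taken from the listing in the original post. It only reports
files and does not delete anything, since removing shuffle files that a
running job still needs can make that job fail.

```python
import time
from pathlib import Path


def stale_shuffle_files(local_dir, max_age_seconds, now=None):
    """Return (path, size_in_bytes) for shuffle files older than max_age_seconds.

    Hypothetical helper: it only inspects file names and modification times
    under local_dir and knows nothing about which files Spark still needs.
    """
    now = time.time() if now is None else now
    stale = []
    # Match names such as shuffle_23129_2_0.data and shuffle_23119_5_0.index.
    for path in Path(local_dir).rglob("shuffle_*"):
        if path.suffix not in (".data", ".index") or not path.is_file():
            continue
        info = path.stat()
        if now - info.st_mtime > max_age_seconds:
            stale.append((path, info.st_size))
    return sorted(stale)


if __name__ == "__main__":
    # Assumed values: "/tmp" as the local directory, one hour as the age limit.
    files = stale_shuffle_files("/tmp", max_age_seconds=3600)
    total = sum(size for _, size in files)
    print(f"{len(files)} stale shuffle files, {total} bytes in total")
```

Run on each worker node, this gives a rough picture of how quickly the temp
files accumulate relative to the available disk.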

On Mon, Sep 28, 2015 at 7:02 PM, Shixiong Zhu <zsxwing@gmail.com> wrote:

> These are just temp files created by the shuffle. They are not needed for
> checkpointing and are only stored in your local temp directory, which is
> "/tmp" by default. You can use `spark.local.dir` to set a different path
> if your "/tmp" doesn't have enough space.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-09-29 1:04 GMT+08:00 swetha <swethakasireddy@gmail.com>:
>
>>
>> Hi,
>>
>> My streaming job is filling the local disks with a lot of data, as shown
>> below. My checkpoint directory is set to HDFS, but I still see the
>> following files filling up my local nodes. Is there any way to store this
>> data in HDFS instead of locally?
>>
>> -rw-r--r--  1       520 Sep 17 18:43 shuffle_23119_5_0.index
>> -rw-r--r--  1 180564255 Sep 17 18:43 shuffle_23129_2_0.data
>> -rw-r--r--  1 364850277 Sep 17 18:45 shuffle_23145_8_0.data
>> -rw-r--r--  1 267583750 Sep 17 18:46 shuffle_23105_4_0.data
>> -rw-r--r--  1 136178819 Sep 17 18:48 shuffle_23123_8_0.data
>> -rw-r--r--  1 159931184 Sep 17 18:48 shuffle_23167_8_0.data
>> -rw-r--r--  1       520 Sep 17 18:49 shuffle_23315_7_0.index
>> -rw-r--r--  1       520 Sep 17 18:50 shuffle_23319_3_0.index
>> -rw-r--r--  1  92240350 Sep 17 18:51 shuffle_23305_2_0.data
>> -rw-r--r--  1  40380158 Sep 17 18:51 shuffle_23323_6_0.data
>> -rw-r--r--  1 369653284 Sep 17 18:52 shuffle_23103_6_0.data
>> -rw-r--r--  1 371932812 Sep 17 18:52 shuffle_23125_6_0.data
>> -rw-r--r--  1  19857974 Sep 17 18:53 shuffle_23291_19_0.data
>> -rw-r--r--  1  55342005 Sep 17 18:53 shuffle_23305_8_0.data
>> -rw-r--r--  1  92920590 Sep 17 18:53 shuffle_23303_4_0.data
>>
>>
>> Thanks,
>> Swetha
>
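
As an illustration of the `spark.local.dir` suggestion quoted above, here is
a minimal sketch in plain Python that picks directories with enough free
space and builds the matching `--conf` arguments for `spark-submit`. The
helper names `pick_local_dirs` and `local_dir_conf`, the candidate paths, and
the 50 GB threshold are all assumptions for illustration. Also note that some
cluster managers may override this setting with their own local directory
configuration, so treat it as a starting point.

```python
import shutil
import tempfile


def pick_local_dirs(candidates, min_free_bytes):
    """Return the candidate directories that exist and have enough free space."""
    chosen = []
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            continue  # missing or unreadable directory: skip it
        if free >= min_free_bytes:
            chosen.append(path)
    return chosen


def local_dir_conf(dirs):
    """Build spark-submit arguments that set spark.local.dir.

    spark.local.dir accepts a comma-separated list of directories.
    """
    if not dirs:
        raise ValueError("no directory has enough free space")
    return ["--conf", "spark.local.dir=" + ",".join(dirs)]


if __name__ == "__main__":
    # Hypothetical candidates: the system temp dir plus two made-up mount points.
    candidates = [tempfile.gettempdir(), "/data1/spark-tmp", "/data2/spark-tmp"]
    dirs = pick_local_dirs(candidates, min_free_bytes=50 * 1024**3)
    if dirs:
        print(" ".join(local_dir_conf(dirs)))
    else:
        print("no candidate directory has 50 GB free")
```

Listing several directories on different disks also spreads the shuffle
files across them rather than filling a single volume.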
