spark-user mailing list archives

From Raghavendra Pandey <raghavendra.pan...@gmail.com>
Subject Re: StorageLevel.MEMORY_AND_DISK_SER
Date Wed, 01 Jul 2015 15:46:18 GMT
So do you want to change the behavior of the persist API, or write the RDD to
disk?
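
The distinction being drawn here can be sketched as follows. This is a minimal illustration assuming an existing SparkContext `sc`; the RDD contents and output path are made up for the example:

```scala
import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000)

// Option 1: persist — keeps serialized partitions in executor memory,
// spilling to local disk when memory runs out. The data exists only for
// the lifetime of this application.
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)

// Option 2: write — materializes the RDD to durable storage (e.g. HDFS),
// where other applications can read it after this one finishes.
rdd.saveAsTextFile("hdfs:///tmp/example-output")
```

Persist is a hint for reuse within a job; a save action is a durable output. They solve different problems and are often used together.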
On Jul 1, 2015 9:13 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com> wrote:

> I think I want to use persist then, and write my intermediate RDDs to
> disk+memory.
>
> On Wed, Jul 1, 2015 at 8:28 AM, Raghavendra Pandey <
> raghavendra.pandey@gmail.com> wrote:
>
>> I think the persist API is internal to the RDD, whereas the write API is
>> for saving content to disk.
>> RDD persist will dump your object bytes, serialized, to disk. If you
>> want to change that behavior, you need to override the serialization of
>> the class you are storing in the RDD.
>>  On Jul 1, 2015 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com>
wrote:
>>
>>> This is my write API. How do I integrate it here?
>>>
>>>
>>>  protected def writeOutputRecords(detailRecords:
>>> RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
>>>     val writeJob = new Job()
>>>     val schema = SchemaUtil.outputSchema(_detail)
>>>     AvroJob.setOutputKeySchema(writeJob, schema)
>>>     val outputRecords = detailRecords.coalesce(100)
>>>     outputRecords.saveAsNewAPIHadoopFile(outputDir,
>>>       classOf[AvroKey[GenericRecord]],
>>>       classOf[org.apache.hadoop.io.NullWritable],
>>>       classOf[AvroKeyOutputFormat[GenericRecord]],
>>>       writeJob.getConfiguration)
>>>   }
>>>
>>> On Wed, Jul 1, 2015 at 8:11 AM, Koert Kuipers <koert@tresata.com> wrote:
>>>
>>>> rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
>>>>
>>>> On Wed, Jul 1, 2015 at 11:01 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>> wrote:
>>>>
>>>>> How do I persist an RDD using StorageLevel.MEMORY_AND_DISK_SER?
>>>>>
>>>>>
>>>>> --
>>>>> Deepak
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Deepak
>>>
>>>
>
>
> --
> Deepak
>
>
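
Putting the two answers in this thread together, one way to integrate persist into the `writeOutputRecords` method quoted above would be the following sketch. It assumes the same types and imports as the original method (`DetailOutputRecord`, `SchemaUtil`, `_detail`, etc. come from the poster's codebase, not from Spark):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job

protected def writeOutputRecords(
    detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)],
    outputDir: String) {
  // Persist before the action so the RDD is not recomputed if it is
  // reused later; the serialized level trades CPU for a smaller
  // memory footprint, spilling to disk when memory is exhausted.
  detailRecords.persist(StorageLevel.MEMORY_AND_DISK_SER)

  val writeJob = new Job()
  val schema = SchemaUtil.outputSchema(_detail)
  AvroJob.setOutputKeySchema(writeJob, schema)
  val outputRecords = detailRecords.coalesce(100)
  outputRecords.saveAsNewAPIHadoopFile(outputDir,
    classOf[AvroKey[GenericRecord]],
    classOf[org.apache.hadoop.io.NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    writeJob.getConfiguration)

  // Release the cached partitions once the output has been written.
  detailRecords.unpersist()
}
```

Note that persist only pays off if `detailRecords` is referenced by more than one action; for a single save it adds serialization overhead without benefit.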
