spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Spark on RAID
Date Tue, 08 Mar 2016 17:08:19 GMT
One issue is that RAID levels that provide data replication are unnecessary,
since HDFS already replicates blocks across multiple nodes.
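To make the "separate mount points" recommendation concrete: Spark can stripe its shuffle and spill files across multiple local directories via `spark.local.dir`, which gives parallel disk I/O without a RAID layer. A minimal sketch, assuming four hypothetical mount points (`/mnt/disk1` … `/mnt/disk4`); on YARN this setting is typically superseded by the NodeManager's local directories:

```
# spark-defaults.conf (sketch; the /mnt/diskN paths are illustrative)
# Spark spreads temporary shuffle/spill files round-robin across
# every directory listed here, one per physical disk.
spark.local.dir  /mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark,/mnt/disk4/spark
```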

On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov <alexvk@gmail.com> wrote:

> Parallel disk I/O?  But the effect should be less noticeable than in
> Hadoop MapReduce, which reads and writes to disk a lot.  Much depends on how
> often Spark persists to disk, and on the specifics of the RAID controller as well.
>
> If you write to HDFS, as opposed to the local file system, that may be a big
> factor as well.
>
> On Tue, Mar 8, 2016 at 8:34 AM, Eddie Esquivel <eduardo.esquivel@gmail.com
> > wrote:
>
>> Hello All,
>> In the Spark documentation under "Hardware Requirements" it very clearly
>> states:
>>
>> We recommend having *4-8 disks* per node, configured *without* RAID
>> (just as separate mount points)
>>
>> My question is: why not RAID? What is the argument/reason for not using
>> RAID?
>>
>> Thanks!
>> -Eddie
>>
>
> --
> Alex Kozlov
>
