hadoop-common-user mailing list archives

From Darren Govoni <dar...@ontrenet.com>
Subject Re: Sanity check re: value of 10GbE NICs for Hadoop?
Date Tue, 28 Jun 2011 17:41:18 GMT
I see. However, Hadoop is designed to operate best with HDFS because
of its inherent striping and blocking strategy - which is tracked by Hadoop.
Going outside of that mechanism will probably yield poor results and/or
confuse Hadoop.

Just my thoughts.

On 06/28/2011 01:27 PM, Saqib Jang -- Margalla Communications wrote:
> Darren,
> Thanks. The last point was basically about 10GbE potentially allowing the
> use of a network file system, e.g. via NFS, as an alternative to HDFS; the
> question is whether there is any merit in this. Basically, I was exploring
> whether the commercial clustered NAS products offer any high-availability
> or data management benefits for use with Hadoop.
>
> Saqib
>
> -----Original Message-----
> From: Darren Govoni [mailto:darren@ontrenet.com]
> Sent: Tuesday, June 28, 2011 10:21 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Sanity check re: value of 10GbE NICs for Hadoop?
>
> Hadoop, like other parallel networked computation architectures, is
> predominantly I/O bound.
> This means any increase in network bandwidth is "A Good Thing" and can have
> drastic positive effects on performance. All your points stem from this
> simple realization.
>
> Although I'm confused by your #6. Hadoop already uses a distributed file
> system: HDFS.
>
> On 06/28/2011 01:16 PM, Saqib Jang -- Margalla Communications wrote:
>> Folks,
>>
>> I've been digging into the potential benefits of using
>> 10 Gigabit Ethernet (10GbE) NIC server connections for
>> Hadoop and wanted to run what I've come up with
>> through initial research by the list for 'sanity check'
>> feedback. I'd very much appreciate your input on
>> the importance (or lack of it) of the following potential benefits of
>> 10GbE server connectivity as well as other thoughts regarding
>> 10GbE and Hadoop (My interest is specifically in the value
>> of 10GbE server connections and 10GbE switching infrastructure,
>> over scenarios such as bonded 1GbE server connections with
>> 10GbE switching).
>>
>>
>>
>> 1. HDFS Data Loading. The higher throughput enabled by 10GbE
>>    server and switching infrastructure allows faster processing and
>>    distribution of data.
>>
>> 2. Hadoop Cluster Scalability. High performance for initial data
>>    processing and distribution directly impacts the degree of
>>    parallelism or scalability supported by the cluster.
>>
>> 3. HDFS Replication. Higher-speed server connections allow faster
>>    file replication.
>>
>> 4. Map/Reduce Shuffle Phase. Improved end-to-end throughput and
>>    latency directly impact the shuffle phase of a data set reduction,
>>    especially for tasks that are at the document level (including
>>    large documents), with lots of metadata generated by those
>>    documents, as well as video analytics and images.
>>
>> 5. Data Reporting. 10GbE server network performance can improve
>>    data reporting performance, especially if the Hadoop cluster is
>>    running multiple data reductions.
>>
>> 6. Support of Cluster File Systems. With 10GbE NICs, Hadoop could be
>>    reorganized to use a cluster or network file system. This would
>>    allow Hadoop, even with its Java implementation, to have
>>    higher-performance I/O and not have to be so concerned with disk
>>    drive density in the same server.
>>
>> 7. Others?
>>
>> thanks,
>>
>> Saqib
>>
>> Saqib Jang
>> Principal/Founder
>> Margalla Communications, Inc.
>> 1339 Portola Road, Woodside, CA 94062
>> (650) 274 8745
>> www.margallacomm.com
>>
>
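
As a rough aid to the bandwidth points quoted above (data loading, replication, shuffle), the following Python sketch estimates how long it takes to move a fixed amount of data over links of different speeds. The 1 TB data size, the 90% link efficiency, and the assumption that two bonded 1GbE links aggregate perfectly are illustrative guesses, not figures from this thread; real clusters are also limited by disk throughput and by how bonding distributes individual flows.

```python
def transfer_seconds(data_bytes, link_gbits_per_s, efficiency=0.9):
    """Idealized time to move data_bytes over one link.

    efficiency is a hypothetical fudge factor for protocol overhead;
    disk speed and contention are ignored entirely.
    """
    usable_bytes_per_s = link_gbits_per_s * 1e9 / 8 * efficiency
    return data_bytes / usable_bytes_per_s


ONE_TB = 1e12  # hypothetical data set size, in bytes

# Nominal link speeds in Gbit/s; the bonded case assumes perfect aggregation.
links = [("1GbE", 1), ("2 x 1GbE bonded", 2), ("10GbE", 10)]

for label, gbits in links:
    minutes = transfer_seconds(ONE_TB, gbits) / 60
    print(f"{label:>16}: {minutes:7.1f} min per TB")
```

Under these assumptions the estimate scales linearly with link speed, so it only bounds what the network can contribute; it says nothing about whether a given job is actually network-bound.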

