hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: measuring iops
Date Tue, 23 Oct 2012 23:40:04 GMT
Hi Rita,

I get a bit grumpy when I see IOPS as the primary metric with respect to HDFS.

Why? While IOPS are a relevant part of the system, many HDFS use cases are
*throughput-oriented* workflows. In the traditional M/R use cases for HDFS, you will likely
barely scratch the IOPS the system provides.
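To make the throughput-versus-IOPS point concrete, here is a back-of-the-envelope sketch: throughput = IOPS × request size, so the same bandwidth needs wildly different operation rates depending on how large each request is. The 100 MB/s figure and the request sizes are hypothetical, and treating one 64 MB block-sized read (the default block size in Hadoop 0.20/1.x) as a single operation is a simplification.

```python
def iops_needed(throughput_mb_s: float, request_size_kb: float) -> float:
    """Operations per second required to sustain a throughput at a given request size."""
    # Assumes 1 MB = 1024 KB; throughput / request size = operations per second
    return throughput_mb_s * 1024.0 / request_size_kb


# Hypothetical workload streaming 100 MB/s:
streaming = iops_needed(100, 64 * 1024)  # one request per 64 MB block
random_4k = iops_needed(100, 4)          # 4 KB random reads
print(f"{streaming:.2f} ops/s vs {random_4k:.0f} ops/s")
```

At the same 100 MB/s, block-sized streaming needs about 1.56 operations per second while 4 KB random reads need 25,600, which is why a large IOPS number says little about a streaming M/R workload.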

In fact, HDFS in 0.20 creates a separate TCP connection for each I/O operation; that should tell
you how low random-access workflows ranked in the HDFS design.

As a disclaimer, there are use cases (particularly HBase, and how I currently use our HDFS
install!) where IOPS are quite relevant. Just remember that they are not the be-all and end-all
of HDFS performance measurement. It's not the primary number I would look for! Each install
will have its own requirements.

Brian

On Oct 23, 2012, at 6:01 PM, Rita <rmorgan466@gmail.com> wrote:

> I was curious because a vendor (a big storage company) presented a
> Hadoop solution and quoted IOPS figures. I wasn't sure how
> they were determining this number....
> 
> 
> 
> On Tue, Oct 23, 2012 at 9:19 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>> You have two issues.
>> 
>> 1) You need to know the throughput in terms of data transfer between disks
>> and controller cards on the node.
>> 
>> 2) The actual network throughput when all of the nodes talk to one
>> another as fast as they can. This will show you the real limits of
>> the ToR (top-of-rack) switch fabric.
>> 
>> Not sure why you really want to do this, except to test the disks, the disk
>> controllers, the networking infrastructure of your ToR switch, and then the
>> backplane connecting multiple racks....
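Mike's first point, raw disk throughput on a single node, can be probed roughly with the Python sketch below, which times a sequential write followed by fsync. The path, sizes, and chunk size are arbitrary assumptions; dedicated tools such as dd or fio are the usual choice for serious measurements, and this only exercises one disk through the filesystem rather than the controller in isolation.

```python
import os
import tempfile
import time


def sequential_write_mb_s(path: str, total_mb: int = 256, chunk_kb: int = 1024) -> float:
    """Write roughly total_mb of random data to path sequentially and return MB/s."""
    chunk = os.urandom(chunk_kb * 1024)  # incompressible payload
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        # fsync forces the data to the device so the page cache does not inflate the result
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    written_mb = n_chunks * chunk_kb / 1024
    return written_mb / elapsed


if __name__ == "__main__":
    # Hypothetical target: replace the temp dir with a directory on the data disk under test
    with tempfile.TemporaryDirectory() as d:
        rate = sequential_write_mb_s(os.path.join(d, "probe.bin"), total_mb=64)
        print(f"sequential write: {rate:.1f} MB/s")
```

Note that the default temporary directory may live on a different disk (or in memory), so point it at a DataNode data directory to measure what HDFS actually uses.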
>> 
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Oct 23, 2012, at 7:47 AM, Ravi Prakash <ravihoo@ymail.com> wrote:
>> 
>>> Do you mean in a cluster being used by users, or as a benchmark to
>>> measure the maximum?
>>> 
>>> The JMX page <nn:port>/jmx provides some interesting stats, but I'm not
>>> sure they have what you want. And I'm unaware of other tools which could.
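Ravi's JMX suggestion could be scripted along these lines: the /jmx page serves JSON with a top-level "beans" list, each bean carrying a "name" plus its attributes. This is a sketch; which bean and attribute names are present depends on the Hadoop version, so the names used in the usage note are assumptions to check against your own /jmx output.

```python
import json
import urllib.request
from typing import Dict, Iterable


def fetch_jmx(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON document served by a Hadoop daemon's /jmx page."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def select_counters(jmx: dict, bean_prefix: str, attributes: Iterable[str]) -> Dict[str, float]:
    """Return the requested numeric attributes of the first bean whose name starts with bean_prefix."""
    for bean in jmx.get("beans", []):
        if bean.get("name", "").startswith(bean_prefix):
            # Attributes missing from this bean are skipped rather than guessed
            return {a: float(bean[a]) for a in attributes if a in bean}
    return {}
```

For example, `select_counters(fetch_jmx("http://namenode.example.com:50070/jmx"), "Hadoop:service=NameNode,name=NameNodeActivity", ["CreateFileOps", "GetBlockLocations"])`; the host name is hypothetical, 50070 was the usual NameNode HTTP port in this era, and the bean and attribute names are assumptions.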
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ________________________________
>>> From: Rita <rmorgan466@gmail.com>
>>> To: common-user@hadoop.apache.org; Ravi Prakash <ravihoo@ymail.com>
>>> Sent: Monday, October 22, 2012 6:46 PM
>>> Subject: Re: measuring iops
>>> 
>>> Is it possible to know how many reads and writes are occurring across the
>>> entire cluster in a consolidated manner, excluding replication (i.e. not
>>> counting replica writes)?
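One way to approximate what Rita asks for is to sample each DataNode's cumulative read and write operation counters twice (for example from each node's /jmx page), take the difference, and sum across nodes. In this sketch the counter names are whatever you pass in, and "excluding replication" is approximated by dividing write operations by the replication factor; that assumption ignores re-replication and files with non-default replication.

```python
from typing import Dict

# datanode host -> {counter name -> cumulative value}
Snapshot = Dict[str, Dict[str, float]]


def cluster_rates(before: Snapshot, after: Snapshot, interval_s: float,
                  read_key: str, write_key: str, replication: int = 3) -> Dict[str, float]:
    """Cluster-wide read and approximate client write operations per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    reads = writes = 0.0
    for host, now in after.items():
        prev = before.get(host)
        if prev is None:
            continue  # node joined between snapshots; skip it rather than guess
        d_read = now[read_key] - prev[read_key]
        d_write = now[write_key] - prev[write_key]
        if d_read < 0 or d_write < 0:
            continue  # counters reset (e.g. DataNode restart); skip this node
        reads += d_read
        writes += d_write
    return {
        "reads_per_s": reads / interval_s,
        # Assumption: every client write lands on `replication` DataNodes
        "client_writes_per_s": writes / replication / interval_s,
    }
```

The sampling interval trades resolution for noise: a short interval catches bursts, while a longer one gives a steadier average.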
>>> 
>>> 
>>> On Mon, Oct 22, 2012 at 10:28 AM, Ravi Prakash <ravihoo@ymail.com> wrote:
>>> 
>>>> Hi Rita,
>>>> 
>>>> SliveTest can help you measure how many reads, writes, deletes, ls calls,
>>>> and appends per second your NameNode can handle.
>>>> 
>>>> DFSIO (TestDFSIO) can help you measure throughput.
>>>> 
>>>> Both of these tests are very flexible and have a plethora of options
>>>> to help you test different facets of performance. In my experience, you
>>>> have to be very careful and understand what the tests are doing
>>>> for the results to be sensible.
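As an illustration of Ravi's warning, my understanding is that TestDFSIO reports both a "throughput" (total MB over the summed task times) and an "average IO rate" (the mean of per-task rates), and these can differ a lot. The sketch below recomputes both from hypothetical per-task results; it is not TestDFSIO's code, and neither number is the cluster's aggregate bandwidth, since both are per-task figures.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def dfsio_style_summary(tasks: List[Tuple[float, float]]) -> dict:
    """Summarize (megabytes, seconds) pairs, one per map task."""
    total_mb = sum(mb for mb, _ in tasks)
    total_s = sum(s for _, s in tasks)
    rates = [mb / s for mb, s in tasks]  # per-task MB/s
    return {
        "throughput_mb_s": total_mb / total_s,  # weighted by time spent
        "avg_io_rate_mb_s": mean(rates),        # every task counts equally
        "io_rate_stddev": pstdev(rates),
    }


# Hypothetical run: two tasks write 1000 MB each, one of them on a slow disk
print(dfsio_style_summary([(1000, 10), (1000, 40)]))
```

Here one slow task pulls "throughput" down to 40 MB/s while the "average IO rate" is 62.5 MB/s, so it matters which figure a report quotes.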
>>>> 
>>>> HTH
>>>> Ravi
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>> From: Rita <rmorgan466@gmail.com>
>>>> To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
>>>> Sent: Monday, October 22, 2012 7:23 AM
>>>> Subject: Re: measuring iops
>>>> 
>>>> Anyone?
>>>> 
>>>> 
>>>> On Sun, Oct 21, 2012 at 8:30 AM, Rita <rmorgan466@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I was curious whether there is a way to measure the total number of IOPS
>>>>> (I/O operations per second) on an HDFS cluster.
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> --- Get your facts first, then you can distort them as you please.--
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

