hadoop-mapreduce-user mailing list archives

From Alvin Chyan <alvin.ch...@turn.com>
Subject Re: I/O time when reading from HDFS in Hadoop
Date Mon, 13 Jun 2016 17:00:13 GMT
I'd be interested in learning how to do this too. Would we have to override
RecordReader to add timing code around the I/O portion? I'd like to compare
the I/O time of a normal Hadoop cluster with that of a Hadoop cluster in
the cloud that uses remote storage (S3) as the HDFS.
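
To make the "timing code around the I/O portion" idea concrete, here is a minimal sketch in Python rather than Hadoop's Java API. It is an analogy, not a RecordReader implementation: `TimedReader` and `drain` are hypothetical names, and the wrapped object is any binary file-like stream. In a real job the same pattern would presumably wrap the delegate reader's `nextKeyValue()` call, but that part is an assumption and is not shown here.

```python
import io
import time


class TimedReader:
    """Wrap a binary file-like object and accumulate time spent in read().

    Hypothetical helper for illustration only; not part of Hadoop.
    """

    def __init__(self, stream, clock=time.perf_counter):
        self._stream = stream
        self._clock = clock      # injectable so the timing can be tested
        self.io_seconds = 0.0    # total wall-clock time spent inside read()
        self.bytes_read = 0      # total bytes returned by read()

    def read(self, size=-1):
        start = self._clock()
        try:
            data = self._stream.read(size)
        finally:
            # Count the elapsed time even if the underlying read fails.
            self.io_seconds += self._clock() - start
        self.bytes_read += len(data)
        return data


def drain(reader, chunk_size=64 * 1024):
    """Read the stream to the end in fixed-size chunks; return the chunk count."""
    chunks = 0
    while reader.read(chunk_size):
        chunks += 1
    return chunks


# Usage sketch with an in-memory stream standing in for a remote file:
reader = TimedReader(io.BytesIO(b"x" * 1_000_000))
drain(reader)
print(f"{reader.bytes_read} bytes in {reader.io_seconds:.6f} s of read() time")
```

Only the time inside `read()` is counted, so whatever the caller does with each chunk is excluded from the total. That separation is what would let the local-HDFS and S3 runs be compared on I/O alone.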

Thanks!


*Alvin Chyan*
Lead Software Engineer, Data
901 Marshall St, Suite 200, Redwood City, CA 94063


turn.com <http://www.turn.com/>   |   @TurnPlatform
<https://twitter.com/@TurnPlatform>

This message is Turn Confidential, except for information included that is
already available to the public. If this message was sent to you
accidentally, please delete it.

On Sat, Jun 11, 2016 at 10:28 AM, Alexandru Calin <
alexandrucalin29@gmail.com> wrote:

> Hello,
>
> Firstly, thank you for your response.
> To be more exact, I am interested in measuring the time spent in the
> following intervals: [*a*:*{CLI launch & HDFS read}*]--[*b*:*{user
> defined map/reduce}*]--[*c*:*{writing processed data to HDFS}-end of
> job*]. I want to measure how turning compression on and off for the input
> and output data changes the time across the a--b--c boundaries and
> ultimately the total execution time of a MapReduce job (a--c). I am using
> standard benchmarks like Wordcount.
>
> Thanks again,
> Alex
>
> On Sat, Jun 11, 2016 at 7:52 PM, Daniel Schulz <
> danielschulz2005@hotmail.com> wrote:
>
>> Hello Alexandru,
>>
>> So if you are solely interested in the latencies, why not use the
>> Linux time command from the shell? Just use the Hadoop CLI to get your
>> file; try this from several nodes in various racks, for different files
>> in your cluster, and build a confidence interval for the time it took to
>> retrieve each file from any node and rack.
>>
>> Otherwise, a more holistic approach would be to use this project:
>> epaulson.github.io/HadoopInternals/benchmarks.html
>> Its Ohio State InfiniBand benchmark contains latency information on
>> sequential and random read and write operations, and more.
>>
>> Hope this helps…
>>
>> Kind regards, Daniel.
>>
>>
>>
>> Sent from my iPad
>> On 11 Jun 2016, at 17:22, Alexandru Calin <alexandrucalin29@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I would like to measure the time taken for map and reduce when performing
>> I/O (reading from HDFS) in Hadoop. I am using Yarn. Hadoop 2.6.0. What are
>> the options for that?
>>
>> Thanks
>>
>>
>
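
The suggestion quoted above (time CLI fetches and build a confidence interval) could be scripted roughly as follows. This is a sketch under stated assumptions: `time_command` and `confidence_interval` are hypothetical helper names, the interval uses a normal approximation (a Student t interval would be more appropriate for a handful of runs), and the HDFS path in the usage comment is a placeholder.

```python
import math
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout so terminal output does not distort the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def confidence_interval(samples, level=0.95):
    """Normal-approximation confidence interval for the mean of samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem


# Usage sketch (the path is a placeholder, not taken from the thread):
#   samples = time_command(["hdfs", "dfs", "-cat", "/path/to/file"], runs=10)
#   low, high = confidence_interval(samples)
```

Repeated reads of the same file may be served from caches, so later runs can look faster than the first. Timing different files, or running from different nodes as suggested, helps reduce that bias.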
