hadoop-hdfs-user mailing list archives

From Aaron Eng <a...@maprtech.com>
Subject Re: I/O time when reading from HDFS in Hadoop
Date Mon, 13 Jun 2016 17:08:37 GMT
If you want to measure the effect of turning compression on and off, the
most directly observable metric is the number of bytes written. The actual
time it takes to write data depends on many factors, such as disk and
network throughput, the CPU cost of the codec, and the load on the cluster.
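As a local illustration only (plain files and Python's `gzip`, not HDFS or Hadoop's codec classes), the sketch below writes the same data with and without compression and reports both the size on disk and the elapsed time. For a given input and codec the byte count is reproducible, while the timing varies between runs. In a real job, the comparable number is reported by the job's file system counters (e.g. "HDFS: Number of bytes written").

```python
import gzip
import os
import tempfile
import time


def write_and_measure(data: bytes, compress: bool) -> tuple[int, float]:
    """Write `data` to a temporary local file, optionally gzip-compressed.

    Returns (bytes_on_disk, elapsed_seconds). The byte count is
    reproducible; the elapsed time depends on disk, CPU and system load.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        opener = gzip.open if compress else open
        with opener(path, "wb") as f:
            f.write(data)
        elapsed = time.perf_counter() - start
        return os.path.getsize(path), elapsed
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Repetitive text (similar to WordCount input) compresses well.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 100_000
    for compress in (False, True):
        size, secs = write_and_measure(sample, compress)
        print(f"compress={compress}: {size} bytes in {secs:.3f} s")
```

Running it several times should show identical byte counts but slightly different timings, which is the point made above.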

On Sat, Jun 11, 2016 at 10:28 AM, Alexandru Calin <
alexandrucalin29@gmail.com> wrote:

> Hello,
> Firstly, thank you for your response.
> To be more exact, I am interested in measuring the time between the
> following boundaries: [a: CLI launch & HDFS read] -- [b: user-defined
> map/reduce] -- [c: writing processed data to HDFS, end of job]. I want
> to measure how turning compression on and off for the input and output
> data changes the time between the a--b--c boundaries and, ultimately,
> the total execution time of a MapReduce job (a--c). I am using standard
> benchmarks like WordCount.
> Thanks again,
> Alex
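Once the a, b and c boundaries have been recorded as timestamps, the interval arithmetic is simple. The sketch below assumes four hypothetical timestamps in milliseconds per run (launch, end of reading, end of the user-defined map/reduce, end of job); how they are collected (job history, log lines, etc.) is left open here. Note that inside a map task, reading records from HDFS and running the map function are interleaved record by record, so a single "end of reading" timestamp is only an approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseTimestamps:
    """Hypothetical boundary timestamps (milliseconds) for one job run."""
    launch: int        # CLI launch, start of phase a (HDFS read)
    read_done: int     # end of a, start of b (user-defined map/reduce)
    compute_done: int  # end of b, start of c (writing output to HDFS)
    job_end: int       # end of c, end of job


def phase_durations(t: PhaseTimestamps) -> dict[str, float]:
    """Return the a, b, c and total durations in seconds."""
    if not (t.launch <= t.read_done <= t.compute_done <= t.job_end):
        raise ValueError("timestamps must be non-decreasing")
    return {
        "a_read": (t.read_done - t.launch) / 1000,
        "b_compute": (t.compute_done - t.read_done) / 1000,
        "c_write": (t.job_end - t.compute_done) / 1000,
        "total": (t.job_end - t.launch) / 1000,
    }


def compare_runs(without: PhaseTimestamps, with_: PhaseTimestamps) -> dict[str, float]:
    """Relative change per phase, (with - without) / without.

    Negative values mean the phase got faster with compression enabled.
    Phases with zero baseline duration are skipped.
    """
    base, comp = phase_durations(without), phase_durations(with_)
    return {k: (comp[k] - base[k]) / base[k] for k in base if base[k] > 0}
```

For example, with the purely illustrative timestamps `PhaseTimestamps(0, 4000, 10000, 12000)`, `phase_durations` returns 4.0, 6.0 and 2.0 seconds for a, b and c, and 12.0 seconds in total.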
> On Sat, Jun 11, 2016 at 7:52 PM, Daniel Schulz <
> danielschulz2005@hotmail.com> wrote:
>> Hello Alexandru,
>> So if you are solely interested in the latencies, why not use the
>> Linux time command from the shell? Just use the Hadoop CLI to get your
>> file, try this from several nodes in various racks for different files
>> from your cluster, and build a confidence interval for the time it took
>> to retrieve each file from any node and rack.
>> Otherwise, a more holistic approach would be to use this project:
>> epaulson.github.io/HadoopInternals/benchmarks.html. Its Ohio State
>> InfiniBand benchmark contains latency measurements for sequential and
>> random read and write operations, and more.
>> Hope this helps…
>> Kind regards, Daniel.
>> Sent from my iPad
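The timing-plus-confidence-interval approach Daniel describes can be sketched as follows. This is a minimal sketch, assuming the Hadoop CLI is on the PATH of each node where it runs; `time_command` and `confidence_interval` are hypothetical helper names. The interval uses a normal approximation; for a small number of runs a Student-t quantile would give a somewhat wider, more accurate interval.

```python
import statistics
import subprocess
import time


def time_command(cmd: list[str], runs: int = 5) -> list[float]:
    """Run `cmd` `runs` times and return wall-clock durations in seconds.

    Output is discarded, similar to `time hdfs dfs -cat ... > /dev/null`.
    Raises CalledProcessError if the command fails.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def confidence_interval(samples: list[float], level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for the mean of `samples`."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5  # standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)   # ~1.96 for 95%
    return mean - z * sem, mean + z * sem
```

On each node, something like `confidence_interval(time_command(["hdfs", "dfs", "-cat", "/path/to/file"], runs=10))` (the path is a placeholder) yields one interval per node and file, which can then be compared across racks.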
>> On 11 Jun 2016, at 17:22, Alexandru Calin <alexandrucalin29@gmail.com>
>> wrote:
>> Hello,
>> I would like to measure the time taken by map and reduce when performing
>> I/O (reading from HDFS) in Hadoop. I am using YARN with Hadoop 2.6.0.
>> What are my options for that?
>> Thanks
