hadoop-common-user mailing list archives

From "Naama Kraus" <naamakr...@gmail.com>
Subject Hadoop tracing
Date Thu, 18 Sep 2008 10:25:20 GMT

I am looking for information on Hadoop tracing, instrumentation,
benchmarking and so forth.
What utilities exist? How mature are they? Where can I get more info
about them?

I am curious about statistics on Hadoop behavior (for a typical workload?
for different workloads?). I am thinking of metrics such as:
the percentage of time a Hadoop job spends in the various phases (map, sort &
shuffle, reduce), on I/O, network, framework execution time, user code
execution time ...
Known bottlenecks?
And any other interesting statistics.

Has anyone already measured these? Are there any documented statistics out there?

I have already come across several things, such as the X-Trace based tracing tool
from Berkeley, the Hadoop metrics API, the Hadoop instrumentation API (HADOOP-3772),
Hadoop Vaidya (HADOOP-4179), and the gridmix benchmark.

Does anyone have input on any of those?
Is there anything else I missed?

Thanks for any direction,

oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
