hadoop-common-user mailing list archives

From "Dai, Jason" <jason....@intel.com>
Subject Release of HiBench 2.2 (a Hadoop benchmark suite)
Date Thu, 25 Oct 2012 12:38:28 GMT

I would like to announce the availability of HiBench 2.2 at https://github.com/intel-hadoop/hibench.
Since the release of HiBench 2.1, we have received a lot of good feedback, and HiBench 2.2 provides
an update to v2.1 based on this feedback, including:

1)      Add automatic data generators for the Nutch indexing and Bayesian classification workloads.
In HiBench 2.1 these workloads used fixed input data sets and could not easily be scaled up or down.

2)      Change the PageRank workload to the implementation contained in the Pegasus project
(http://www.cs.cmu.edu/~pegasus/). The previous PageRank workload in HiBench 2.1 came from
Mahout 0.6 and could run out of memory with large input data; Mahout has also since dropped
support for PageRank (see MAHOUT-1049 <https://issues.apache.org/jira/browse/MAHOUT-1049>).

3)      Upgrade the machine learning workloads (K-means clustering and Bayesian classification)
to Mahout 0.7, which fixes many issues/bugs in Mahout 0.6 (the version we used in
HiBench 2.1).


From: Dai, Jason
Sent: Thursday, June 14, 2012 12:27 AM
To: common-user@hadoop.apache.org
Subject: Open source of HiBench 2.1 (a Hadoop benchmark suite)


HiBench, a Hadoop benchmark suite constructed by Intel, is used intensively for Hadoop benchmarking,
tuning and optimization, both inside Intel and by our customers/partners. It consists of
a set of representative Hadoop programs including both micro-benchmarks and more "real world"
applications (e.g., search, machine learning and Hive queries).

We have made HiBench 2.1 available under Apache License 2.0 at https://github.com/hibench/HiBench-2.1,
and would like to get your feedback on how it can be further improved. BTW, please stop by
the Intel booth if you are at Hadoop Summit, so that we can have more interactive discussions
on both HiBench and HiTune (our Hadoop performance analyzer open sourced at https://github.com/hitune/hitune).

