From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Tue, 15 Jan 2008 01:54:49 -0000
Message-ID: <20080115015449.8783.92558@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Trivial Update of "DataProcessingBenchmarks" by udanax
Mailing-List: contact hadoop-commits-help@lucene.apache.org; run by ezmlm

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/DataProcessingBenchmarks

------------------------------------------------------------------------------
  SQL > select ipaddress, count(*) from access_log group by ipaddress order by count(*) desc limit 0,100; [[BR]]''σ ,,count. ipaddress,, (τ ,,count,, (γ ,,count(ipaddress). ipaddress,, (access_log)))''
- ||||!MySql 5.0.27 ||Hadoop-0.15.0 (commodity)||Hadoop-0.15.0 (commodity)||Hadoop-0.15.0 (High-Performance Server)||
- ||Data ||B-tree disk table (MyISAM)||Text files (access_log)||Text files (access_log)||Text files (access_log)||
- ||Machine ||1 ||40||1000||2 (* 4 processor)||
- ||Rows ||3,700,000 ||54,805,260||54,805,260||54,805,260||
- ||Results ||100 ||100||100||100||
- ||Time ||3.715 sec ||1317.19 sec||112.03 sec||1244.21 sec||
-
  ==== MapReduce Flow ====
   * Map was used to extract the IP address of the client requesting the web page.
   * Reduce was used for summation.
   * One more Map/Reduce pass was used to sort by count.
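
The MapReduce flow described above can be sketched in plain Python as a single-process illustration. This is not the Hadoop job that produced the timings on the page; it only mirrors the three steps (extract, sum, sort by count). The function names and the regular expression are hypothetical, and the sketch assumes the client IPv4 address is the first whitespace-delimited field of each access_log line.

```python
import re
from collections import Counter

# Assumption: the client IPv4 address is the first field of each log line.
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def map_extract_ip(line):
    """Map step: emit (ip, 1) for a line that starts with an IPv4 address."""
    match = IP_AT_START.match(line)
    if match:
        yield match.group(1), 1


def reduce_sum(pairs):
    """Reduce step: sum the emitted counts per IP address."""
    counts = Counter()
    for ip, n in pairs:
        counts[ip] += n
    return counts


def top_by_count(counts, limit=100):
    """Second pass: sort by count descending (ties by IP) and keep `limit` rows."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def top_ips(lines, limit=100):
    """Run the three steps in sequence over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_extract_ip(line))
    return top_by_count(reduce_sum(pairs), limit)
```

Calling `top_ips(open("access_log"), limit=100)` would correspond to the `limit 0,100` query, assuming the file fits the format above. Lines that do not start with an IPv4 address are skipped rather than counted.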

+ ==== Benchmarks ====
+
+ ===== 1.5 GB access_log on 10 node cluster =====
+ [http://wiki.apache.org/lucene-hadoop-data/attachments/DataProcessingBenchmarks/attachments/C__Users_udanax_Desktop_test-10.png]
+
+ ||||!MySql 5.0.27 ||Hadoop-0.15.2 ||Hadoop-0.15.2 ||Hadoop-0.15.2 ||Hadoop-0.15.2 ||Hadoop-0.15.2 ||
+ ||Data ||B-tree disk table (MyISAM)||Text files (access_log)||Text files (access_log)||Text files (access_log)||Text files (access_log)||Text files (access_log)||
+ ||Machine ||1 || 2|| 4|| 6|| 8|| 10||
+ ||Rows ||3,700,000 ||5,914,669||5,914,669||5,914,669||5,914,669||5,914,669||
+ ||Results ||100 ||100||100||100||100||100||
+ ||Time ||3.715 sec ||172.30 sec||108.01 sec||77.41 sec||66.30 sec||60.78 sec||
+
- ==== MapReduce Results ====
- {{{
- ------------------------------------
- * Top 100 connector list :
- +--------------+-------------------+
- | Count        | Ip Address        |
- +--------------+-------------------+
- | 374932       | 121.165.51.179    |
- | 357615       | 121.150.85.42     |
- | 304878       | 211.204.83.50     |
- | ...          | ...               |
- | 72154        | 211.210.164.215   |
- | 72083        | 122.44.149.231    |
- | 71646        | 124.49.150.145    |
- | 70915        | 211.48.70.247     |
- +--------------+-------------------+
- Processing time : 112.03 sec
- }}}
  === EM Algorithm ===
   * Finds maximum likelihood estimates of parameters in probabilistic models.