From: Hadoop Explorer
To: user@hadoop.apache.org
Subject: will an application with two maps but no reduce be suitable for hadoop?
Date: Thu, 18 Apr 2013 12:49:24 +0100

I have an application that evaluates a graph using this algorithm:

- use a parallel for loop to evaluate all nodes in the graph (to evaluate a node, an image is read, and then the result for this node is calculated)
- use a second parallel for loop to evaluate all edges in the graph. The function takes in the results from both nodes of the edge, and then calculates the answer for the edge

As you can see, the above algorithm would employ two map functions, but no reduce function. The total data size can be very large (say 100GB). Also, the workload of each node and each edge is highly irregular, and thus load balancing mechanisms are essential.

In this case, will Hadoop suit this application? If so, what would the architecture of my program look like? And will Hadoop be able to strike a balance between good load balancing of the second map function and minimizing data transfer of the results from the first map function?
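
To make the data flow of the two passes concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the algorithm described above, not Hadoop code. The names `evaluate_node`, `evaluate_edge` and `evaluate_graph` are hypothetical, and the arithmetic inside them is a placeholder for the real image-based computation.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_node(node_id):
    # Placeholder (assumption): the real application reads an image here
    # and computes a result for this node.
    return node_id * node_id


def evaluate_edge(result_u, result_v):
    # Placeholder (assumption): the real application combines the results
    # of the two endpoint nodes in some application-specific way.
    return result_u + result_v


def evaluate_graph(nodes, edges, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Pass 1: map over all nodes. Tasks are queued and picked up by
        # whichever worker is free, so uneven task costs are spread out.
        node_results = dict(zip(nodes, pool.map(evaluate_node, nodes)))

        # Pass 2: map over all edges. Each task needs the pass-1 results
        # of both endpoints of its edge.
        def edge_task(edge):
            u, v = edge
            return evaluate_edge(node_results[u], node_results[v])

        edge_results = dict(zip(edges, pool.map(edge_task, edges)))
    return node_results, edge_results


nodes = [1, 2, 3]
edges = [(1, 2), (2, 3)]
node_results, edge_results = evaluate_graph(nodes, edges)
```

The second pass cannot start until the first has finished, and each edge task reads two node results. That dependency is where the data-transfer concern in the question comes from.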