Date: Wed, 23 May 2012 10:05:41 +0000 (UTC)
From: "Edward J. Yoon (JIRA)"
To: hama-dev@incubator.apache.org
Subject: [jira] [Commented] (HAMA-531) Data re-partitioning in BSPJobClient

[ https://issues.apache.org/jira/browse/HAMA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281494#comment-13281494 ]

Edward J. Yoon commented on HAMA-531:
-------------------------------------

{code}
12/05/23 19:03:17 DEBUG graph.GraphJobRunner: Combiner class: org.apache.hama.examples.SSSP$MinIntCombiner
12/05/23 19:03:17 DEBUG graph.GraphJobRunner: vertex class: org.apache.hama.examples.SSSP$ShortestPathVertex
12/05/23 19:03:17 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.io.IOException: org.apache.hadoop.io.Text read 31 bytes, should read 190
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2129)
	at org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
	at org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:60)
	at org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:46)
	at org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:482)
	at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:280)
	at org.apache.hama.graph.GraphJobRunner.setup(GraphJobRunner.java:113)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1097)
12/05/23 19:03:17 INFO zookeeper.ZooKeeper: Session: 0x137792074af000e closed
12/05/23 19:03:17 INFO zookeeper.ClientCnxn: EventThread shut down
12/05/23 19:03:17 ERROR bsp.BSPTask: Shutting down ping service.
12/05/23 19:03:17 FATAL bsp.GroomServer: Error running child
java.io.IOException: org.apache.hadoop.io.Text read 31 bytes, should read 190
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2129)
	at org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
	at org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:60)
	at org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:46)
	at org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:482)
	at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:280)
	at org.apache.hama.graph.GraphJobRunner.setup(GraphJobRunner.java:113)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1097)
java.io.IOException: org.apache.hadoop.io.Text read 31 bytes, should read 190
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2129)
	at org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
	at org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:60)
	at org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:46)
	at org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:482)
	at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:280)
	at org.apache.hama.graph.GraphJobRunner.setup(GraphJobRunner.java:113)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1097)
{code}

> Data re-partitioning in BSPJobClient
> ------------------------------------
>
>                 Key: HAMA-531
>                 URL: https://issues.apache.org/jira/browse/HAMA-531
>             Project: Hama
>          Issue Type: Improvement
>            Reporter: Edward J. Yoon
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-531_1.patch, HAMA-531_2.patch, HAMA-531_final.patch
>
>
> Re-partitioning the data is a very expensive operation. Currently, we process the read/write operations sequentially in BSPJobClient, on the client side, using the HDFS API. This can cause "too many open files" errors, adds HDFS overhead, and is slow.
> We have to find another way to re-partition the data.

--
This message is automatically generated by JIRA.
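Re-partitioning, as the issue uses the term, means routing every input record to the partition (task) that owns its key. The Java sketch below is a hypothetical, in-memory illustration of hash-based assignment only: the names `RepartitionSketch`, `partitionFor`, and `repartition` are invented for this example and are not Hama's API, and nothing here touches HDFS. It assumes the same `(hashCode() & Integer.MAX_VALUE) % numPartitions` formula that Hadoop MapReduce's `HashPartitioner` uses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Hama's API): assign each key to one of
 * numPartitions buckets by hashing it, then group keys per bucket.
 */
public class RepartitionSketch {

  /** Maps a key to a partition index in [0, numPartitions). */
  static int partitionFor(Object key, int numPartitions) {
    // Masking off the sign bit keeps the index non-negative even when
    // hashCode() is negative (Math.abs(Integer.MIN_VALUE) would stay negative).
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  /** Groups keys into per-partition buckets in a single pass over the input. */
  static List<List<String>> repartition(List<String> keys, int numPartitions) {
    List<List<String>> buckets = new ArrayList<>(numPartitions);
    for (int i = 0; i < numPartitions; i++) {
      buckets.add(new ArrayList<>());
    }
    for (String key : keys) {
      buckets.get(partitionFor(key, numPartitions)).add(key);
    }
    return buckets;
  }

  public static void main(String[] args) {
    // Illustrative vertex IDs; a real job would read these from its input files.
    List<String> vertices = List.of("1", "2", "3", "4", "5", "6", "7");
    List<List<String>> parts = repartition(vertices, 3);
    for (int i = 0; i < parts.size(); i++) {
      System.out.println("partition " + i + ": " + parts.get(i));
    }
  }
}
```

Running `main` prints `partition 0: [3, 6]`, `partition 1: [1, 4, 7]` and `partition 2: [2, 5]`, because the one-character strings "1" through "7" have hash codes 49 through 55. Grouping records by partition before writing is one way to keep only a single output stream open at a time, which bears on the "too many open files" problem; where that grouping should run (client side or inside the tasks) is the open question of this issue.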