Date: Wed, 18 Apr 2012 19:02:40 +0000 (UTC)
From: "Jakob Homan (Commented) (JIRA)"
To: giraph-dev@incubator.apache.org
Reply-To: giraph-dev@incubator.apache.org
Message-ID: <599204358.2123.1334775760152.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1770712051.36257.1331155497631.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

[
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256836#comment-13256836 ]

Jakob Homan commented on GIRAPH-153:
------------------------------------

bq. Per Keith Turner's comments in HAMA-153 would it make more sense to host this submodule on github?

I've spent lots of time doing this with the Avro connector for Hive and wish I hadn't. It's quite easy for the connector code to drift from the main code and have users bear the brunt of the impact.

bq. I prefer to have it with Giraph directly. Anyone else?

+1. If these connectors should exist (and I think they should), they should work all the time and be maintained. The best way to ensure this is to host them inside one or the other project and since Giraph would sit above HBase (or MR), we should host them. This way the connectors get tested all the time with the rest of the code. If there comes a time when we don't have the ability or support to keep them maintained, then I'd recommend just deleting them entirely from the tree, on the assumption that releasing poorly maintained, non-compatible or buggy code is worse than no code at all. Of course, I doubt this will happen and instead expect we'll always have a volunteer with hbase/accumulo knowledge to keep the code up to date.

> HBase/Accumulo Input and Output formats
> ---------------------------------------
>
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.1.0
>         Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>            Reporter: Brian Femiano
>         Attachments: GIRAPH-153.patch
>
>
> Four abstract classes that wrap their respective delegate input/output formats for
> easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph
> algorithms.
> I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes.
> Root nodes are defined as nodes which are not listed as a child anywhere in the graph.
> Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each
> child as a non-root notification. After superstep 1, only root nodes will have never been messaged.
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.
> I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here.
> These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. More details in the comments.
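
For readers who don't have the patch at hand, the two root-marking algorithms described in the issue text can be approximated in plain Java without any Giraph, HBase, or Accumulo dependency. This is only a rough sketch of the superstep logic as described, not the code in GIRAPH-153.patch: the class name RootMarkerSketch, the methods findRoots and propagateRoots, and the adjacency-map representation are all hypothetical, and in this sketch a root also records its own id.

```java
import java.util.*;

// Hypothetical, in-memory simulation of the two algorithms described in the issue.
// It does not use the Giraph vertex API or the classes from the patch.
final class RootMarkerSketch {

    // Collect every vertex id that appears as a parent or as a child.
    static Set<String> allVertices(Map<String, List<String>> children) {
        Set<String> vertices = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            vertices.addAll(kids);
        }
        return vertices;
    }

    // Algorithm 1: every vertex starts as a root candidate. In "superstep 0"
    // each vertex notifies its children; in "superstep 1" any vertex that
    // received a notification drops its root flag.
    public static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> messaged = new HashSet<>();
        for (List<String> kids : children.values()) {
            messaged.addAll(kids); // non-root notification sent to each child
        }
        Set<String> roots = allVertices(children);
        roots.removeAll(messaged); // only never-messaged vertices stay roots
        return roots;
    }

    // Algorithm 2: after root flagging, push root ids down the graph one
    // simulated superstep at a time until no vertex learns a new root.
    public static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> known = new TreeMap<>();
        for (String v : allVertices(children)) {
            known.put(v, new TreeSet<>());
        }
        // Messages to be delivered in the current superstep; roots seed with their own id.
        Map<String, Set<String>> inbox = new HashMap<>();
        for (String root : findRoots(children)) {
            inbox.put(root, new TreeSet<>(Collections.singleton(root)));
        }
        while (!inbox.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
                String v = e.getKey();
                Set<String> fresh = new TreeSet<>(e.getValue());
                fresh.removeAll(known.get(v)); // forward only ids not seen before
                if (fresh.isEmpty()) {
                    continue;
                }
                known.get(v).addAll(fresh);
                for (String child : children.getOrDefault(v, Collections.emptyList())) {
                    next.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
                }
            }
            inbox = next; // advance to the next superstep
        }
        return known;
    }
}
```

Given a map from each parent id to its list of child ids, findRoots returns the vertices that are never listed as a child, and propagateRoots returns, for each vertex, the set of root ids it can be traced back to. Forwarding only previously unseen ids keeps the loop finite even if the input happens to contain a cycle.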