Subject: [jira] [Created] (GIRAPH-226) Proposal for per-Mapper caching of all Writable values using existing maven imports
Date: Sat, 30 Jun 2012 20:16:42 +0000 (UTC)
From: "Eli Reisman (JIRA)"
To: giraph-dev@incubator.apache.org

Eli Reisman created GIRAPH-226:
----------------------------------

Summary: Proposal for per-Mapper caching of all Writable values using existing maven imports
Key: GIRAPH-226
URL: https://issues.apache.org/jira/browse/GIRAPH-226
Project: Giraph
Issue Type: New Feature
Reporter: Eli Reisman
Priority: Minor

We already import the Guava library into
our Maven repo (see GIRAPH-225), so I have written a _proposed_ caching system using Guava's very efficient com.google.common.cache library. It would exist as a static singleton per Mapper (per JVM) and would be usable by all vertices in a given partition/JVM environment through an instance field in BasicVertex, or perhaps as part of a Context (anything global to each JVM would do). Through a simple API, one could "create/get" and manipulate all Writable instances used by that JVM without duplicating objects all the time. The net effect would be similar to the recent improvement to NullWritable, but it would cover every Writable type.

Please see the patch. It does not yet attempt to inject this cache into its eventual home; it just places the files in "lib/" for your review and comments. Upcoming experiments will reveal whether this is a desperately needed improvement or just a detail as far as Giraph scale-out is concerned, but if it is needed, here it is.

One caveat is that the API for using Writables would change slightly (I would be happy to make the minimal changes to the existing example code/tests and our web instructions). All mutation and creation/acquisition of Writable instances would go through cache.getWritable(), which is overloaded to accept all Java types that map to Writables without any extra work for the user. In fact, using this API would eliminate the need to use the "new" operator with Writables at all. Best of all, if a new user writes an application without using the cache, it would be as bloated memory-wise as applications are now, but it would not break in the least. There is a better explanation in the code comments of GiraphWritableCache, the main file.
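To make the proposed API concrete, here is a minimal, JDK-only sketch of a per-JVM static singleton whose overloaded getWritable() hands back one shared instance per distinct value, so callers never use "new" on a Writable. It is not the code in the patch: the names WritableCacheSketch, SketchIntWritable, SketchDoubleWritable and SketchText are hypothetical, the Sketch* classes stand in for Hadoop's IntWritable, DoubleWritable and Text so the example compiles without Hadoop, and ConcurrentHashMap.computeIfAbsent stands in for a Guava LoadingCache (built with CacheBuilder and a CacheLoader) to avoid the Guava dependency.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable stand-in for Hadoop's IntWritable.
final class SketchIntWritable {
  private final int value;
  SketchIntWritable(int value) { this.value = value; }
  int get() { return value; }
}

// Hypothetical immutable stand-in for Hadoop's DoubleWritable.
final class SketchDoubleWritable {
  private final double value;
  SketchDoubleWritable(double value) { this.value = value; }
  double get() { return value; }
}

// Hypothetical immutable stand-in for Hadoop's Text.
final class SketchText {
  private final String value;
  SketchText(String value) { this.value = value; }
  @Override public String toString() { return value; }
}

// Hypothetical per-JVM cache: static state means one copy per Mapper JVM.
final class WritableCacheSketch {
  // One map per Java-to-Writable mapping; each keeps a single shared instance per value.
  private static final ConcurrentMap<Integer, SketchIntWritable> INTS = new ConcurrentHashMap<>();
  private static final ConcurrentMap<Double, SketchDoubleWritable> DOUBLES = new ConcurrentHashMap<>();
  private static final ConcurrentMap<String, SketchText> TEXTS = new ConcurrentHashMap<>();

  private WritableCacheSketch() {} // static holder, never instantiated

  // Overloads: callers pass plain Java values and get back the shared wrapper.
  static SketchIntWritable getWritable(int value) {
    return INTS.computeIfAbsent(value, v -> new SketchIntWritable(v));
  }

  static SketchDoubleWritable getWritable(double value) {
    return DOUBLES.computeIfAbsent(value, v -> new SketchDoubleWritable(v));
  }

  static SketchText getWritable(String value) {
    return TEXTS.computeIfAbsent(value, v -> new SketchText(v));
  }
}
```

Because every caller asking for the same value receives the same object, the stand-ins above are immutable. Hadoop's real Writables are mutable (IntWritable has set()), so with the real classes any mutation would also have to go through the cache, as the proposal says. Note too that these maps grow without bound; a Guava cache could be capped with CacheBuilder.maximumSize().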
One could easily upgrade this to take advantage of generics by using a Configuration object to initialize the cache and borrowing its class objects for Writable instantiation. It turns out, though, that this would add overhead within the cache itself and would not save much code, because you still have to implement the cache loading methods concretely, with concrete type parameters. Although the main object contains one sub-cache for each Java-to-Writable mapping we use in Giraph/Hadoop, the sub-caches are instantiated lazily, and most vertex implementations would never instantiate more than one or two of them.

ArrayWritable is not supported yet; I will be posting a separate JIRA about this. It turns out that ArrayWritable does not play nicely with GiraphJob.run() no matter how you subclass or manipulate it, and twice now my vertex implementations have had to store final values in Text or some other unfortunate format to express tuples. Unless fixed, this would make multigraphs (currently being discussed in another JIRA by Alessandro) impossible. Thanks to Sean Choi for pointing me toward this (I think larger) problem. More to follow. Anyway, a quick grep this morning shows that no code in Giraph uses ArrayWritable yet, so for now this doesn't affect the cache.

Please look at this code, read the comments about its use, and tell me what you think. No biggie to me if we don't use it, but again... here it is if we want it. I look forward to hearing from you. For the record, I think it would live as a static field in BasicVertex or GiraphJob, etc.
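The lazily instantiated sub-caches mentioned above can be sketched in the same JDK-only style: an outer map from Writable type to sub-cache that starts empty, so a vertex that only ever uses one or two Writable types only ever creates one or two sub-caches. This is a hypothetical illustration, not the patch: LazySubCacheSketch, SketchLongWritable (a stand-in for Hadoop's LongWritable) and the factory-argument signature of get() are assumptions. The caller still supplies a concrete factory for each type, which echoes the point that a generic version still needs concretely typed loading code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical immutable stand-in for Hadoop's LongWritable.
final class SketchLongWritable {
  private final long value;
  SketchLongWritable(long value) { this.value = value; }
  long get() { return value; }
}

// Hypothetical cache holding one sub-cache per Writable type, each created on first use.
final class LazySubCacheSketch {
  // Writable type -> (Java value -> shared Writable instance). Starts empty.
  private final Map<Class<?>, Map<Object, Object>> subCaches = new ConcurrentHashMap<>();

  // Returns the shared instance for key, creating the sub-cache and the instance only if needed.
  <K, W> W get(Class<W> writableType, K key, Function<? super K, ? extends W> factory) {
    Map<Object, Object> sub =
        subCaches.computeIfAbsent(writableType, type -> new ConcurrentHashMap<>());
    Object shared = sub.computeIfAbsent(key, ignored -> factory.apply(key));
    return writableType.cast(shared);
  }

  // Number of sub-caches actually created so far.
  int instantiatedSubCaches() {
    return subCaches.size();
  }
}
```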