giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-226) Proposal for per-Mapper caching of all Writable values using existing maven imports
Date Sat, 30 Jun 2012 20:20:44 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-226:
-------------------------------

    Attachment: GIRAPH-226-1.patch
    
> Proposal for per-Mapper caching of all Writable values using existing maven imports
> -----------------------------------------------------------------------------------
>
>                 Key: GIRAPH-226
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-226
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>         Attachments: GIRAPH-226-1.patch
>
>
> We already import the Guava library into our Maven repo (see GIRAPH-225), so I have written
a _proposed_ caching system using its very efficient com.google.common.cache library. It
would exist as a static singleton per Mapper (per JVM) and would be usable by all vertices
in a given partition/JVM environment, either through an instance field in BasicVertex<I,V,E,M>
or perhaps as part of a Context etc. (anything global to each JVM would do). Through a simple
API one could "create/get" and manipulate all Writable instances used by that JVM without
duplicating objects all the time. The net effect would be similar to the recent improvement
to NullWritable, but would cover everything. Please see the patch; it does not attempt to
inject this cache into its new home yet, it just places the files in "lib/" for your review and
comments.
> Experiments to come will reveal whether this is a desperately needed improvement or just
a detail as far as Giraph scale-out is concerned, but if it is needed, here it is. One caveat
is that the API for using Writables would change slightly (I would be happy to make the minimal
changes to existing example code/tests and our web instructions). All mutation and creation/acquisition
of Writable instances would go through cache.getWritable(), which is overloaded to accept
all Java types that map to Writables without any work for the user. In fact, using
this API would eliminate the need to use the "new" operator with Writables at all. Best
of all, should a new user author an application without using the cache, it would be bloated
memory-wise (as now) but would not break in the least. There is a better explanation in the
code comments for GiraphWritableCache, the main file.
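
The getWritable() idea described above can be illustrated with a minimal sketch. It is not the code from GIRAPH-226-1.patch. To stay runnable without dependencies it swaps in java.util.concurrent.ConcurrentHashMap for Guava's com.google.common.cache, and the IntBox and TextBox classes are hypothetical stand-ins for Hadoop's IntWritable and Text. It also assumes that cached instances are treated as read-only once handed out, so that a changed value is obtained by another getWritable() call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for org.apache.hadoop.io.IntWritable (assumption).
final class IntBox {
    private final int value;
    IntBox(int value) { this.value = value; }
    int get() { return value; }
}

// Hypothetical stand-in for org.apache.hadoop.io.Text (assumption).
final class TextBox {
    private final String value;
    TextBox(String value) { this.value = value; }
    String get() { return value; }
}

// One static cache per JVM; overloads pick the wrapper type from the Java type.
final class WritableCacheSketch {
    private static final ConcurrentMap<Integer, IntBox> INTS = new ConcurrentHashMap<>();
    private static final ConcurrentMap<String, TextBox> TEXTS = new ConcurrentHashMap<>();

    private WritableCacheSketch() { }

    // Returns the single shared wrapper for this int, creating it on first request.
    static IntBox getWritable(int value) {
        return INTS.computeIfAbsent(value, IntBox::new);
    }

    // Returns the single shared wrapper for this string, creating it on first request.
    static TextBox getWritable(String value) {
        return TEXTS.computeIfAbsent(value, TextBox::new);
    }
}
```

With this shape, calling WritableCacheSketch.getWritable(7) twice hands back the same IntBox object, so callers never need the "new" operator. Unlike a Guava cache, this plain map never evicts entries.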
> One could easily upgrade this to take advantage of generics by using a Configuration
object to init this cache and borrowing its <I,V,E,M> class objects for Writable instantiation,
but this would add overhead within the cache itself and, it turns out, doesn't save much code,
because you still have to implement the cache loading methods with concrete
type params. Although the main object contains one sub-cache for each Java-to-Writable mapping
we use in Giraph/Hadoop, the sub-caches are instantiated lazily, and in most vertex implementations
no more than 1 or 2 of them would ever be instantiated.
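
The lazy instantiation of sub-caches mentioned above might look roughly like the following sketch. Again, this is an illustration under assumptions rather than the patch itself: plain HashMaps guarded by synchronized methods replace the Guava caches, LongBox and DoubleBox are hypothetical stand-ins for LongWritable and DoubleWritable, and liveSubCaches() is a helper invented here only to make the laziness observable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.io.LongWritable (assumption).
final class LongBox {
    final long value;
    LongBox(long value) { this.value = value; }
}

// Hypothetical stand-in for org.apache.hadoop.io.DoubleWritable (assumption).
final class DoubleBox {
    final double value;
    DoubleBox(double value) { this.value = value; }
}

final class LazySubCachesSketch {
    // Each sub-cache stays null until the first lookup of its type.
    private Map<Long, LongBox> longs;
    private Map<Double, DoubleBox> doubles;

    synchronized LongBox getWritable(long value) {
        if (longs == null) {
            longs = new HashMap<>();   // created on first long lookup only
        }
        return longs.computeIfAbsent(value, LongBox::new);
    }

    synchronized DoubleBox getWritable(double value) {
        if (doubles == null) {
            doubles = new HashMap<>(); // created on first double lookup only
        }
        return doubles.computeIfAbsent(value, DoubleBox::new);
    }

    // Illustration-only helper: how many sub-caches have been created so far.
    synchronized int liveSubCaches() {
        return (longs != null ? 1 : 0) + (doubles != null ? 1 : 0);
    }
}
```

A vertex implementation that only ever uses long values would therefore pay for one sub-cache, not for every supported type.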
> ArrayWritable is not supported yet; I will be posting a separate JIRA about this. It
turns out ArrayWritable does not play nice with GiraphJob.run() no matter how you subclass
or manipulate it, and twice now vertex implementations of mine have had to store final values
in Text or some other unfortunate format to express tuples. Unless fixed, this would make
multigraphs (as is currently being discussed in another JIRA by Allessandro) impossible. Thanks
to Sean Choi for pointing me toward this (I think larger) problem. More to follow.
> Anyway, a quick morning grep reveals that no code in Giraph uses ArrayWritable yet,
so for now this doesn't affect the cache. Please look at this code, read the comments about
use, and tell me what you think. No biggie to me if we don't use it, but again, here
it is if we want it. I look forward to hearing from you.
> For the record, I think it would live as a static field in BasicVertex<I,V,E,M>
or GiraphJob, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
