incubator-directmemory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Raffaele P. Guidi" <>
Subject Initial roadmap discussion
Date Sun, 09 Oct 2011 08:44:54 GMT
Gentlemen, welcome and thank you for joining in (and the opportunity, for me
and the project, to join the ASF, which is great) . I wrote some notes about
the current state of the project and some hypothesis on future developments
which I would like to discuss with you all. These are the items I would like
to discuss (and sorry for being a bit lengthy):

   - *Design choices*
   - *New features*
   - *Integration with other products*
   - *Build, Test and Continuous integration strategy *
   - *Miscellanea*

*Design choices*
*I recently rewrote DM entirely for simplification. It used to have three
layers (heap, off-heap, file/nosql) and to authomatically push
forward/backward in the chain items according to their usage. It turned out
overly complicated and mostly inefficent at runtime (probably mostly because
of my poor implementation). The singleton facade is proving simple and
effective and well refects the nature of direct memory - which cannot be
really freed. But this needs a strategy for feature and behaviour

   - *Singleton *(largely Play! inspired) *approach *- is it good?
   - *Feature and behaviour composability*  (DI and Feature injection? A
   plugin system? OSGi?)* - just let's keep things simple and developer

*New features*
Adding simple heap cache features would spread usage among those who think
that would EVENTUALLY need a huge off-heap one (I believe it's the vast
majority of our potential "customers"). Same thing for file and distributed
ones. Having both three would qualify DM as an Enterprise Ready (please
notice the capitalization ;) cache.

   - *Heap storage *- Guava already fits the requirement, of course. We
   could both use the heap as a "queue" to speed up inserts and serialize later
   and/or keep most frequently used items into the heap for speed. It's more a
   design choice than a technical one
   - *File storage *-  this would be easy to achieve with the same "index"
   strategy of the off-heap one (I believe JCS does the same)
   - *Lateral storage *(distributed or replicated) - A possible way to do
   this: *hazelcast *for map distribution and *Apache Thrift *for intra node
   communication (node a needs an item stored in node b and then asks for it).
   I'm not sure hazelcast would perform as well as Guava with multi million
   item maps, it has to be thoroughly tested for perfomance and memory
   consumption - should hazelcast not fit the performance requirement we should
   finda an alternative way to distribute/replicate the map across
nodes. *jgroups
   *with multicasting would be perfect but it's LGPL (well, JCS uses it)
   and, of course, a custom, maybe thrift based, distribution mechanism could
   be written ad-hoc

*Integration with other products*
Providing plugins, integration or just support with/for other
technologies/products would of course spread adoption. These are the first
few that pop in my mind at the moment

   - *Apache Cayenne integration* - do I need to tell why? ;)
   - *Play! Framework integration* - because I simply love play! and use it
   in other side projects whenever I need a web/mobile fron-end
   - *Memcached *(like) *integration* - DirectMemory can be seen as an
   embedded memcached and adoption its protocol would be a good fit for
   replacing it in distributed scenarios most of all when it's used by java
   - *Scala, Clojure and other jvm languages* integration - emerging
   technologies that deserve attention. Should I have 48 hours days I would use
   the other 24 to improve my scala skills and rewrite DirectMemory can with it

There are of course a lot of things that are not essential but could be

   - *HugeArrayList, FastMap*, etc... DirectMemory currently uses Guava for
   the Map and ArrayList (I know it's not thread safe but it could be really
   not required) for the Pointer's index. Evaluation of  other fast and low
   memory impact Map and List implementation could possibly bring performance
   - *Reliability improvements* - DirectMemory is fast also because it
   sacrifices reliability - is it always a good trade-off? Could we provide
   configuration or pluggable implementation for different usage scenarios,
   maybe at list for the MemoryManager? Or even transactionality?
   - *Would hadoop need off-heap* caching? (this is a good one)

*Build, Test and Continuous integration strategy *
*The overall point for DM is testing for performance with large quantities
of memory - where the minimum should be more than the average 2GB used by
web applications - the more the better.*

   - *Testing infrastructure* - I currently use an amazon machine with 16+GB
   RAM (which costs ~$1 per hour), a bit tedious and time consuming to startup
   and to deploy on (would require some scripting) and of course continuous
   performance testing is too expensive - alternatives?
   - *Branching strategy* - I don't like feature branches - I believe
   feature composability should not be done at the SCM level - (and SVN is
   probably a bit too slow for them) and don't believe in using just release
   branches. Don't know whether there's an apache standard but I usually work
   with *spike* branches (where a spike is more than a single feature and
   less than a whole release) and then publish on release branches tagging for
   events (production, distribution, etc). Does it sound good for you?
   - *Binary packaging and demo applications* - I used to provide a binary
   distribution and a simple web application to test against but it simply was
   too effort for me alone
   - *OSGi bundling* - it costs very little and can be quite useful
   - *Maven repository *- I've applied for a sonatype repo registration but
   simply didn't have enough time to complete it and I'm using a github folder
   as a repository. I guess that artifacts would naturally go in apache repos,
   from now on, right?
   - *Testing and certification* over different JVMs and OS (sun, openjdk,
   ibm, windows, linux, AIX? Solaris?)

I would say that intensive performance testing and certification would make
a solid 0.7 GA release; heap and file storages inclusion would make a pretty
good 1.0 (the distributed storage would make it incredible!)

Waiting forward for your feedback.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message