incubator-directmemory-dev mailing list archives

From "Raffaele P. Guidi" <raffaelegu...@apache.org>
Subject Re: Initial roadmap discussion
Date Mon, 10 Oct 2011 13:32:36 GMT
Thanks! This is really the kind of feedback I hoped to get :)

>   - *Singleton *(largely Play! inspired) *approach *- is it good?
> I personally don't know Play! and how it works... can you point me to
> some javadoc, please? TIA!

Well, maybe "singleton approach" is the wrong name and "facade approach"
would be a better one. The Play! approach, from an end user (end developer?)
POV, is basically "if you want to access the cache just write Cache.get() or
Cache.put() wherever you need it and you will get pretty much what you
expect from that". Basically the Cache facade takes care of setup and
configuration (the init() method) and simplifies the end user's/programmer's
job.
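
As a rough sketch of what such a facade could look like (the method
signatures and the map-backed store are my assumptions for illustration,
not DirectMemory's or Play!'s actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative static facade: callers never see setup or configuration.
final class Cache {
    // Backing store; a plain on-heap map stands in for the real storage here.
    private static volatile Map<String, Object> store;

    private Cache() {
        // Static facade, never instantiated.
    }

    // Explicit setup hook; calling it again replaces the store.
    static synchronized void init(int expectedEntries) {
        store = new ConcurrentHashMap<>(expectedEntries);
    }

    static void put(String key, Object value) {
        backingMap().put(key, value);
    }

    // Returns the cached value, or null when the key is unknown.
    static Object get(String key) {
        return backingMap().get(key);
    }

    // Falls back to a default configuration if init() was never called.
    private static Map<String, Object> backingMap() {
        Map<String, Object> current = store;
        if (current == null) {
            synchronized (Cache.class) {
                if (store == null) {
                    store = new ConcurrentHashMap<>();
                }
                current = store;
            }
        }
        return current;
    }
}
```

From anywhere in the application, Cache.put("user:1", value) followed by
Cache.get("user:1") is then all the end developer has to write.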

Nevertheless in other use cases a non-singleton design would work better but
- keep in mind - ByteBuffer exposes an allocateDirect() method but not a
freeDirect() one, so keeping things in a singleton gives guarantees against
memory leaks (you can drop a ByteBuffer from the heap, but the allocated
direct memory is only released if and when the garbage collector reclaims
the buffer object, not on demand).
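
A minimal sketch of that idea, under my own assumptions (the OffHeapArena
name, the 1 MB capacity and the simple bump-pointer allocation are
illustrative, not DirectMemory code): a single owner reserves direct memory
once and hands out slices of it, so callers never call allocateDirect()
themselves.

```java
import java.nio.ByteBuffer;

// Illustrative singleton that owns one direct buffer for the whole process.
final class OffHeapArena {
    private static final OffHeapArena INSTANCE = new OffHeapArena(1024 * 1024);

    private final ByteBuffer arena;
    private int nextFree = 0;

    private OffHeapArena(int capacityBytes) {
        // ByteBuffer offers allocateDirect(int) but no public method to free it.
        this.arena = ByteBuffer.allocateDirect(capacityBytes);
    }

    static OffHeapArena getInstance() {
        return INSTANCE;
    }

    // Returns a view over the next `size` bytes, or null when the arena is full.
    synchronized ByteBuffer allocate(int size) {
        if (size < 0 || size > arena.capacity() - nextFree) {
            return null;
        }
        ByteBuffer view = arena.duplicate();
        view.position(nextFree);
        view.limit(nextFree + size);
        nextFree += size;
        return view.slice();
    }

    // Number of bytes handed out so far.
    synchronized int used() {
        return nextFree;
    }
}
```

Since the off-heap memory is reserved exactly once, dropping a slice only
loses a small heap object and never strands a separately allocated direct
buffer.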

I suggest keeping the existing Cache singleton facade and adding another
(non-singleton) entry point to give end users a choice. With this in place
it would be easy to isolate behaviour (like expiry or LFU eviction) and
encapsulate it in modules/plugins that handle specific responsibilities
(currently LFU is taken care of at the Cache level while expiry eviction is
handled at the MemoryBuffer level - 2 layers below, which is wrong).
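
To make the plugin idea concrete, here is a hedged sketch (EvictionPolicy
and CacheInstance are hypothetical names, not existing DirectMemory types):
the non-singleton entry point receives its eviction behaviour from outside
instead of hard-wiring it at one layer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract: decides which key to drop when the cache is full.
interface EvictionPolicy {
    // Returns the key to evict, or null to evict nothing.
    String selectVictim(Map<String, Long> hitsByKey);
}

// Hypothetical non-singleton entry point; each instance carries its own policy.
final class CacheInstance {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, Long> hits = new HashMap<>();
    private final int maxEntries;
    private final EvictionPolicy policy;

    CacheInstance(int maxEntries, EvictionPolicy policy) {
        this.maxEntries = maxEntries;
        this.policy = policy;
    }

    synchronized void put(String key, Object value) {
        // Ask the plugin for a victim only when a new key would exceed the limit.
        if (!values.containsKey(key) && values.size() >= maxEntries) {
            String victim = policy.selectVictim(Collections.unmodifiableMap(hits));
            if (victim != null) {
                values.remove(victim);
                hits.remove(victim);
            }
        }
        values.put(key, value);
        hits.putIfAbsent(key, 0L);
    }

    synchronized Object get(String key) {
        Object value = values.get(key);
        if (value != null) {
            hits.merge(key, 1L, Long::sum); // usage count consumed by the policy
        }
        return value;
    }

    synchronized int size() {
        return values.size();
    }
}
```

The existing singleton facade could then simply delegate to one default
CacheInstance, while users who need several caches create their own.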

Talking about LFU eviction - as you can see from the code, filters on the
entries collection are done using JoSQL - which is easy but damn slow
compared to hard-coded for loops. I suggest keeping JoSQL for fast
prototyping (or end-user customization) but, once requirements are
consolidated, replacing it with hard-coded loops (I also tried lambda4j but
it is slow as well and more complicated).
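
For illustration, a hard-coded LFU selection is just a single pass over the
entries (the Entry type and its hits field are assumptions made for this
sketch, not the real DirectMemory classes):

```java
import java.util.List;

final class LfuSketch {
    // Hypothetical cache entry carrying a usage counter.
    static final class Entry {
        final String key;
        final long hits;

        Entry(String key, long hits) {
            this.key = key;
            this.hits = hits;
        }
    }

    private LfuSketch() {
    }

    // One linear scan, no query parsing: returns the least frequently used
    // entry, or null when the list is empty.
    static Entry leastFrequentlyUsed(List<Entry> entries) {
        Entry candidate = null;
        for (Entry e : entries) {
            if (candidate == null || e.hits < candidate.hits) {
                candidate = e;
            }
        }
        return candidate;
    }
}
```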

> Not sure there's an ASF standard but in the ASF projects I'm involved in
> we never applied the branching strategy... curious to try, not without
> worries

Talking about branching policies - the one I described works very well with
git but SVN is not as agile at creating and merging branches - feel free to
propose a more prudent approach.

> Yet another feature we haven't discussed yet is supporting namespaces[1]
[...]

A good one, +1


> Anyway, I will start filing issues later for the initial codebase import
> - be prepared!

Ready, set... ;)

Ciao,
    R

On Mon, Oct 10, 2011 at 11:13 AM, Simone Tripodi
<simonetripodi@apache.org> wrote:

> Amazing report Raf, congrats!!!
> I'll try to comment inline - apologies in advance if I didn't get you right :)
>
> >
> > *Design choices*
> > *I recently rewrote DM entirely for simplification. It used to have three
> > layers (heap, off-heap, file/nosql) and to automatically push items
> > forward/backward in the chain according to their usage. It turned out
> > overly complicated and mostly inefficient at runtime (probably mostly
> > because of my poor implementation). The singleton facade is proving
> > simple and effective and reflects well the nature of direct memory -
> > which cannot really be freed. But this needs a strategy for feature and
> > behaviour composability.*
> >
> >   - *Singleton *(largely Play! inspired) *approach *- is it good?
>
> I personally don't know Play! and how it works... can you point me to some
> javadoc, please? TIA!
> As a side note... are we sure we want to have a single DM instance per
> application? I suggest not limiting users' imagination and needs... in
> the past, due to the architecture design, I needed to access different
> Memcached nodes inside the same app, which might not be possible with
> the singleton approach. But please take what I'm saying as a pure
> personal opinion!
>
> >   - *Feature and behaviour composability*  (DI and Feature injection? A
> >   plugin system? OSGi?)* - just let's keep things simple and developer
> >   friendly*
> >
>
> I am +1 for both DI (having JSR330 compatibility/integration would be
> IMHO really nice to have) and OSGi too, but NOT as part of the DM core
> APIs.
>
> My suggestion/proposal in the design is improving the current
> modularization in order to have
>  * a core API that is as dependency-free as possible;
>  * submodules for serializers (such as protostuff, which you are already
> familiar with);
>  * submodules for DI/IoC;
>  * making all that stuff valid OSGi bundles would probably already
> help for the integration;
>  * submodules for second level caches;
>  * submodules for cluster replication;
>  * ...
>
> > *New features*
> > Adding simple heap cache features would spread usage among those who
> > think that they would EVENTUALLY need a huge off-heap one (I believe
> > it's the vast majority of our potential "customers"). Same thing for
> > file and distributed ones. Having all three would qualify DM as an
> > Enterprise Ready (please notice the capitalization ;) cache.
> >
> >   - *Heap storage *- Guava already fits the requirement, of course. We
> >   could both use the heap as a "queue" to speed up inserts and serialize
> >   later and/or keep the most frequently used items in the heap for speed.
> >   It's more a design choice than a technical one
> >   - *File storage *- this would be easy to achieve with the same "index"
> >   strategy of the off-heap one (I believe JCS does the same)
> >   - *Lateral storage *(distributed or replicated) - A possible way to do
> >   this: *hazelcast *for map distribution and *Apache Thrift *for
> >   inter-node communication (node a needs an item stored in node b and
> >   then asks for it). I'm not sure hazelcast would perform as well as
> >   Guava with multi-million item maps, it has to be thoroughly tested for
> >   performance and memory consumption - should hazelcast not fit the
> >   performance requirement we should find an alternative way to
> >   distribute/replicate the map across nodes. *jgroups *with multicasting
> >   would be perfect but it's LGPL (well, JCS uses it) and, of course, a
> >   custom, maybe thrift based, distribution mechanism could be written
> >   ad hoc
> >
>
> My personal preference is for ASL2.0 licensed dependencies - even
> better if they are ASF products - but you all know better which one
> works better, so feel free to adopt the best solution, I will follow
> the discussion as someone who has everything to learn :)
>
> > *Integration with other products*
> > Providing plugins, integration or just support with/for other
> > technologies/products would of course spread adoption. These are the
> > first few that come to my mind at the moment
> >
> >   - *Apache Cayenne integration* - do I need to tell why? ;)
>
> +1
>
> >   - *Play! Framework integration* - because I simply love play! and use
> >   it in other side projects whenever I need a web/mobile front-end
>
> +1
>
> >   - *Memcached *(like) *integration* - DirectMemory can be seen as an
> >   embedded memcached and adopting its protocol would make it a good fit
> >   for replacing it in distributed scenarios, most of all when it's used
> >   by java applications
>
> Why not, a Memcached facade would encourage developers who built
> applications on top of Memcached to migrate easily to DM.
> Nice idea! Maybe not at the top of priorities, but I like it :)
>
> >   - *Scala, Clojure and other jvm languages* integration - emerging
> >   technologies that deserve attention. Should I have 48-hour days I
> >   would use the other 24 to improve my scala skills and rewrite
> >   DirectMemory with it
> >
>
> +1 too, I am not familiar with Scala, Clojure, Groovy, ... but
> hopefully the community will grow enough to also attract experts in
> these languages to provide contributions :P
>
> >
> >
> > *Miscellanea*
> > There are of course a lot of things that are not essential but could be
> > investigated
> >
> >   - *HugeArrayList, FastMap*, etc... DirectMemory currently uses Guava
> >   for the Map and ArrayList (I know it's not thread safe but that may
> >   not really be required) for the Pointer's index. Evaluating other fast
> >   and low memory impact Map and List implementations could possibly
> >   bring performance improvements
>
> Don't we have anything in commons-collections that could help?
>
> >   - *Reliability improvements* - DirectMemory is fast also because it
> >   sacrifices reliability - is it always a good trade-off? Could we
> >   provide configuration or pluggable implementations for different usage
> >   scenarios, maybe at least for the MemoryManager? Or even
> >   transactionality?
>
> +1 for both I'd say. I know it requires a lot of time, but probably
> the users community would require both features...
>
> >   - *Would hadoop need off-heap* caching? (this is a good one)
> >
> > *Build, Test and Continuous integration strategy *
> > *The overall point for DM is testing for performance with large
> > quantities of memory - where the minimum should be more than the average
> > 2GB used by web applications - the more the better.*
> >
> >   - *Testing infrastructure* - I currently use an amazon machine with
> >   16+GB RAM (which costs ~$1 per hour), a bit tedious and time consuming
> >   to start up and to deploy on (would require some scripting) and of
> >   course continuous performance testing is too expensive - alternatives?
>
> Maybe Olivier can tell about it...
>
> >   - *Branching strategy* - I don't like feature branches - I believe
> >   feature composability should not be done at the SCM level - (and SVN is
> >   probably a bit too slow for them) and don't believe in using just
> >   release branches. Don't know whether there's an apache standard but I
> >   usually work with *spike* branches (where a spike is more than a single
> >   feature and less than a whole release) and then publish on release
> >   branches tagging for events (production, distribution, etc). Does it
> >   sound good to you?
>
> Not sure there's an ASF standard but in the ASF projects I'm involved
> in we never applied the branching strategy... curious to try, not
> without worries
>
> >   - *Binary packaging and demo applications* - I used to provide a binary
> >   distribution and a simple web application to test against but it simply
> >   was too much effort for me alone
>
> That is something the ASF INFRA is already used to, Cocoon is an example
> and Any23 is doing the same.
>
> >   - *OSGi bundling* - it costs very little and can be quite useful
>
> +1
>
> >   - *Maven repository *- I've applied for a sonatype repo registration
> >   but simply didn't have enough time to complete it and I'm using a
> >   github folder as a repository. I guess that artifacts would naturally
> >   go in apache repos, from now on, right?
>
> Sure, no Sonatype stuff required, just be prepared with your GPG key
> to sign releases.
>
> >   - *Testing and certification* over different JVMs and OS (sun, openjdk,
> >   ibm, windows, linux, AIX? Solaris?)
> >
>
> Hope Olivier can tell more about it.
>
> > *Roadmap*
> > I would say that intensive performance testing and certification would
> > make a solid 0.7 GA release; heap and file storages inclusion would make
> > a pretty good 1.0 (the distributed storage would make it incredible!)
> >
> > Looking forward to your feedback.
> >
> > Cheers,
> >     Raffaele
> >
>
> Yet another feature we haven't discussed yet is supporting
> namespaces[1] - Memcached does NOT support it and I found it in the
> past extremely useful.
>
> Anyway, I will start filing issues later for the initial codebase
> import - be prepared!
>
> All the best, have a nice day!
> Simo
>
> [1]
> http://code.google.com/p/memcached/wiki/NewProgrammingTricks#Namespacing
>
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
>
