cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Chemouil <schemo...@gmail.com>
Subject Re: Refactoring cassandra service package
Date Tue, 03 Jun 2014 18:59:49 GMT
Brandon Williams racontait le 03/06/2014 20:00:
> Relevant: https://issues.apache.org/jira/browse/CASSANDRA-6881

Thanks for the pointer, couldn't find that issue.




> On Tue, Jun 3, 2014 at 12:59 PM, Gary Dusbabek <gdusbabek@gmail.com> wrote:
> 
>> On Tue, Jun 3, 2014 at 3:52 AM, Simon Chemouil <schemouil@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm new to Cassandra and felt like exploring and hacking on the code. I
>>> was surprised to see the usage of so many mutable/global state statics
>>> all over the service package (basically global variables/singletons).
>>>
>>> While I understand it can be practical to work with singletons, and that
>>> in any case I'm not sure multi-tenant Cassandra (as in two different
>>> Cassandra instances within the same process) would make sense at all (or
>>> even work considering there is some native access going on with JNA), I
>>> find static state can easily lead to tangled 'spaghetti' code (accessing
>>> the singletons from anywhere, even where one shouldn't), and in general
>>> it ties the code to the VM instance, rather than to the class.
>>>
>>> I tried to find if it was an actual design choice, but from my
>>> understanding this is more something inherited from the early Cassandra
>>> times at Facebook. I just found this thread[1] pointing to issue
>>> CASSANDRA-741 (slightly more limited scope) that was marked as WONTFIX
>>> because no one took it (but still marked as open for contribution). The
>>> current code conventions also don't mention the usage of singletons
>>> except by stating:  "Do not extract interfaces (or abstract classes)
>>> unless you actually need multiple implementations of it" (switching to a
>>> "service"-style design doesn't require passing interfaces but it's
>>> highly encouraged to help testability).
>>>
>>> So, I'd like to try to make this refactoring happen and remove all (or
>>> most) mutable static state. It would be an easy way in for me in
>>> Cassandra's internals (maybe to contribute further). I think it would
>>> help testing (ability to unit test components without going to the
>>> storage for instance) and in general modernize the code. It would also
>>> make hacking on Cassandra easier because people could pick different
>>> pieces without pulling the whole thing.
>>>
>>> It would definitely break backwards compatibility with current Java code
>>> that directly embeds Cassandra / uses it as a library, but I would keep
>>> the same abstraction so the refactoring would be easy. In any case,
>>> backwards compatibility can be broken by many more changes than just
>>> refactoring, and once this is done it will be easier to deal with
>>> backwards compatibility.
>>>
>>> Obviously all ".instance" fields would be gone, and I'd try to fix
>>> potential cyclic class dependencies and generally make sure classes
>>> dependencies form a direct acyclic graph with CassandraDaemon as its
>>> root. The basic idea is to have each 'service' component require all its
>>> service dependencies in their constructor (and keeping them as a final
>>> field), rather than getting them via the global namespace (singleton
>>> instances).
>>>
>>> If I had it my way, I'd probably use a dependency injection framework,
>>> namely Dagger which is as far as I knpw the lightest Java DI framework
>>> actively developed (jointly developed by Square and Google's Java team
>>> responsible for Guice & Guava), which has a neat compile-time annotation
>>> processor that detects missing dependencies early on. It works with both
>>> Android and J2SE and is very fast, simple and light (65kB vs 710kB for
>>> Guice).
>>>
>>> So, the question is: would you guys accept such a patch? I'd rather not
>>> do the work if it has no chance of being merged upstream :).
>>>
>>
>> This has come up before. Let's face it, removing the singletons is a
>> tempting proposition.
>>
>> Several of us have been down the path of trying to do it.
>>
>> At the end of the day, here's what you'd end up with (absolutely best
>> case):
>>
>> 1. Modifying just about every class, sometimes substantially.
>> 2. A huge patch for someone else to review.
>> 3. No performance gains, no bug fixes.  In fact, since so many classes have
>> to be changed, I'd say that the risk of introducing a bug/regression is
>> fairly likely.
>> 4. Complicated merges when bugs need to be fixed in older versions.
>> 5. More modular and testable code.
>>
>> So far, the positive aspects of 5 have not been able to trump the
>> challenges presented by 1, 2, 3, and 4.
>>
>> Kind Regards,
>>
>> Gary.
>>
>>
>>>
>>> Cheers,
>>>
>>> --
>>> Simon
>>>
>>>
>>> [1]
>>>
>>>
>> http://grokbase.com/t/cassandra/dev/107xr48hek/creating-two-instances-in-code
>>>
>>
> 


Mime
View raw message