cassandra-user mailing list archives

From Dominic Williams <>
Subject Re: Pelops - a new Java client library paradigm
Date Mon, 14 Jun 2010 11:54:48 GMT
Hi Riyad,

No problem. Because it is a new library, I cannot provide a large list of
production deployments. However, there are various reasons you should have
confidence in the library:-

1/ Firstly, by way of background, Pelops is being used as the basis of a
serious commercial project that makes very heavy use of Cassandra. The
project itself is best described as a social network/games venture aimed at
kids aged 6-13. I cannot go into commercial details because the information
is sensitive, but I can say that scalability is very important to this
venture. It has sufficient funds to ensure that, whether or not it is
ultimately successful, it will have to support complex and extensive data
processing for large numbers of users. The library has been created, and
will continue to be developed, on the basis that we will suffer substantial
commercial pain if it has bugs or deficiencies. I personally wrote most of
the library, and I have 18 years of solid programming experience. Every day,
large amounts of Cassandra code are written here using the library; if and
where problems appear, they are immediately reported to me and fixed with
urgency. Once the venture is in production (hopefully that is now fewer than
ten weeks away), this will provide the best affirmation, but until then the
above will have to suffice. (If anyone else is using Pelops successfully, it
would be great to hear from you.)

2/ Before going into more technical detail, I just want to reiterate that
fundamentally Pelops is a wrapper around the Thrift API. Therefore, it does
not have a particular bearing on the scalability of Cassandra systems per se.
However, we do try to add value through our connection pooling and load
balancing strategy, which I will explore a little more below.

3/ Connection pooling and load balancing: As you know, one of the features
of Pelops is that it separates data processing from lower-level details like
connection pooling. One benefit of this approach is that code becomes much
more readable and less bug-prone, but a really big benefit is that Pelops is
able to "lend" connections to data processing code only for the moments that
calls to Thrift are in progress. This makes it possible to perform
client-side load balancing by counting how many "outstanding" Thrift API
calls exist to each node, and always choosing to perform operations against
the node that has the smallest number of Thrift calls running. This is the
best strategy available without actually knowing the CPU/memory etc. load on
the Cassandra nodes; that approach has various pitfalls anyway, and would
probably offer only an enhancement, not an alternative system. Using this
strategy adds a little to the complexity of the connection pooling system,
which of course increases the surface area for mistakes. It has been working
for us, but I do invite people to review the code, and I will be very happy
to answer questions and address any issues found.
In terms of how the existing connection pooling system can be improved, I
think in general it is pretty much the best option available now, but there
is one area where I plan an improvement. At the moment, Pelops maintains a
"context" for each node it knows about in the Cassandra cluster. Each
context has a refiller thread, which creates and caches new connections to
the Cassandra node in question, with the aim of ensuring that enough free
connections are available for spikes in usage. You can configure a target
number of connections, a minimum number of free connections, and a maximum
number of connections through the Policy. The area I see for improvement is
that each context has only a single "pool refiller" thread responsible for
creating new free connections when the number falls below a low-water mark.
It would be better if this were multi-threaded: in extreme situations where
the buffer was depleted rapidly, it could then be restored more quickly
(since in the synchronous model presented by Thrift, creating a new
connection is a blocking operation). This is quite a minor improvement, but
I plan on addressing it shortly.
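The planned multi-threaded refill could look something like the following
sketch. It is hypothetical, not the Pelops implementation: `RefillingPool`,
its constructor parameters (standing in for the Policy's minimum-free and
maximum-connection settings), and the `Supplier` used to simulate blocking
connection creation are assumptions; the target-size setting and the
handling of broken connections are omitted.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Pelops code): a per-node pool whose free
 * connections are topped up to a low-water mark by several refiller
 * threads, so a depleted buffer can be restored in parallel.
 */
public class RefillingPool<C> {
    private final BlockingQueue<C> free = new LinkedBlockingQueue<>();
    private final AtomicInteger total = new AtomicInteger();   // created or being created
    private final AtomicInteger pending = new AtomicInteger(); // creations in flight
    private final Supplier<C> factory; // blocking creation, e.g. opening a socket
    private final int minFree;
    private final int maxConnections;
    private final ExecutorService refillers;

    public RefillingPool(Supplier<C> factory, int minFree, int maxConnections, int refillThreads) {
        if (minFree > maxConnections || refillThreads < 1) {
            throw new IllegalArgumentException("invalid pool policy");
        }
        this.factory = factory;
        this.minFree = minFree;
        this.maxConnections = maxConnections;
        this.refillers = Executors.newFixedThreadPool(refillThreads, r -> {
            Thread t = new Thread(r, "pool-refiller");
            t.setDaemon(true); // do not keep the JVM alive just for refilling
            return t;
        });
        refill();
    }

    /** Schedules one creation per missing free connection, never exceeding maxConnections. */
    private synchronized void refill() {
        int deficit = minFree - (free.size() + pending.get());
        for (int i = 0; i < deficit; i++) {
            if (total.incrementAndGet() > maxConnections) {
                total.decrementAndGet(); // at the hard limit: give the slot back
                return;
            }
            pending.incrementAndGet();
            refillers.execute(() -> {
                try {
                    free.add(factory.get());
                } catch (RuntimeException e) {
                    total.decrementAndGet(); // creation failed; release the reserved slot
                } finally {
                    pending.decrementAndGet();
                }
            });
        }
    }

    /** Borrows a free connection, waiting up to timeoutMillis; returns null on timeout. */
    public C borrow(long timeoutMillis) {
        try {
            C conn = free.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            refill(); // top the buffer back up in the background
            return conn;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Returns a connection once the call that borrowed it has finished. */
    public void release(C conn) {
        free.add(conn);
    }

    public int totalCount() {
        return total.get();
    }

    public void shutdown() {
        refillers.shutdown();
    }
}
```

Because each missing connection is created as a separate task on a fixed set
of refiller threads, several blocking creations can proceed at once, while
`maxConnections` still caps the total number of connections per node.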
Hope this helps.
Best, Dominic

On 11 June 2010 16:11, Riyad Kalla <> wrote:

> Dominic,
> I like the API; it reads clearly and is fairly intuitive.
> I think Ian was asking which large-scale production deployments of Pelops
> you could speak to -- he's trying to get a confidence index and I am
> interested as well ;)
> Best,
> Riyad
> On Fri, Jun 11, 2010 at 7:04 AM, Dominic Williams <> wrote:
>> Hi, good question.
>> The scalability of Pelops is dependent on Cassandra, not the library
>> itself. The library aims to provide a more effective access layer on top of
>> the Thrift API.
>> The library does perform connection pooling, and you can control the size
>> of the pool and other parameters using a policy object. But connection
>> pooling itself does not increase scalability, only efficiency.
>> Hope this helps.
>> Best, Dominic
>> On 11 June 2010 14:47, Ian Soboroff <> wrote:
>>> Sounds nice.  Can you say something about the scales at which you've used
>>> this library?  Both write and read load?  Size of clusters and size of data?
>>> Ian
>>> On Fri, Jun 11, 2010 at 9:41 AM, Dominic Williams <> wrote:
>>>> Pelops is a new high quality Java client library for Cassandra.
>>>> It has a design that:
>>>> * reveals the full power of Cassandra through an elegant "Mutator and
>>>> Selector" paradigm
>>>> * generates better, cleaner, less bug prone code
>>>> * reduces the learning curve for new users
>>>> * drives rapid application development
>>>> * encapsulates advanced pooling algorithms
>>>> An article introducing Pelops can be found at
>>>> Thanks for reading.
>>>> Best, Dominic
