cassandra-user mailing list archives

From Paul Loy <>
Subject Re: Suggested settings for number crunching
Date Thu, 18 Aug 2011 15:03:13 GMT
Yeah, we're processing item similarities, so we are writing single columns
at a time, although we do batch these into groups of 400 mutations before
sending them to Cassandra. We currently perform almost 2 billion calculations,
which then write almost 4 billion columns.
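
For illustration, the batching described above might look roughly like the sketch below. It is not the actual code from this job: the `Mutation` tuple shape, the `batched` and `write_all` helpers, and the `send_batch` callback are all hypothetical names, and the real call into a Cassandra client is deliberately left out.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical shape of one mutation: (row_key, column_name, value).
Mutation = Tuple[str, str, float]


def batched(mutations: Iterable[Mutation], size: int = 400) -> Iterator[List[Mutation]]:
    """Yield lists of at most `size` mutations, preserving order."""
    it = iter(mutations)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_all(
    mutations: Iterable[Mutation],
    send_batch: Callable[[List[Mutation]], None],
    size: int = 400,
) -> int:
    """Send mutations in batches via `send_batch`; return the number of batches sent.

    `send_batch` is an assumption standing in for whatever client call
    actually submits a batch of mutations to the cluster.
    """
    sent = 0
    for chunk in batched(mutations, size):
        send_batch(chunk)
        sent += 1
    return sent
```

Because `batched` consumes a generator lazily, only one batch needs to be held in memory at a time, which matters when the full stream runs to billions of columns.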

Once all similarities are calculated, we just grab a slice per item and
create a denormalised vector of similar items (trimmed down to topN and only
those above a certain threshold). This makes lookup super fast as we only
need to get one column from Cassandra.
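
As a rough sketch of the trimming step described above, assuming the slice for one item has already been read into a plain dict of `other_item -> score`. The function name `top_similar`, the dict input, and the strict `>` comparison against the threshold are assumptions made for this example, not details taken from the actual job.

```python
import heapq
from typing import Dict, List, Tuple


def top_similar(
    similarities: Dict[str, float], top_n: int, threshold: float
) -> List[Tuple[str, float]]:
    """Return up to `top_n` (item, score) pairs with score > threshold,
    ordered from most to least similar."""
    # Drop everything at or below the threshold first (assumed strict cut-off).
    above = (
        (item, score) for item, score in similarities.items() if score > threshold
    )
    # nlargest avoids sorting the whole slice when top_n is small.
    return heapq.nlargest(top_n, above, key=lambda pair: pair[1])
```

The returned list is what would be serialised into the single denormalised column per item.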

So we just want to optimise the crunching and storing phase, as that's an
O(n^2) complexity problem. The quicker we can make that, the quicker the
whole process works.

I'm going to try disabling minor compactions as a start.

> is the loading disk or cpu or network bound?

CPU is at 40% free.
There is only one Cassandra node, on the same box as the processor for now,
so no network is involved. So I think it's disk access. Will find out for
sure tomorrow after the current test runs.



On Thu, Aug 18, 2011 at 2:23 PM, Jake Luciani <> wrote:

> Are you writing lots of tiny rows or a few very large rows, are you
> batching mutations? is the loading disk or cpu or network bound?
> -Jake
> On Thu, Aug 18, 2011 at 7:08 AM, Paul Loy <> wrote:
>> Hi All,
>> I have a program that crunches through around 3 billion calculations. We
>> store the result of each of these in cassandra to later query once in order
>> to create some vectors. Our processing is limited by Cassandra now, rather
>> than the calculations themselves.
>> I was wondering what settings I can change to increase the write
>> throughput. Perhaps disabling all caching, etc, as I won't be able to keep
>> it all in memory anyway and only want to query the results once.
>> Any thoughts would be appreciated,
>> Paul.
>> --
>> ---------------------------------------------
>> Paul Loy
> --

Paul Loy
