cassandra-user mailing list archives

From Paul Loy <ketera...@gmail.com>
Subject Re: Suggested settings for number crunching
Date Thu, 18 Aug 2011 15:03:13 GMT
Yeah, we're processing item similarities, so we write one column at a
time, although we batch these into groups of 400 mutations before sending
them to Cassy. We currently perform almost 2 billion calculations, which
in turn write almost 4 billion columns.
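
A minimal sketch of the batching step described above, in plain Python.
The names `batched`, `write_all` and `send_batch` are hypothetical and are
not part of any Cassandra client; `send_batch` stands in for whatever call
actually ships one batch of mutations to the node.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 400) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_all(
    mutations: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical client call
    size: int = 400,
) -> int:
    """Send mutations in groups of `size`; return the number of batches sent."""
    sent = 0
    for batch in batched(mutations, size):
        send_batch(batch)
        sent += 1
    return sent
```

With 1000 mutations and a batch size of 400, `send_batch` is called three
times, with 400, 400 and 200 mutations.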

Once all similarities are calculated, we just grab a slice per item and
create a denormalised vector of similar items (trimmed down to topN and only
those above a certain threshold). This makes lookup super fast as we only
get one column from cassandra.
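
The trimming step above (keep only scores above a threshold, then the topN
of those) could look roughly like this. It is only a sketch: the function
name `top_similar` and the representation of a slice as a dict from item id
to similarity score are assumptions, not the actual data model.

```python
import heapq
from typing import Dict, List, Tuple


def top_similar(
    row: Dict[str, float], n: int, threshold: float
) -> List[Tuple[str, float]]:
    """Return up to `n` (item, score) pairs with score strictly above
    `threshold`, ordered from most to least similar."""
    candidates = (
        (item, score) for item, score in row.items() if score > threshold
    )
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])
```

The resulting list is what would be serialised into the single denormalised
column for that item.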

So we just want to optimise the crunching and storing phase, as that's an
O(n^2) complexity problem. The quicker we can make that, the quicker the
whole process works.
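
To make the O(n^2) growth concrete, here is a small worked sketch. It
assumes every unordered pair of items is scored exactly once and that each
score is stored in both directions (a->b and b->a). That assumption would
be consistent with roughly 2 billion calculations producing roughly 4
billion columns, but it is a guess and not something confirmed here.

```python
def pair_count(n_items: int) -> int:
    """Number of unordered item pairs: n * (n - 1) / 2."""
    return n_items * (n_items - 1) // 2


def columns_written(n_items: int, both_directions: bool = True) -> int:
    """Columns written if each pair's score is stored once or twice.

    `both_directions` is an assumption for illustration only.
    """
    pairs = pair_count(n_items)
    return 2 * pairs if both_directions else pairs
```

For example, `pair_count(1000)` is 499500 and `pair_count(2000)` is 1999000,
so doubling the number of items roughly quadruples the work.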

I'm going to try disabling minor compactions as a start.

> is the loading disk or cpu or network bound?

CPU is about 40% idle, and there's only one Cassy node, on the same box as
the processor for now, so there's no network traffic. So I think it's disk
access. I'll find out for sure tomorrow after the current test run
finishes.

Thanks,

Paul.

On Thu, Aug 18, 2011 at 2:23 PM, Jake Luciani <jakers@gmail.com> wrote:

> Are you writing lots of tiny rows or a few very large rows, are you
> batching mutations? is the loading disk or cpu or network bound?
>
> -Jake
>
> On Thu, Aug 18, 2011 at 7:08 AM, Paul Loy <keteracel@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a program that crunches through around 3 billion calculations. We
>> store the result of each of these in cassandra to later query once in order
>> to create some vectors. Our processing is limited by Cassandra now, rather
>> than the calculations themselves.
>>
>> I was wondering what settings I can change to increase the write
>> throughput. Perhaps disabling all caching, etc, as I won't be able to keep
>> it all in memory anyway and only want to query the results once.
>>
>> Any thoughts would be appreciated,
>>
>> Paul.
>>
>> --
>> ---------------------------------------------
>> Paul Loy
>> paul@keteracel.com
>> http://uk.linkedin.com/in/paulloy
>>
>
>
>
> --
> http://twitter.com/tjake
>



-- 
---------------------------------------------
Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy
