cassandra-user mailing list archives

From Pavel Velikhov <pavel.velik...@gmail.com>
Subject Re: Two problems with Cassandra
Date Thu, 12 Feb 2015 09:23:01 GMT

> On Feb 12, 2015, at 12:37 AM, Robert Coli <rcoli@eventbrite.com> wrote:
> 
> On Wed, Feb 11, 2015 at 2:22 AM, Pavel Velikhov <pavel.velikhov@gmail.com> wrote:
>   2. While trying to update the full dataset with a simple transformation (again via
> python driver), single node and clustered Cassandra run out of memory no matter what settings
> I try, even if I put a lot of sleeps into the mix. However simpler transformations (updating
> just one column, especially when there is a lot of processing overhead) work just fine.
> 
> What does a "simple transformation" mean here? Assuming a reasonable sized heap, OOM
> sounds like you're trying to update a large number of large partitions in a single operation.
> 
> In general, in Cassandra, you're best off interacting with a single or small number of
> partitions in any given interaction.
> 
> =Rob
> 

Hi Robert!

  A simple transformation is changing just a single column value (though I usually do it for the
whole dataset).
  But when I was running out of memory, I was reading in 5 columns and updating 3. Some of
them could be big, but I need to check and rerun this case.
  (I worked around this by dumping to files and then scanning the files and updating the database,
but this stinks!)

  I don’t quite understand the fundamentals of Cassandra - if I’m just doing one scan
with a reasonable number of columns that I fetch, and I’m updating at the same time, what’s
happening there? Why eat up so much memory and die? 
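
For concreteness, here is a minimal sketch of the pattern under discussion: one scan over the table, with updates issued as rows arrive. It is an illustration under assumptions, not the code that actually ran out of memory. It assumes a `session` object that behaves like the DataStax Python driver's `Session`, where iterating the result of `session.execute(...)` fetches rows page by page and `session.execute_async(...)` returns a future with a `result()` method. The function name `scan_and_update`, the `transform` callback and the `max_in_flight` limit are hypothetical names introduced here. The only idea it shows is keeping a bounded number of writes outstanding, so that neither the client nor the coordinator has to queue an unbounded backlog; whether that avoids the OOM described above is untested.

```python
from collections import deque


def scan_and_update(session, select_stmt, update_stmt, transform, max_in_flight=32):
    """Scan rows page by page and write updates, keeping few writes in flight.

    `session` is assumed to expose execute() and execute_async() like the
    DataStax Python driver's Session; the other names are hypothetical.
    """
    pending = deque()  # futures for writes that have not been confirmed yet
    updated = 0

    # Iterating the result lets the driver pull one page at a time instead of
    # materialising the whole table on the client.
    for row in session.execute(select_stmt):
        params = transform(row)
        if params is None:
            continue  # transform() returning None means "leave this row alone"

        pending.append(session.execute_async(update_stmt, params))

        # Once the limit is reached, block on the oldest write before sending
        # another, so the number of outstanding requests stays bounded.
        if len(pending) >= max_in_flight:
            pending.popleft().result()
            updated += 1

    # Wait for the writes that are still outstanding.
    while pending:
        pending.popleft().result()
        updated += 1

    return updated
```

With the real driver, `select_stmt` would typically be a `SimpleStatement` created with a modest `fetch_size`, and `update_stmt` a statement prepared with `session.prepare(...)`; the table and column names are left out here because they are not given in this thread.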