incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Scaling problems
Date Fri, 21 May 2010 14:23:23 GMT
You should check the JMX stages I posted about.
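For reference, the thread-pool stages Jonathan mentions can be dumped with nodetool. The hostname below is a placeholder, and option syntax varies by version (the 0.6 line used `-host`/`-port`; newer releases use `-h`/`-p`):

```shell
# Dump thread-pool stage statistics from a running node.
# 8080 was the default JMX port in Cassandra 0.6; substitute your
# node's hostname and JMX port.
nodetool -host node01.example.com -port 8080 tpstats

# Watch the Pending column: hundreds of pending ops in
# ROW-MUTATION-STAGE, or more than 2-3 in FLUSH-WRITER-STAGE,
# means the node is falling behind the write load.
```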

On Fri, May 21, 2010 at 7:05 AM, Ian Soboroff <isoboroff@gmail.com> wrote:
> Just an update.  I rolled the memtable size back to 128MB.  I am still
> seeing that the daemon runs for a while with reasonable heap usage, but then
> the heap climbs up to the max (6GB in this case, should be plenty) and it
> starts GCing, without much getting cleared.  The client catches lots of
> exceptions; I wait 30 seconds and try again, with a new client if
> necessary, but it doesn't clear up.
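The retry-and-failover loop described here can be sketched as follows. This is illustrative, not Ian's actual code: `connect` and `client.send` are stand-ins for opening a Thrift connection and calling batch_mutate.

```python
import itertools
import time

def send_with_failover(batch, nodes, connect, max_attempts=6, backoff_s=30):
    """Try to send one batch, rotating to the next node after each failure.

    `connect` is a stand-in for whatever builds a client for a node
    (e.g. opening a Thrift connection); it should raise on failure.
    Returns the node that accepted the batch.
    """
    node_cycle = itertools.cycle(nodes)
    last_error = None
    for _ in range(max_attempts):
        node = next(node_cycle)
        try:
            client = connect(node)
            client.send(batch)   # stand-in for batch_mutate
            return node          # success: report which node took it
        except Exception as exc:
            last_error = exc
            time.sleep(backoff_s)  # back off before trying the next node
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Note that if every node is GC-bound, rotating clients cannot help; the fix is backpressure on the writer (smaller batches, slower ingest), not more retries.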
>
> Could this be related to memory leak problems I've skimmed past on the list
> here?
>
> It can't be that I'm creating rows a bit at a time... once I stick a web
> page into two CFs, it's over and done with for this application.  I'm just
> trying to get stuff loaded.
>
> Is there a limit to how much on-disk data a Cassandra daemon can manage?  Is
> there runtime overhead associated with stuff on disk?
>
> Ian
>
> On Thu, May 20, 2010 at 9:31 PM, Ian Soboroff <isoboroff@gmail.com> wrote:
>>
>> Excellent leads, thanks.  cassandra.in.sh has a heap of 6GB, but I didn't
>> realize that I was trying to float so many memtables.  I'll poke tomorrow
>> and report if it gets fixed.
>> Ian
>>
>> On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis <jbellis@gmail.com>
>> wrote:
>>>
>>> Some possibilities:
>>>
>>> - You didn't adjust the Cassandra heap size in cassandra.in.sh (1GB
>>>   is too small).
>>> - You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will
>>>   show large pending ops -- large = 100s).
>>> - You're creating large rows a bit at a time and Cassandra OOMs when
>>>   it tries to compact (the OOM should usually be in the compaction
>>>   thread).
>>> - You have your 5 disks each with a separate data directory, which
>>>   will allow up to 12 total memtables in-flight internally, and
>>>   12*256MB is too much for the heap size you have (FLUSH-WRITER-STAGE
>>>   in tpstats will show large pending ops -- large = more than 2 or 3).
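Rough arithmetic behind that last point. The 10x live-object overhead factor is an assumption for illustration (the in-memory footprint of a memtable in the 0.6 era was commonly several times its serialized size due to JVM object overhead):

```python
# Back-of-the-envelope check on in-flight memtable memory.
MB = 1024 * 1024

memtable_threshold = 256 * MB  # the 256MB flush threshold from storage-conf.xml
memtables_in_flight = 12       # upper bound cited for this disk layout
overhead_factor = 10           # ASSUMED JVM object overhead vs. serialized size

serialized = memtable_threshold * memtables_in_flight   # 3GB before overhead
worst_case = serialized * overhead_factor               # 30GB of live objects
heap = 6 * 1024 * MB                                    # the 6GB heap in use

print(worst_case / heap)  # → 5.0, i.e. ~5x the available heap
```

Even ignoring the overhead factor, 12 memtables at the 256MB threshold is 3GB of serialized data, half the heap, before counting compaction and commit-log buffers.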
>>>
>>> On Tue, May 18, 2010 at 6:24 AM, Ian Soboroff <isoboroff@gmail.com>
>>> wrote:
>>> > I hope this isn't too much of a newbie question.  I am using
>>> > Cassandra 0.6.1 on a small cluster of Linux boxes - 14 nodes, each
>>> > with 8GB RAM and 5 data drives.  The nodes are running HDFS to
>>> > serve files within the cluster, but at the moment the rest of
>>> > Hadoop is shut down.  I'm trying to load a large set of web pages
>>> > (the ClueWeb collection, but more is coming) and my Cassandra
>>> > daemons keep dying.
>>> >
>>> > I'm loading the pages into a simple column family that lets me
>>> > fetch out pages by an internal ID or by URL.  The biggest thing in
>>> > the row is the page content, maybe 15-20k per page of raw HTML.
>>> > There aren't a lot of columns.  I tried Thrift, Hector, and the BMT
>>> > interface, and at the moment I'm doing batch mutations over Thrift,
>>> > about 2500 pages per batch, because that was fastest for me in
>>> > testing.
>>> >
>>> > At this point, each Cassandra node has between 500GB and 1.5TB
>>> > according to nodetool ring.  Let's say I start the daemons up, and
>>> > they all go live after a couple minutes of scanning the tables.  I
>>> > then start my importer, which is a single Java process reading
>>> > ClueWeb bundles over HDFS, cutting them up, and sending the
>>> > mutations to Cassandra.  I only talk to one node at a time,
>>> > switching to a new node when I get an exception.  As the job runs
>>> > over a few hours, the Cassandra daemons eventually fall over,
>>> > either with no error in the log or reporting that they are out of
>>> > heap.
>>> >
>>> > Each daemon is getting 6GB of RAM and has scads of disk space to
>>> > play with.  I've set storage-conf.xml to take 256MB in a memtable
>>> > before flushing (like the BMT case), to do batch commit log
>>> > flushes, and to not have any caching in the CFs.  I'm sure I must
>>> > be tuning something wrong.  I would eventually like this Cassandra
>>> > setup to serve a light request load over, say, 50-100TB of data.
>>> > I'd appreciate any help or advice you can offer.
>>> >
>>> > Thanks,
>>> > Ian
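A side note on the batch sizing Ian describes: 2500 pages at 15-20k each is on the order of 40-50MB per Thrift call, which is a lot for one request to buffer on the server. Capping batches by payload bytes rather than row count keeps each call bounded. A sketch (the 10MB cap is an arbitrary assumption, not a recommended value):

```python
def batches_by_bytes(pages, max_bytes=10 * 1024 * 1024):
    """Yield lists of (key, blob) pairs whose combined blob size stays
    under max_bytes, instead of batching by a fixed row count."""
    batch, size = [], 0
    for key, blob in pages:
        # Start a new batch once adding this blob would exceed the cap.
        if batch and size + len(blob) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append((key, blob))
        size += len(blob)
    if batch:
        yield batch
```

Each yielded batch would then be turned into one batch_mutate call; an oversized single page still goes through as a batch of one.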
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
