cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Tuning Cassandra
Date Tue, 11 May 2010 13:33:54 GMT
Using multiple client threads (w/ pooled thrift connections) will be
even better than mutating really large chunks at a time.
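
A minimal sketch of that approach, assuming nothing about Hector's or Thrift's actual API: `BatchSink` is a hypothetical stand-in for "one batch mutate call on a connection borrowed from a pool", and the chunk size and thread count are illustrative values, not recommendations. It keeps each batch moderately sized (so no single request gets huge) and lets a fixed pool of threads send batches concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBatchWriter {

    /** Hypothetical stand-in for one batch mutate call on a pooled connection. */
    public interface BatchSink<T> {
        void write(List<T> batch) throws Exception;
    }

    /** Splits rows into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> chunk(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(i, end)));
        }
        return chunks;
    }

    /** Sends every chunk through the sink on a fixed thread pool; returns rows written. */
    public static <T> int writeAll(List<T> rows, int chunkSize, int threads, BatchSink<T> sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : chunk(rows, chunkSize)) {
                // Each task performs one batch write and reports its size.
                results.add(pool.submit(() -> {
                    sink.write(batch);
                    return batch.size();
                }));
            }
            int written = 0;
            for (Future<Integer> result : results) {
                written += result.get(); // rethrows a failed write as ExecutionException
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        AtomicInteger batches = new AtomicInteger();
        // The sink here only counts calls; a real one would issue the mutation.
        int written = writeAll(rows, 1000, 4, batch -> batches.incrementAndGet());
        System.out.println("wrote " + written + " rows in " + batches.get() + " batches");
    }
}
```

In a real client the sink would borrow a connection, send the batch, and return the connection to the pool; the sink must be safe to call from several threads at once.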

On Tue, May 11, 2010 at 4:16 AM, David Boxenhorn <david@lookin2.com> wrote:
> Turns out the problem is with batch mutate. I mutate chunks 100 times
> bigger, it goes 100 times faster.
>
> Now I have a problem with running out of memory sometimes....
>
> On Mon, May 10, 2010 at 8:17 PM, B. Todd Burruss <bburruss@real.com> wrote:
>>
>> have you put your commit log on a disk by itself?  not a logical partition
>> shared by oracle or cassandra "data".  this will make a difference, as you
>> don't want the cassandra commit logs competing with other OS and oracle
>> I/O.  look in storage-conf.xml and see if you can move this.
>>
>> also check "MemtableThroughputInMB".  if you are doing a _lot_ of writes
>> you probably want to jack this up a bunch to get through the migration, then
>> put it back down for normal operation.  the default out of the box is too
>> low i believe.
>>
>> On 05/10/2010 02:05 AM, David Boxenhorn wrote:
>>
>> I read something like 80,000 rows from Oracle and write them to Cassandra
>> in chunks of 1000 rows - so I'm supposedly working to Cassandra's strength
>> and Oracle's weakness.
>>
>> Reading 1000 rows from Oracle is "instantaneous", writing them takes maybe
>> 30 seconds. Not too much data per row, maybe 1K.
>>
>>
>>
>> On Mon, May 10, 2010 at 11:48 AM, Ran Tavory <rantav@gmail.com> wrote:
>>>
>>> Hector uses tsocket. not sure what you mean by "buffered" - is that
>>> framed? Hector by default does not use framed.
>>> The code is here if you'd like to have a
>>> look http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java#L77
>>> However, I find it hard to believe that the actual connection is the
>>> slowing factor.
>>> Roughly speaking, cassandra is fast on writes and slow on reads. Exact
>>> numbers are per-scenario so it's hard to say, but if you only write and
>>> objects are small then from my experience you should expect a few k writes
>>> per second on a single host. How much do you see?
>>> There are many configuration factors and they all depend on expected
>>> usage and available h/w.
>>>
>>> On Mon, May 10, 2010 at 11:27 AM, vd <vineetdaniel@gmail.com> wrote:
>>>>
>>>> What is the complete code string you are using to connect with cassandra
>>>> from Java code
>>>>
>>>>
>>>>
>>>> On Mon, May 10, 2010 at 1:49 PM, David Boxenhorn <david@lookin2.com>
>>>> wrote:
>>>>>
>>>>> I don't know what "TSocket or the buffered one" means. Maybe I should
>>>>> know?
>>>>>
>>>>> I'm using Hector. Does that explain anything?
>>>>>
>>>>> On Mon, May 10, 2010 at 11:15 AM, vd <vineetdaniel@gmail.com> wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> what is it that you are using to connect with cassandra TSocket or the
>>>>>> buffered one ?
>>>>>>
>>>>>> On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn <david@lookin2.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I'm running Java on the client, jdbc queries on Oracle, Hector on
>>>>>>> Cassandra.
>>>>>>>
>>>>>>> The Cassandra and Oracle database designs are radically different, as
>>>>>>> you might guess.
>>>>>>>
>>>>>>> I have no doubt that Cassandra can be tuned, in a multiple-server
>>>>>>> cluster, to have superior throughput (that's why I'm doing it!). But for
>>>>>>> now, it's really frustrating my development effort that Cassandra is so
>>>>>>> slow. Can't I get it up to twice as slow as Oracle in my configuration?
>>>>>>>
>>>>>>> On Mon, May 10, 2010 at 10:47 AM, vd <vineetdaniel@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi David
>>>>>>>>
>>>>>>>> If I may ask...how do you plan to import data from oracle to
>>>>>>>> cassandra ?
>>>>>>>> As answer AFAIK cassandra's true ability comes into play when
>>>>>>>> running on more than one machine...and please share how you are making
>>>>>>>> comparisons like on writes or reads from cassandra.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn <david@lookin2.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I'm running Oracle and Cassandra on my machine, trying to import my
>>>>>>>>> data to Cassandra from Oracle.
>>>>>>>>>
>>>>>>>>> In my configuration Oracle is about ten times faster than
>>>>>>>>> Cassandra. Cassandra has out-of-the-box tuning.
>>>>>>>>>
>>>>>>>>> I am new to Cassandra. How do I begin trying to tune it?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
