incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Time to insert bulk data is very high comparing to database
Date Sun, 08 Nov 2009 13:56:28 GMT
- You’ll easily double performance by setting the log level from DEBUG
to INFO (it's unclear whether you actually did this, so I'm mentioning it
for completeness).
- 0.4.1 has bad default GC options. The defaults will be fixed in
0.4.2 and 0.5, but they're easy to tweak in 0.4.1:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox
- It doesn't look like you're doing parallel inserts. You should have
at least a few dozen to a few hundred threads if you want to measure
throughput rather than just latency. Run the client on a machine that
is not running Cassandra, since the client itself can also use a
decent amount of CPU.
- Using batch_insert will be much faster than multiple single-column
inserts to the same row.
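A minimal Java sketch of the parallel-insert advice, assuming a hypothetical RowWriter interface that stands in for one Thrift connection (in real use each writer would wrap its own TSocket, TBinaryProtocol and Client, like the connection code quoted below). Each worker thread gets its own writer, since a single Thrift client and socket should not be shared between threads, and run() reports rows per second so that throughput, not per-call latency, is what gets measured. The thread counts and the fake writer are illustrative only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class ParallelInsertSketch {

    /** Hypothetical stand-in for one Thrift connection that writes one row per call. */
    interface RowWriter {
        void writeRow(String key);
    }

    /**
     * Writes rowCount rows from a fixed pool of worker threads and returns rows per second.
     * Each worker thread lazily creates its own RowWriter via the factory, so no
     * connection is ever shared between threads.
     */
    static double run(int rowCount, int threads, Supplier<RowWriter> factory) {
        ThreadLocal<RowWriter> perThread = ThreadLocal.withInitial(factory);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < rowCount; i++) {
            final String key = "row-" + i;
            pool.execute(() -> perThread.get().writeRow(key));
        }
        pool.shutdown(); // accept no new tasks; already-queued ones still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("inserts did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for inserts", e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return rowCount / seconds;
    }

    public static void main(String[] args) {
        AtomicLong written = new AtomicLong();
        // Fake writer: sleeps ~1 ms per row to imitate a network round trip.
        Supplier<RowWriter> fake = () -> key -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            written.incrementAndGet();
        };
        double single = run(500, 1, fake);
        double parallel = run(500, 50, fake);
        System.out.printf("1 thread: %.0f rows/s, 50 threads: %.0f rows/s, %d rows written%n",
                single, parallel, written.get());
    }
}
```

With a writer that mostly waits on the network, throughput should grow with the thread count until the Cassandra nodes or the client machine saturate, which is one more reason to keep the client off the Cassandra boxes.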
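To make the batch_insert point concrete for the insertChannelShow code quoted below, here is a minimal Java sketch that only counts round trips. It deliberately does not call the real Thrift client: the exact batch_insert signature depends on the Thrift interface of the Cassandra release you run, so the Transport interface and the helper names here are hypothetical. Under that assumption, the six single-column inserts for one row collapse into a single call.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchInsertSketch {

    /** Hypothetical transport: each send() call costs one network round trip. */
    interface Transport {
        void send(String key, Map<String, byte[]> columns);
    }

    /** Collects all six channelShow columns for one row, UTF-8 encoded as in the quoted code. */
    static Map<String, byte[]> channelShowColumns(String duration, String title, String parentShowId,
                                                  String genre, String price, String subtitle) {
        Map<String, byte[]> cols = new LinkedHashMap<>();
        cols.put("duration", duration.getBytes(StandardCharsets.UTF_8));
        cols.put("title", title.getBytes(StandardCharsets.UTF_8));
        cols.put("parentShowId", parentShowId.getBytes(StandardCharsets.UTF_8));
        cols.put("genre", genre.getBytes(StandardCharsets.UTF_8));
        cols.put("price", price.getBytes(StandardCharsets.UTF_8));
        cols.put("subtitle", subtitle.getBytes(StandardCharsets.UTF_8));
        return cols;
    }

    /** One round trip per column, which is what six separate insert() calls amount to. */
    static void insertColumnByColumn(Transport t, String key, Map<String, byte[]> cols) {
        for (Map.Entry<String, byte[]> e : cols.entrySet()) {
            Map<String, byte[]> single = new LinkedHashMap<>();
            single.put(e.getKey(), e.getValue());
            t.send(key, single);
        }
    }

    /** One round trip for the whole row, which is the shape of a batch_insert call. */
    static void insertBatched(Transport t, String key, Map<String, byte[]> cols) {
        t.send(key, cols);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Transport counting = (k, c) -> calls[0]++; // counts round trips only
        Map<String, byte[]> cols = channelShowColumns("60", "News", "0", "news", "0", "");

        insertColumnByColumn(counting, "row-1", cols);
        int perColumn = calls[0];

        calls[0] = 0;
        insertBatched(counting, "row-1", cols);
        // prints "column-by-column: 6 calls, batched: 1 call"
        System.out.println("column-by-column: " + perColumn + " calls, batched: " + calls[0] + " call");
    }
}
```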

On Sun, Nov 8, 2009 at 7:41 AM, Richard grossman <richiesgr@gmail.com> wrote:
> Hi
>
> Actually, we run on an Amazon EC2 large instance (7.5 GB of memory), and we
> don't use EBS, only the local disk mounted at /mnt.
> Here is my code to insert the data:
>
>     public void insertChannelShow(String showId, String channelId, String airDate,
>             String duration, String title, String parentShowId, String genre,
>             String price, String subtitle) throws Exception {
>         Calendar calendar = Calendar.getInstance();
>         dateFormat.setCalendar(calendar);
>         Date air = dateFormat.parse(airDate);
>         calendar.setTime(air);
>
>         String key = String.valueOf(calendar.getTimeInMillis()) + ":" + showId + ":" + channelId;
>
>         long timestamp = System.currentTimeMillis();
>         cassandraClient.insert("Keyspace1",
>                 key,
>                 new ColumnPath("channelShow", null, ("duration").getBytes("UTF-8")),
>                 duration.getBytes("UTF-8"),
>                 timestamp,
>                 ConsistencyLevel.ONE);
>         cassandraClient.insert("Keyspace1",
>                 key,
>                 new ColumnPath("channelShow", null, ("title").getBytes("UTF-8")),
>                 title.getBytes("UTF-8"),
>                 timestamp,
>                 ConsistencyLevel.ONE);
>         cassandraClient.insert("Keyspace1",
>                 key,
>                 new ColumnPath("channelShow", null, ("parentShowId").getBytes("UTF-8")),
>                 parentShowId.getBytes("UTF-8"),
>                 timestamp,
>                 ConsistencyLevel.ONE);
>         cassandraClient.insert("Keyspace1",
>                 key,
>                 new ColumnPath("channelShow", null, ("genre").getBytes("UTF-8")),
>                 genre.getBytes("UTF-8"),
>                 timestamp,
>                 ConsistencyLevel.ONE);
>         cassandraClient.insert("Keyspace1",
>                 key,
>                 new ColumnPath("channelShow", null, ("price").getBytes("UTF-8")),
>                 price.getBytes("UTF-8"),
>                 timestamp,
>                 ConsistencyLevel.ONE);
>         cassandraClient.insert("Keyspace1",
>                 key,
>                 new ColumnPath("channelShow", null, ("subtitle").getBytes("UTF-8")),
>                 subtitle.getBytes("UTF-8"),
>                 timestamp,
>                 ConsistencyLevel.ONE);
>     }
>
> Of course, I've initialized my connection like this:
>         tr = new TSocket(server, 9160);
>         TProtocol proto = new TBinaryProtocol(tr);
>         cassandraClient = new Client(proto);
>         tr.open();
>
> I actually have 2 machines on Amazon EC2. The first is a large instance, on
> which I run both the data-insert process and Cassandra. The second machine
> runs only Cassandra, on a small instance with just 2 GB of memory.
>
> Thanks
>
>
> On Sat, Nov 7, 2009 at 12:05 AM, Michael Greene <michael.greene@gmail.com>
> wrote:
>>
>> On Fri, Nov 6, 2009 at 10:54 AM, Richard grossman <richiesgr@gmail.com>
>> wrote:
>> > I know the test is not very accurate, because Cassandra and the Oracle DB
>> > don't run on the same hardware, but there really is a big difference.
>> Do they run on comparable hardware?  Hardware specs + configuration
>> have a clear impact on Cassandra performance -- what's your
>> environment like?  This is slow even for a recent laptop though, so
>> there's probably something else wrong.
>>
>> Michael
>
>
