Hi

On Sun, Nov 8, 2009 at 3:56 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
- You’ll easily double performance by setting the log level from DEBUG
to INFO (unclear if you actually did this, so mentioning it for
completeness)
No problem, I've checked and everything is set to INFO.
 
- 0.4.1 has bad default GC options. the defaults will be fixed for
0.4.2 and 0.5, but it’s easy to tweak for 0.4.1:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox
Sorry, I can't find the post that talks about this; I can't open this link on Mac OS.

 
- it doesn't look like you're doing parallel inserts.  you should have
at least a few dozen to a few hundred threads if you want to measure
throughput rather than just latency.  run the client on a machine that
is not running cassandra, since it can also use a decent amount of
CPU.
By parallel, do you mean writing code that runs the inserts in threads instead of one by one?
If so, is the Thrift API thread safe? How do you manage opening and closing the connection? For example, does each thread open one connection and close it at the end?
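
To check that I understand the parallel insert idea, here is a minimal sketch of what I have in mind. It assumes one connection per thread, since I don't know whether a single Thrift client can be shared between threads. RowInserter and InserterFactory are hypothetical placeholders I made up for this example, not Thrift or Cassandra classes; the real code would wrap the Thrift client behind them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ParallelInsertSketch {

    /** Hypothetical stand-in for one client connection (not a Thrift class). */
    public interface RowInserter extends AutoCloseable {
        void insert(String key) throws Exception;

        @Override
        void close() throws Exception;
    }

    /** Hypothetical factory: each worker calls open() once to get its own connection. */
    public interface InserterFactory {
        RowInserter open() throws Exception;
    }

    /**
     * Splits the keys across threadCount workers. Each worker opens its own
     * connection, inserts its share of the keys, and closes the connection
     * at the end. Returns the number of inserts that completed.
     */
    public static long insertAll(List<String> keys, int threadCount, InserterFactory factory)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        AtomicLong inserted = new AtomicLong();
        for (int t = 0; t < threadCount; t++) {
            final int worker = t;
            pool.execute(() -> {
                // try-with-resources closes the connection even if an insert fails
                try (RowInserter client = factory.open()) {
                    // worker w handles keys w, w + threadCount, w + 2 * threadCount, ...
                    for (int i = worker; i < keys.size(); i += threadCount) {
                        client.insert(keys.get(i));
                        inserted.incrementAndGet();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return inserted.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            keys.add("key" + i);
        }
        // A do-nothing inserter, only to show how the pieces fit together.
        InserterFactory noop = () -> new RowInserter() {
            @Override
            public void insert(String key) { }

            @Override
            public void close() { }
        };
        System.out.println("inserted: " + insertAll(keys, 50, noop));
    }
}
```

In the real client, open() would create the socket and the Thrift client, and insert() would do the batch_insert for one row. Is that the right way to do it?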
 
 - using batch_insert will be much faster than multiple single-column
inserts to the same row

I've modified the code like this:
    public void insertChannelShow(String showId, String channelId, String airDate, String duration, String title, String parentShowId, String genre, String price, String subtitle) throws Exception {
        Calendar calendar = Calendar.getInstance();
        dateFormat.setCalendar(calendar);
        Date air = dateFormat.parse(airDate);
        calendar.setTime(air);

        String key = String.valueOf(calendar.getTimeInMillis()) + ":" + showId + ":" + channelId;

        long timestamp = System.currentTimeMillis();
       
        Map<String, List<ColumnOrSuperColumn>> insertDataMap = new HashMap<String, List<ColumnOrSuperColumn>>();
        List<ColumnOrSuperColumn> rowData = new ArrayList<ColumnOrSuperColumn>();
       
        rowData.add(new ColumnOrSuperColumn(new Column(("duration").getBytes("UTF-8"), duration.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(new Column(("title").getBytes("UTF-8"), title.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(new Column(("parentShowId").getBytes("UTF-8"), parentShowId.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(new Column(("genre").getBytes("UTF-8"), genre.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(new Column(("price").getBytes("UTF-8"), price.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(new Column(("subtitle").getBytes("UTF-8"), subtitle.getBytes("UTF-8"), timestamp), null));
       
        insertDataMap.put("channelShow", rowData);
       
        cassandraClient.batch_insert("Keyspace1", key, insertDataMap, ConsistencyLevel.ONE);
       
        insertDataMap.clear();
        rowData.clear();
        insertDataMap = null;
        rowData = null;
    }


Is this what you had in mind?

Anyway, I've started a new small instance on Amazon to run the inserts, one that is not running Cassandra, and pointed it at the IP of one of the Cassandra servers. It didn't improve anything. The client machine is at 1% CPU and the server machines are at 1% CPU.

The problem appears when the data is distributed between the 2 Cassandra servers. As long as the data goes to the commit log of the first server, everything is fine at ~2000 rows/second. But when the data goes to the second server, throughput drops very sharply to ~200 rows/second.

I've read that I can check latency with JMX. That would be fine, but I can't manage to connect to the JMX agent on Amazon. The parameters are correct, but nothing helps: jconsole on my side refuses to connect. Is there something else I can check?
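
Until JMX works, one thing I could do is time each call on the client side. The sketch below is only an idea of how that might look: InsertTimer is a hypothetical helper I made up, not part of Cassandra or Thrift, and it measures the full round trip seen by the client (network included), not the server-side latency that JMX would report.

```java
import java.util.concurrent.Callable;

class InsertTimer {
    private long count;
    private long totalNanos;
    private long maxNanos;

    /** Runs one call, records how long it took (even on failure), and returns its result. */
    public <T> T time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    /** Adds one measurement; synchronized so several insert threads can share a timer. */
    public synchronized void record(long nanos) {
        count++;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    public synchronized long count() {
        return count;
    }

    /** Mean time per call in milliseconds, or 0 if nothing was recorded. */
    public synchronized double meanMillis() {
        return count == 0 ? 0.0 : totalNanos / 1e6 / count;
    }

    /** Slowest call in milliseconds. */
    public synchronized double maxMillis() {
        return maxNanos / 1e6;
    }
}
```

The batch_insert call would go inside the Callable passed to time(), returning null, and I could print meanMillis() and maxMillis() every few thousand rows to see whether the slowdown shows up as higher latency per call.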

Thanks