Jassandra is used here:

Map<String, List<IColumn>> map = criteria.select();

The select here is basically a call to the Thrift API: get_range_slices

From: Caribbean410 [mailto:caribbean410@gmail.com]
Sent: Saturday, June 12, 2010 8:00 AM
To: user@cassandra.apache.org
Subject: Re: read operation is slow

I removed some unnecessary column families and changed the sizes of the row cache and key cache; the latency has now dropped from 0.25 ms to 0.09 ms. In essence, 0.09 ms * 200k = 18 s. I don't know why it takes more than 400 s in total. Here is the client code and cfstats. There are not many operations here, so why is the extra time so large?

            long start = System.currentTimeMillis();
            for (int j = 0; j < 1; j++) {
                for (int i = 0; i < numOfRecords; i++) {
                    int n = random.nextInt(numOfRecords);
                    ICriteria criteria = cf.createCriteria();
                    userName = keySet[n];
                    criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
                    Map<String, List<IColumn>> map = criteria.select();
                    List<IColumn> list = map.get(userName);
//                  ByteArray bloc = list.get(0).getValue();
//                  byte[] byteArrayloc = bloc.toByteArray();
//                  loc = new String(byteArrayloc);
//                  readBytes = readBytes + loc.length();
                    readBytes = readBytes + blobSize;
                }
            }

            long finish = System.currentTimeMillis();

            // Divide by 1000f: (finish - start) / 1000 is integer division and drops the fraction.
            float totalTime = (finish - start) / 1000f;
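
To put a number on the gap, here is a minimal sketch of the arithmetic, assuming the 0.09 ms read latency from cfstats and treating the "more than 400 s" wall-clock time as a lower bound. The class and method names are made up for illustration and are not part of Jassandra or Cassandra.

```java
public class LatencyBreakdown {
    /** Seconds the server-side latency accounts for: reads * per-read latency (ms) / 1000. */
    static double serverSeconds(long reads, double serverLatencyMs) {
        return reads * serverLatencyMs / 1000.0;
    }

    /** Average milliseconds per read that the server-side latency does not explain. */
    static double unexplainedMsPerRead(double wallSeconds, long reads, double serverLatencyMs) {
        return (wallSeconds - serverSeconds(reads, serverLatencyMs)) * 1000.0 / reads;
    }

    public static void main(String[] args) {
        long reads = 200_000;          // records read by the test
        double serverLatencyMs = 0.09; // read latency reported by cfstats
        double wallSeconds = 400.0;    // assumption: lower bound of "more than 400 s"

        System.out.printf("server-side total: %.1f s%n",
                serverSeconds(reads, serverLatencyMs));
        System.out.printf("unexplained per read: %.2f ms%n",
                unexplainedMsPerRead(wallSeconds, reads, serverLatencyMs));
    }
}
```

With these inputs the server-side share is about 18 s, which leaves roughly 1.9 ms per read that is spent somewhere other than the read path cfstats measures.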

Keyspace: Keyspace1
    Read Count: 600000
    Read Latency: 0.09053006666666667 ms.
    Write Count: 200000
    Write Latency: 0.01504989 ms.
    Pending Tasks: 0
        Column Family: Standard2
        SSTable count: 3
        Space used (live): 265990358
        Space used (total): 265990358
        Memtable Columns Count: 2615
        Memtable Data Size: 2667300
        Memtable Switch Count: 3
        Read Count: 600000
        Read Latency: 0.091 ms.
        Write Count: 200000
        Write Latency: 0.015 ms.
        Pending Tasks: 0
        Key cache capacity: 10000000
        Key cache size: 187465
        Key cache hit rate: 0.0
        Row cache capacity: 10000000
        Row cache size: 189990
        Row cache hit rate: 0.68335
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

----------------
Keyspace: system
    Read Count: 1
    Read Latency: 10.954 ms.
    Write Count: 4
    Write Latency: 0.28075 ms.
    Pending Tasks: 0
        Column Family: HintsColumnFamily
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: LocationInfo
        SSTable count: 2
        Space used (live): 3232
        Space used (total): 3232
        Memtable Columns Count: 2
        Memtable Data Size: 46
        Memtable Switch Count: 1
        Read Count: 1
        Read Latency: 10.954 ms.
        Write Count: 4
        Write Latency: 0.281 ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 1
        Key cache hit rate: 0.0
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

----------------

On Fri, Jun 11, 2010 at 1:50 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

you need to look at cfstats to see what the latency is internal to
cassandra, vs what your client is introducing

then you should probably read the comments in the configuration file
about caching

On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 <caribbean410@gmail.com> wrote:
>
> Thanks Riyad.
>
> Right now I am just testing Cassandra on a single node. The server and client
> are running on the same machine. I tried the read test again on two
> machines: on one machine the CPU usage is around 30% most of the time, and
> on the other it is 90%.
>
> Pelops is one way to access Cassandra; there are also other Java clients like
> Hector and Jassandra. Will these Java clients have significantly different
> performance?
>
> Also, I once tried changing the storage configuration file: moving
> CommitLogDirectory and DataFileDirectory to different disks, changing
> DiskAccessMode to mmap for a 64-bit machine, and changing ConcurrentReads from
> 8 to 2. None of these changed performance much.
>
> For other users who use a different access client (PHP, C++,
> Python, etc.), if you have any experience in boosting read performance,
> you are more than welcome to share it with me. Thanks,
>
> On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla <rkalla@gmail.com> wrote:
>>
>> Caribbean410,
>>
>> This comes up on the Redis list a lot as well -- what you are actually
>> measuring is the client making a network connection to the Cassandra server and
>> it replying -- so the performance numbers you are getting can easily be 70%
>> network wait time and not necessarily hardcore read/write server
>> performance.
>> One way to see if this is the case: run your read test, then watch the CPU
>> on the server for the Cassandra process and see if it's pegging the CPU --
>> if it's just sitting there banging between 0-10%, then you are spending most
>> of your time waiting on network I/O (opening/closing sockets, etc.).
>> If you can parallelize your test to spawn, say, 5 threads that all do the
>> same thing, see if the performance for each thread increases linearly --
>> which would indicate Cassandra is plenty fast in your setup, you just need
>> to utilize more client threads over the network.
>> That new Java library, Pelops by Dominic
>> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
>> has a nice intrinsic node-balancing design that could be handy IF you are
>> using multiple nodes. If you are just testing against 1 node, then spawn
>> multiple threads of your code above and see how each thread's performance
>> scales.
>> -R
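
The multi-threaded test suggested above could be sketched roughly as follows. This is only an illustration: `ParallelReadTest`, `run`, and the `readOne` callback are hypothetical names, and the placeholder read just sleeps for 1 ms instead of calling Cassandra. In a real test each worker would presumably open its own client connection, since Thrift connections are generally not meant to be shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

public class ParallelReadTest {
    /**
     * Runs readsPerThread calls of readOne on each of `threads` workers and
     * returns the elapsed wall-clock time in milliseconds.
     * readOne is a hypothetical stand-in for a single client read.
     */
    static long run(int threads, int readsPerThread, IntConsumer readOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < threads; t++) {
                final int offset = t * readsPerThread;
                results.add(pool.submit(() -> {
                    for (int i = 0; i < readsPerThread; i++) {
                        readOne.accept(offset + i);
                    }
                }));
            }
            for (Future<?> f : results) {
                f.get(); // wait for the worker; surfaces any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("read worker failed", e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder read: sleeps 1 ms to imitate a network round trip.
        IntConsumer fakeRead = i -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int threads : new int[] {1, 5}) {
            long ms = run(threads, 200, fakeRead);
            System.out.println(threads + " thread(s): " + (threads * 200)
                    + " reads in " + ms + " ms");
        }
    }
}
```

If the elapsed time stays roughly flat while the number of reads grows with the thread count, the single-threaded run was most likely limited by round-trip waiting rather than by the server.
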
>> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 <caribbean410@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I am testing the performance of Cassandra. We write 200k records to the
>>> database, and each record is 1k in size. Then we read these 200k records.
>>> It takes more than 400s to finish the read, which is much slower than
>>> MySQL (around 20s). I read some discussion online, and someone suggested
>>> making multiple connections to make it faster. But I am not sure how
>>> to do it: do I need to change my storage settings file, or just change
>>> the Java client code?
>>>
>>> Here is my read code,
>>>
>>> Properties info = new Properties();
>>> info.put(DriverManager.CONSISTENCY_LEVEL,
>>>         ConsistencyLevel.ONE.toString());
>>>
>>> IConnection connection = DriverManager.getConnection(
>>>         "thrift://localhost:9160", info);
>>>
>>> // 2. Get a KeySpace by name
>>> IKeySpace keySpace = connection.getKeySpace("Keyspace1");
>>>
>>> // 3. Get a ColumnFamily by name
>>> IColumnFamily cf = keySpace.getColumnFamily("Standard2");
>>>
>>> ByteArray nameFirst = ByteArray.ofASCII("first");
>>> ICriteria criteria = cf.createCriteria();
>>> long readBytes = 0;
>>> long start = System.currentTimeMillis();
>>> for (int i = 0; i < numOfRecords; i++) {
>>>     int n = random.nextInt(numOfRecords);
>>>     userName = keySet[n];
>>>
>>>     criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
>>>     Map<String, List<IColumn>> map = criteria.select();
>>>     List<IColumn> list = map.get(userName);
>>>     ByteArray bloc = list.get(0).getValue();
>>>     byte[] byteArrayloc = bloc.toByteArray();
>>>     loc = new String(byteArrayloc);
>>>     // System.out.println(userName + " " + loc);
>>>     readBytes = readBytes + loc.length();
>>> }
>>>
>>> long finish = System.currentTimeMillis();
>>>
>>> I once commented out these lines:
>>>
>>> ByteArray bloc = list.get(0).getValue();
>>> byte[] byteArrayloc = bloc.toByteArray();
>>> loc = new String(byteArrayloc);
>>> // System.out.println(userName + " " + loc);
>>> readBytes = readBytes + loc.length();
>>>
>>> And the performance doesn't improve much.
>>>
>>> Any suggestion is welcome. Thanks,
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com