incubator-cassandra-user mailing list archives

From Caribbean410 <caribbean...@gmail.com>
Subject Re: read operation is slow
Date Fri, 11 Jun 2010 22:29:18 GMT
This is the cfstats output. Right now I use three threads to read 200k records. I
only use Keyspace1 and the column family Standard2. Do I need to comment out the
other, unused column families in the storage configuration file? The
latency is 0.2576 ms per record; is this a typical number (we are reading
from an SSD, which should be much faster than a normal hard drive)?

Keyspace: Keyspace1
    Read Count: 600000
    Read Latency: 0.25760798333333335 ms.
    Write Count: 200000
    Write Latency: 0.015756365 ms.
    Pending Tasks: 0
        Column Family: StandardByUUID1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Super1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Standard2
        SSTable count: 4
        Space used (live): 279466127
        Space used (total): 279466127
        Memtable Columns Count: 2615
        Memtable Data Size: 2667300
        Memtable Switch Count: 3
        Read Count: 600000
        Read Latency: NaN ms.
        Write Count: 200000
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 1
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Standard1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Super2
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache capacity: 200000
        Row cache size: 0
        Row cache hit rate: NaN
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

----------------
Keyspace: system
    Read Count: 1
    Read Latency: 13.205 ms.
    Write Count: 2
    Write Latency: 0.062 ms.
    Pending Tasks: 0
        Column Family: HintsColumnFamily
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: LocationInfo
        SSTable count: 3
        Space used (live): 3853
        Space used (total): 3853
        Memtable Columns Count: 2
        Memtable Data Size: 46
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: NaN ms.
        Write Count: 2
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 3
        Key cache size: 3
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

----------------
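
To put the Keyspace1 read latency above next to the client-side timing from the start of the thread (more than 400s for 200k reads), here is a small back-of-the-envelope sketch. The class and method names are made up for illustration. It also assumes the 400s figure comes from the earlier single-threaded run, so the comparison is only rough.

```java
// Rough comparison of Cassandra-internal read latency (from cfstats)
// with the latency the client observed. Names here are illustrative only.
class LatencyBreakdown {

    // Average client-observed time per read, in milliseconds.
    static double perReadMillis(double totalSeconds, long reads) {
        return totalSeconds * 1000.0 / reads;
    }

    // Fraction of the client-observed time that was spent inside Cassandra.
    static double serverShare(double serverMillis, double clientMillis) {
        return serverMillis / clientMillis;
    }

    public static void main(String[] args) {
        // Keyspace1 "Read Latency" as reported by cfstats.
        double serverMs = 0.25760798333333335;
        // "More than 400s" for 200k reads, taken from the original post.
        double clientMs = perReadMillis(400.0, 200_000);

        System.out.printf("client-observed:  %.3f ms/read%n", clientMs);
        System.out.printf("inside Cassandra: %.3f ms/read (%.0f%% of the total)%n",
                serverMs, 100.0 * serverShare(serverMs, clientMs));
    }
}
```

With these inputs the client sees about 2 ms per read, of which roughly 13% is spent inside Cassandra. If the assumption holds, most of the time goes to the client and the round trip, not to the server.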


On Fri, Jun 11, 2010 at 10:50 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> you need to look at cfstats to see what the latency is internal to
> cassandra, vs what your client is introducing
>
> then you should probably read the comments in the configuration file
> about caching
>
> On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 <caribbean410@gmail.com>
> wrote:
> >
> > Thanks Riyad.
> >
> > Right now I am just testing Cassandra on a single node. The server and
> > client are running on the same machine. I tried the read test again on
> > two machines; on one machine the CPU usage is around 30% most of the
> > time, and on the other it is 90%.
> >
> > Pelops is one way to access Cassandra; there are also other Java clients
> > like Hector and Jassandra. Will these Java clients have significantly
> > different performance?
> >
> > I also tried changing the storage configuration file: moving
> > CommitLogDirectory and DataFileDirectory to different disks, changing
> > DiskAccessMode to mmap for a 64-bit machine, and changing ConcurrentReads
> > from 8 to 2. None of these changed performance much.
> >
> > For other users who use a different client, such as PHP, C++, or
> > Python: if you have any experience in boosting read performance,
> > you are more than welcome to share it with me. Thanks,
> >
> > On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla <rkalla@gmail.com> wrote:
> >>
> >> Caribbean410,
> >>
> >> This comes up on the Redis list a lot as well. What you are actually
> >> measuring is the client making a network connection to the Cassandra
> >> server and it replying, so the performance numbers you are getting can
> >> easily be 70% network wait time and not necessarily hardcore read/write
> >> server performance.
> >>
> >> One way to see if this is the case: run your read test, then watch the
> >> CPU on the server for the Cassandra process and see if it's pegging the
> >> CPU. If it's just sitting there banging between 0-10%, then you are
> >> spending most of your time waiting on network I/O (opening/closing
> >> sockets, etc.).
> >>
> >> If you can parallelize your test to spawn, say, 5 threads that all do
> >> the same thing, see if the performance for each thread increases
> >> linearly, which would indicate Cassandra is plenty fast in your setup
> >> and you just need to utilize more client threads over the network.
> >>
> >> That new Java library, Pelops by Dominic
> >> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
> >> has a nice intrinsic node-balancing design that could be handy IF you
> >> are using multiple nodes. If you are just testing against 1 node, then
> >> spawn multiple threads of your code above and see how each thread's
> >> performance scales.
> >> -R
> >> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 <caribbean410@gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am testing the performance of Cassandra. We write 200k records to the
> >>> database, and each record is 1 KB in size. Then we read these 200k records.
> >>> It takes more than 400s to finish the read, which is much slower than
> >>> MySQL (around 20s). I read some discussions online, and someone suggested
> >>> making multiple connections to make it faster. But I am not sure how
> >>> to do it: do I need to change my storage settings file or just change
> >>> the Java client code?
> >>>
> >>> Here is my read code,
> >>>
> >>> Properties info = new Properties();
> >>> info.put(DriverManager.CONSISTENCY_LEVEL,
> >>>         ConsistencyLevel.ONE.toString());
> >>>
> >>> IConnection connection = DriverManager.getConnection(
> >>>         "thrift://localhost:9160", info);
> >>>
> >>> // 2. Get a KeySpace by name
> >>> IKeySpace keySpace = connection.getKeySpace("Keyspace1");
> >>>
> >>> // 3. Get a ColumnFamily by name
> >>> IColumnFamily cf = keySpace.getColumnFamily("Standard2");
> >>>
> >>> ByteArray nameFirst = ByteArray.ofASCII("first");
> >>> ICriteria criteria = cf.createCriteria();
> >>> long readBytes = 0;
> >>> long start = System.currentTimeMillis();
> >>> for (int i = 0; i < numOfRecords; i++) {
> >>>     int n = random.nextInt(numOfRecords);
> >>>     userName = keySet[n];
> >>>     criteria.keyList(Lists.newArrayList(userName))
> >>>             .columnRange(nameFirst, nameFirst, 10);
> >>>     Map<String, List<IColumn>> map = criteria.select();
> >>>     List<IColumn> list = map.get(userName);
> >>>     ByteArray bloc = list.get(0).getValue();
> >>>     byte[] byteArrayloc = bloc.toByteArray();
> >>>     loc = new String(byteArrayloc);
> >>>     // System.out.println(userName + " " + loc);
> >>>     readBytes = readBytes + loc.length();
> >>> }
> >>>
> >>> long finish = System.currentTimeMillis();
> >>>
> >>> I once commented out these lines
> >>>
> >>> ByteArray bloc = list.get(0).getValue();
> >>> byte[] byteArrayloc = bloc.toByteArray();
> >>> loc = new String(byteArrayloc);
> >>> // System.out.println(userName + " " + loc);
> >>> readBytes = readBytes + loc.length();
> >>>
> >>> And the performance doesn't improve much.
> >>>
> >>> Any suggestion is welcome. Thanks,
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
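
For reference, the suggestion quoted above (spawn several client threads that all run the same read loop) could look roughly like the sketch below. It is not tied to any particular client. The reader is a plain `Function<String, byte[]>` standing in for whatever call fetches one record, and `ParallelReadTest`, `run`, and the stub reader in `main` are hypothetical names. It assumes each worker gets its own reader (and so its own connection) from a factory, since Thrift-based clients are generally not safe to share across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

class ParallelReadTest {

    // Runs `threads` workers, each issuing `readsPerThread` random-key reads,
    // and returns the total number of bytes read across all workers.
    static long run(int threads, int readsPerThread, String[] keys,
                    Supplier<Function<String, byte[]>> readerFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long seed = t;
                Callable<Long> worker = () -> {
                    // One reader (i.e. one connection) per worker thread.
                    Function<String, byte[]> reader = readerFactory.get();
                    Random random = new Random(seed);
                    long bytes = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        String key = keys[random.nextInt(keys.length)];
                        bytes += reader.apply(key).length;
                    }
                    return bytes;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that worker finishes
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException("read test failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] keys = {"user0", "user1", "user2"};
        // Stub reader: returns a 1 KB value for any key without touching the
        // network. Replace it with a real per-thread client call.
        Supplier<Function<String, byte[]>> stub = () -> key -> new byte[1024];

        long start = System.currentTimeMillis();
        long bytes = run(3, 1000, keys, stub);
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println("read " + bytes + " bytes in " + elapsedMs + " ms");
    }
}
```

Running it with 1, 2, 4, ... threads against the same node and comparing the elapsed time gives a rough picture of whether throughput scales with the number of client threads.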
