cassandra-user mailing list archives

From Simon Reavely <simon.reav...@gmail.com>
Subject Re: read operation is slow
Date Fri, 18 Jun 2010 16:44:46 GMT
Would it perhaps be worth denormalizing your data so that you can
retrieve all the rows you need as a single row, using a key that encodes
the query predicate?

Until we get a stored procedure feature (I don't know if one is planned),
it's hard to avoid round trips without denormalizing or replicating data
to fit your query paths.
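
To make the idea concrete, here is a minimal sketch of a predicate-encoded row key. It does not talk to Cassandra at all: the in-memory map, the "user:" and "by_city:" key prefixes, and the city attribute are all made up for illustration, and the counter only models one round trip per row fetched.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a column family: row key -> (column name -> value).
// It only counts lookups so the two read paths can be compared; it is not a Cassandra client.
class DenormalizedReadSketch {
    static final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    static int roundTrips = 0;

    // One simulated round trip per row fetched.
    static Map<String, String> getRow(String key) {
        roundTrips++;
        return rows.getOrDefault(key, Map.of());
    }

    // Write path: store the record under its own key AND under a key
    // that encodes the query predicate (here, an assumed "city" attribute).
    static void insertUser(String userName, String city, String location) {
        rows.computeIfAbsent("user:" + userName, k -> new LinkedHashMap<>())
            .put("first", location);
        rows.computeIfAbsent("by_city:" + city, k -> new LinkedHashMap<>())
            .put(userName, location);
    }

    public static void main(String[] args) {
        insertUser("alice", "boston", "loc-a");
        insertUser("bob", "boston", "loc-b");
        insertUser("carol", "austin", "loc-c");

        // Normalized read: one round trip per user.
        roundTrips = 0;
        for (String user : List.of("alice", "bob")) {
            getRow("user:" + user);
        }
        System.out.println("per-key reads: " + roundTrips + " round trips");

        // Denormalized read: a single row holds every match for the predicate.
        roundTrips = 0;
        Map<String, String> boston = getRow("by_city:boston");
        System.out.println("predicate-keyed read: " + roundTrips
                + " round trip(s), " + boston.size() + " users");
    }
}
```

The trade-off is the usual one: every write now touches two rows, in exchange for a read path that needs a single lookup.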


Simon Reavely


On Jun 11, 2010, at 9:49 PM, "caribbean410" <caribbean410@gmail.com> wrote:

> Thanks for the suggestion. For the test case, it is 1 key and 1
> column. I once changed 10 to 1, and as I remember there was not much
> difference.
>
>
>
> I have 200k keys and each key is randomly generated. I will try the  
> optimized query next week. But maybe you still have to face the case  
> that each time a client just wants to query one key from db.
>
>
>
> From: Dop Sun [mailto:sunht@dopsun.com]
> Sent: Friday, June 11, 2010 6:05 PM
> To: user@cassandra.apache.org
> Subject: RE: read operation is slow
>
>
>
> And also, you are only selecting 1 key and 10 columns?
>
>
>
> criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
>
>
>
> Then, if you have 200k keys, you have 200k Thrift calls.  If this is  
> the case, you may need to optimize the way you do the query (to  
> combine multiple keys into a single query), and to reduce the number  
> of calls.
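
A minimal sketch of what combining keys buys in terms of call count. It deliberately avoids any real client API: the `partition` helper and the batch size of 100 are assumptions for illustration, and each resulting batch stands for one multi-key query.

```java
import java.util.ArrayList;
import java.util.List;

class KeyBatchingSketch {
    // Split keys into consecutive batches of at most batchSize;
    // each batch would be sent as one multi-key query.
    static List<List<String>> partition(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            keys.add("user" + i);
        }
        int batchSize = 100; // assumed value; tune against response size and latency
        List<List<String>> batches = partition(keys, batchSize);
        System.out.println(keys.size() + " keys -> " + batches.size() + " calls");
    }
}
```

With these assumed numbers, 200k single-key calls become 2,000 calls, so the per-call overhead is paid 100 times less often.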
>
>
>
> From: Dop Sun [mailto:sunht@dopsun.com]
> Sent: Saturday, June 12, 2010 8:57 AM
> To: user@cassandra.apache.org
> Subject: RE: read operation is slow
>
>
>
> You mean that after “I removed some unnecessary column families and
> changed the size of the row cache and key cache; now the latency has
> dropped from 0.25ms to 0.09ms. In essence 0.09ms*200k=18s.”, it still
> takes 400 seconds to return?
>
>
>
> From: Caribbean410 [mailto:caribbean410@gmail.com]
> Sent: Saturday, June 12, 2010 8:48 AM
> To: user@cassandra.apache.org
> Subject: Re: read operation is slow
>
>
>
> Hi, do you mean this one should not introduce much extra delay? To
> read a record, I need the select here, so I am not sure where the
> extra delay comes from.
>
> On Fri, Jun 11, 2010 at 5:29 PM, Dop Sun <sunht@dopsun.com> wrote:
>
> Jassandra is used here:
>
>
>
> Map<String, List<IColumn>> map = criteria.select();
>
>
>
> The select here basically is a call to Thrift API: get_range_slices
>
>
>
>
>
> From: Caribbean410 [mailto:caribbean410@gmail.com]
> Sent: Saturday, June 12, 2010 8:00 AM
> To: user@cassandra.apache.org
> Subject: Re: read operation is slow
>
>
>
> I removed some unnecessary column families and changed the size of
> the row cache and key cache; now the latency has dropped from 0.25ms
> to 0.09ms. In essence 0.09ms*200k=18s. I don't know why it takes more
> than 400s in total. Here is the client code and cfstats. There are not
> many operations here, so why is the extra time so large?
>
>
>
>               long start = System.currentTimeMillis();
>               for (int j = 0; j < 1; j++) {
>                   for (int i = 0; i < numOfRecords; i++) {
>                       int n = random.nextInt(numOfRecords);
>                       ICriteria criteria = cf.createCriteria();
>                       userName = keySet[n];
>                       criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
>                       Map<String, List<IColumn>> map = criteria.select();
>                       List<IColumn> list = map.get(userName);
> //                      ByteArray bloc = list.get(0).getValue();
> //                      byte[] byteArrayloc = bloc.toByteArray();
> //                      loc = new String(byteArrayloc);
> //                      readBytes = readBytes + loc.length();
>                       readBytes = readBytes + blobSize;
>                   }
>               }
>
>             long finish=System.currentTimeMillis();
>
>             // Divide by a float so fractional seconds are not truncated.
>             float totalTime = (finish - start) / 1000f;
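
As a rough check on the figures in this message: 200k reads at 0.09 ms each account for about 18 s, while a 400 s run works out to about 2 ms per call, so roughly 1.9 ms of every call is spent outside the read latency that cfstats reports. A minimal sketch of that arithmetic follows. The class and method names are made up for illustration, the inputs are the numbers quoted in this thread, and 400 s is treated as a lower bound since the run took "more than 400s".

```java
class ReadOverheadSketch {
    // Average wall-clock cost of one call, in milliseconds.
    static double perCallMillis(double totalSeconds, long calls) {
        return totalSeconds * 1000.0 / calls;
    }

    // Total time attributable to the server-side read latency, in seconds.
    static double serverSeconds(double serverLatencyMillis, long calls) {
        return serverLatencyMillis * calls / 1000.0;
    }

    public static void main(String[] args) {
        long calls = 200_000;              // reads issued by the test loop
        double serverLatencyMillis = 0.09; // read latency reported by cfstats
        double totalSeconds = 400.0;       // observed wall-clock time (lower bound)

        double perCall = perCallMillis(totalSeconds, calls);
        double outsideServer = perCall - serverLatencyMillis;

        System.out.printf("server-side total: %.1f s%n",
                serverSeconds(serverLatencyMillis, calls));
        System.out.printf("per call: %.2f ms, of which %.2f ms is outside the read path%n",
                perCall, outsideServer);
    }
}
```

On these numbers the client, the Thrift round trip, and connection handling together cost about twenty times more than the read itself.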
>
>
> Keyspace: Keyspace1
>     Read Count: 600000
>     Read Latency: 0.09053006666666667 ms.
>     Write Count: 200000
>     Write Latency: 0.01504989 ms.
>     Pending Tasks: 0
>         Column Family: Standard2
>         SSTable count: 3
>         Space used (live): 265990358
>         Space used (total): 265990358
>         Memtable Columns Count: 2615
>         Memtable Data Size: 2667300
>         Memtable Switch Count: 3
>         Read Count: 600000
>         Read Latency: 0.091 ms.
>         Write Count: 200000
>         Write Latency: 0.015 ms.
>         Pending Tasks: 0
>         Key cache capacity: 10000000
>         Key cache size: 187465
>         Key cache hit rate: 0.0
>         Row cache capacity: 10000000
>         Row cache size: 189990
>         Row cache hit rate: 0.68335
>         Compacted row minimum size: 0
>         Compacted row maximum size: 0
>         Compacted row mean size: 0
>
> ----------------
> Keyspace: system
>     Read Count: 1
>     Read Latency: 10.954 ms.
>     Write Count: 4
>     Write Latency: 0.28075 ms.
>     Pending Tasks: 0
>         Column Family: HintsColumnFamily
>         SSTable count: 0
>         Space used (live): 0
>         Space used (total): 0
>         Memtable Columns Count: 0
>         Memtable Data Size: 0
>         Memtable Switch Count: 0
>         Read Count: 0
>         Read Latency: NaN ms.
>         Write Count: 0
>         Write Latency: NaN ms.
>         Pending Tasks: 0
>         Key cache capacity: 1
>         Key cache size: 0
>         Key cache hit rate: NaN
>         Row cache: disabled
>         Compacted row minimum size: 0
>         Compacted row maximum size: 0
>         Compacted row mean size: 0
>
>         Column Family: LocationInfo
>         SSTable count: 2
>         Space used (live): 3232
>         Space used (total): 3232
>         Memtable Columns Count: 2
>         Memtable Data Size: 46
>         Memtable Switch Count: 1
>         Read Count: 1
>         Read Latency: 10.954 ms.
>         Write Count: 4
>         Write Latency: 0.281 ms.
>         Pending Tasks: 0
>         Key cache capacity: 1
>         Key cache size: 1
>         Key cache hit rate: 0.0
>         Row cache: disabled
>         Compacted row minimum size: 0
>         Compacted row maximum size: 0
>         Compacted row mean size: 0
>
> ----------------
>
> On Fri, Jun 11, 2010 at 1:50 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
> you need to look at cfstats to see what the latency is internal to
> cassandra, vs what your client is introducing
>
> then you should probably read the comments in the configuration file
> about caching
>
>
> On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 <caribbean410@gmail.com> wrote:
> >
> > Thanks Riyad.
> >
> > Right now I am just testing Cassandra on a single node. The server and
> > client are running on the same machine. I tried the read test again on
> > two machines; on one machine the CPU usage is around 30% most of the
> > time, and on the other it is 90%.
> >
> > Pelops is one way to access Cassandra; there are also other Java clients
> > like Hector and Jassandra. Will these Java clients have significantly
> > different performance?
> >
> > I also tried changing the storage configuration file: moving
> > CommitLogDirectory and DataFileDirectory to different disks, changing
> > DiskAccessMode to mmap for a 64-bit machine, and changing ConcurrentReads
> > from 8 to 2. None of these changed performance much.
> >
> > For other users who use a different client, such as PHP, C++, Python,
> > etc., if you have any experience in boosting read performance, you are
> > more than welcome to share it with me. Thanks,
> >
> > On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla <rkalla@gmail.com> wrote:
> >>
> >> Caribbean410,
> >>
> >> This comes up on the Redis list a lot as well -- what you are actually
> >> measuring is the client making a network connection to the Cassandra
> >> server and it replying -- so the performance numbers you are getting can
> >> easily be 70% network wait time and not necessarily hardcore read/write
> >> server performance.
> >>
> >> One way to see if this is the case: run your read test, then watch the
> >> CPU on the server for the Cassandra process and see if it's pegging the
> >> CPU -- if it's just sitting there banging between 0-10%, then you are
> >> spending most of your time waiting on network i/o (open/close sockets,
> >> etc.)
> >>
> >> If you can parallelize your test to spawn, say, 5 threads that all do the
> >> same thing, see if the total throughput increases linearly with the
> >> number of threads -- which would indicate Cassandra is plenty fast in
> >> your setup, and you just need to utilize more client threads over the
> >> network.
> >>
> >> That new Java library, Pelops by Dominic
> >> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
> >> has a nice intrinsic node-balancing design that could be handy IF you are
> >> using multiple nodes. If you are just testing against 1 node, then spawn
> >> multiple threads of your code above and see how each thread's performance
> >> scales.
> >> -R
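
A minimal sketch of the multi-threaded test suggested above. The read itself is simulated with a short sleep, so `simulatedRead`, the 2 ms delay, and the thread and read counts are all assumptions for illustration. In a real test each worker would call the client instead, most likely over its own connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelReadSketch {
    // Hypothetical stand-in for one blocking read; replace with a real client call.
    static void simulatedRead() {
        try {
            Thread.sleep(2); // pretend each round trip costs about 2 ms
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during read", e);
        }
    }

    // Runs readsPerThread reads on each worker and returns the total completed.
    static int runReads(int threads, int readsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Integer> worker = () -> {
                    int done = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        simulatedRead();
                        done++;
                    }
                    return done;
                };
                results.add(pool.submit(worker));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that worker has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("read test failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 5}) {
            long start = System.nanoTime();
            int reads = runReads(threads, 50);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d thread(s): %d reads in %.3f s (%.0f reads/s)%n",
                    threads, reads, seconds, reads / seconds);
        }
    }
}
```

If reads per second grow roughly in step with the thread count, the time is going to waiting on round trips rather than to the server itself.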
> >> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 <caribbean410@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am testing the performance of Cassandra. We write 200k records to
> >>> the database, and each record is 1k in size. Then we read these 200k
> >>> records. It takes more than 400s to finish the read, which is much
> >>> slower than MySQL (around 20s). I read some discussion online, and
> >>> someone suggested making multiple connections to make it faster. But I
> >>> am not sure how to do it: do I need to change my storage settings file
> >>> or just change the Java client code?
> >>>
> >>> Here is my read code,
> >>>
> >>>     Properties info = new Properties();
> >>>     info.put(DriverManager.CONSISTENCY_LEVEL,
> >>>             ConsistencyLevel.ONE.toString());
> >>>
> >>>     IConnection connection = DriverManager.getConnection(
> >>>             "thrift://localhost:9160", info);
> >>>
> >>>     // 2. Get a KeySpace by name
> >>>     IKeySpace keySpace = connection.getKeySpace("Keyspace1");
> >>>
> >>>     // 3. Get a ColumnFamily by name
> >>>     IColumnFamily cf = keySpace.getColumnFamily("Standard2");
> >>>
> >>>     ByteArray nameFirst = ByteArray.ofASCII("first");
> >>>     ICriteria criteria = cf.createCriteria();
> >>>     long readBytes = 0;
> >>>     long start = System.currentTimeMillis();
> >>>     for (int i = 0; i < numOfRecords; i++) {
> >>>         int n = random.nextInt(numOfRecords);
> >>>         userName = keySet[n];
> >>>         criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
> >>>         Map<String, List<IColumn>> map = criteria.select();
> >>>         List<IColumn> list = map.get(userName);
> >>>         ByteArray bloc = list.get(0).getValue();
> >>>         byte[] byteArrayloc = bloc.toByteArray();
> >>>         loc = new String(byteArrayloc);
> >>>         // System.out.println(userName + " " + loc);
> >>>         readBytes = readBytes + loc.length();
> >>>     }
> >>>
> >>>     long finish = System.currentTimeMillis();
> >>>
> >>> I once commented out these lines:
> >>>
> >>>         ByteArray bloc = list.get(0).getValue();
> >>>         byte[] byteArrayloc = bloc.toByteArray();
> >>>         loc = new String(byteArrayloc);
> >>>         // System.out.println(userName + " " + loc);
> >>>         readBytes = readBytes + loc.length();
> >>>
> >>> And the performance didn't improve much.
> >>>
> >>> Any suggestion is welcome. Thanks,
> >
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
>
>
>
