incubator-cassandra-user mailing list archives

From Apoorva Gaurav <apoorva.gau...@myntra.com>
Subject optimum fetch size in datastax driver
Date Wed, 02 Apr 2014 07:27:52 GMT
Hello All,

We have a schema that can be modelled as *(studentID int, subjectID int,
marks int, PRIMARY KEY(studentID, subjectID))*. There can be ~1M studentIDs,
and for each studentID there can be ~10K subjectIDs. Queries are either by
studentID alone or by studentID and subjectID. We have a 3-node Apache
Cassandra 2.0.4 cluster (24 cores per node) and use DataStax driver 2.0.0
to interact with it, relying on the driver's automatic paging feature. I've
tried fetch sizes ranging from 100 to 10K and observed that read latency
increases with fetch size (which seems expected). At around 10K there are a
lot of errors. I would like to understand:

   - Is there a rule of thumb for deciding on the optimum fetch size
   (*com.datastax.driver.core.Statement.setFetchSize()*)?
   - Does Cassandra keep the entire result in cache and return only the
   rows corresponding to the fetch size, or does it treat each subsequent
   fetch as a new query (*com.datastax.driver.core.ResultSet.fetchMoreResults()*)?
   - Does the optimum fetch size depend on the number of columns in the CQL
   table? For example, should the fetch size for a table like *"studentID int,
   subjectID int, marks1 int, marks2 int, marks3 int.... marksN int, PRIMARY
   KEY(studentID, subjectID)"* be smaller than the fetch size for *"studentID
   int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)"*?
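
To make the trade-off concrete, here is a rough sketch of how many pages a
full read of one studentID partition (~10K rows) would take at fetch sizes in
the range I tried. It assumes each page carries at most fetchSize rows and
ignores any extra empty final page; FetchSizeEstimate and pagesNeeded are
hypothetical names for illustration and are not part of the driver.

```java
// Hypothetical helper for illustration only; not part of the DataStax driver.
final class FetchSizeEstimate {

    // Number of pages needed to read `rows` rows when each page holds at most
    // `fetchSize` rows (ceiling division). Assumes no extra empty final page.
    static int pagesNeeded(int rows, int fetchSize) {
        if (rows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and fetchSize > 0");
        }
        return (int) (((long) rows + fetchSize - 1) / fetchSize);
    }

    public static void main(String[] args) {
        int rowsPerPartition = 10_000; // ~10K subjectIDs per studentID
        int[] fetchSizes = {100, 500, 1_000, 5_000, 10_000};
        for (int fetchSize : fetchSizes) {
            System.out.printf("fetchSize=%d -> pages=%d%n",
                    fetchSize, pagesNeeded(rowsPerPartition, fetchSize));
        }
    }
}
```

A smaller fetch size means more, smaller pages per partition; a larger one
means fewer, bigger pages, which matches the higher per-read latency I see.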


-- 
Thanks & Regards,
Apoorva
