From: Stephen Henderson <stephen.henderson@cognitivematch.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thu, 18 Aug 2011 04:01:05 -0500
Subject: A few questions on row caching and read consistency ONE
Message-ID: 
 <AC6CDD3FAD22434F92D66C5E6979E28005F7520A51@34093-MBX-C15.mex07a.mlsrvr.com>

Hi,

We're currently in the planning stage of a new project which needs a low-latency, persistent key/value store with a roughly 60:40 read/write split. We're trying to establish whether Cassandra is a good fit for this and, in particular, what the hardware requirements would be to keep the majority of rows cached in memory. (Other NoSQL platforms like Couchbase/Membase seem like a more natural fit, but we're already reasonably familiar with Cassandra and would rather stick with what we know if it can work.)

If anyone could help answer/clarify the following questions it would be a great help (all assume that row-caching is enabled for the column family).

Q. If we're using read consistency ONE, does the read request get sent to all nodes in the replica set, with the first reply being returned (i.e. all replica nodes will then have that row in their cache), OR does the request only get sent to a single node in the replica set? If it's the latter, would the same node generally be used for all requests to the same key, or would it always be a random node in the replica set? (i.e. if we have multiple reads for one key in quick succession, would this potentially entail multiple disk lookups until all nodes in the set have been hit?)

Q. Related to the above, if only one node receives the request, would the client (Hector in this case) know which node to send the request to directly, or would there potentially be one extra network hop involved (client -> random node -> node with key)?

Q. Is it possible to do a warm cache load of the most recently accessed keys on node startup, or would we have to do this with a client app?

Q. With write consistency ANY, is it correct that following a write request all nodes in the replica set will end up with that row in their cache, as well as on disk, once they receive the write? i.e. the total effective cache size is (cache_memory_per_node * num_nodes) / num_replicas.

Q. If the cluster only has a single column family, random partitioning and no secondary indexes, is there a good metric for estimating how much heap space we would need to set aside for everything that isn't the row cache? Would it be proportional to the row-cache size or fairly constant?


Thanks,
Stephen


Stephen Henderson - Lead Developer (Onsite), Cognitive Match
stephen.henderson@cognitivematch.com | http://www.cognitivematch.com