<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>user@cassandra.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/"/>
<id>http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/</id>
<updated>2013-06-20T09:29:06Z</updated>
<entry>
<title>Re: Compaction not running</title>
<author><name>aaron morton &lt;aaron@thelastpickle.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cE15D5C83-4C8D-40F0-B88C-4E74F2E91853@thelastpickle.com%3e"/>
<id>urn:uuid:%3cE15D5C83-4C8D-40F0-B88C-4E74F2E91853@thelastpickle-com%3e</id>
<updated>2013-06-20T09:27:00Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&gt; nodetool compactionstats, gives&#010;&gt; &#010;&gt;     pending tasks: 13120&#010;If there are no errors in the log, I would say this is a bug. &#010;&#010;Cheers&#010;&#010;-----------------&#010;Aaron Morton&#010;Freelance Cassandra Consultant&#010;New Zealand&#010;&#010;@aaronmorton&#010;http://www.thelastpickle.com&#010;&#010;On 19/06/2013, at 11:41 AM, Franc Carter &lt;franc.carter@sirca.org.au&gt; wrote:&#010;&#010;&gt; On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot &lt;btalbot@aeriagames.com&gt; wrote:&#010;&gt; Manual compaction for LCS doesn't really do much.  It certainly doesn't compact all those&#010;little files into bigger files.  What makes you think that compactions are not occurring?&#010;&#010;&gt; &#010;&gt; Yeah, that's what I thought, however:-&#010;&gt; &#010;&gt; nodetool compactionstats, gives&#010;&gt; &#010;&gt;     pending tasks: 13120&#010;&gt;    Active compaction remaining time :        n/a&#010;&gt; &#010;&gt; when I run nodetool compact in a loop the pending tasks goes down gradually.&#010;&gt; &#010;&gt; This node also has vastly higher latencies (x10) than the other nodes. 
I saw this with&#010;a previous CF than I 'manually compacted', and when the pending tasks reached low numbers&#010;(stuck on 9) then latencies were back to low milliseconds&#010;&gt; &#010;&gt; cheers&#010;&gt;  &#010;&gt; -Bryan&#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter &lt;franc.carter@sirca.org.au&gt; wrote:&#010;&gt; On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter &lt;franc.carter@sirca.org.au&gt; wrote:&#010;&gt; On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli &lt;rcoli@eventbrite.com&gt; wrote:&#010;&gt; On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter &lt;franc.carter@sirca.org.au&gt; wrote:&#010;&gt; &gt; We are running a test system with Leveled compaction on Cassandra-1.2.4.&#010;&gt; &gt; While doing an initial load of the data one of the nodes ran out of file&#010;&gt; &gt; descriptors and since then it hasn't been automatically compacting.&#010;&gt; &#010;&gt; You have (at least) two options :&#010;&gt; &#010;&gt; 1) increase file descriptors available to Cassandra with ulimit, if possible&#010;&gt; 2) increase the size of your sstables with levelled compaction, such&#010;&gt; that you have fewer of them&#010;&gt; &#010;&gt; Oops, I wasn't clear enough.&#010;&gt; &#010;&gt; I have increased the number of file descriptors and no longer have a file descriptor&#010;issue. However the node still doesn't compact automatically. If I run a 'nodetool compact'&#010;it will do a small amount of compaction and then stop. 
The Column Family is using LCS&#010;&gt; &#010;&gt; Any ideas on this - compaction is still not automatically running for one of my nodes&#010;&gt; &#010;&gt; thanks&#010;&gt;  &#010;&gt; &#010;&gt; cheers&#010;&gt;  &#010;&gt; &#010;&gt; =Rob&#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; -- &#010;&gt; Franc Carter | Systems architect | Sirca Ltd&#010;&gt; franc.carter@sirca.org.au | www.sirca.org.au&#010;&gt; Tel: +61 2 8355 2514 &#010;&gt; Level 4, 55 Harrington St, The Rocks NSW 2000&#010;&gt; PO Box H58, Australia Square, Sydney NSW 1215&#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; -- &#010;&gt; Franc Carter | Systems architect | Sirca Ltd&#010;&gt; franc.carter@sirca.org.au | www.sirca.org.au&#010;&gt; Tel: +61 2 8355 2514 &#010;&gt; Level 4, 55 Harrington St, The Rocks NSW 2000&#010;&gt; PO Box H58, Australia Square, Sydney NSW 1215&#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; &#010;&gt; -- &#010;&gt; Franc Carter | Systems architect | Sirca Ltd&#010;&gt; franc.carter@sirca.org.au | www.sirca.org.au&#010;&gt; Tel: +61 2 8355 2514 &#010;&gt; Level 4, 55 Harrington St, The Rocks NSW 2000&#010;&gt; PO Box H58, Australia Square, Sydney NSW 1215&#010;&gt; &#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [Cassandra] Expanding a Cassandra cluster</title>
<author><name>aaron morton &lt;aaron@thelastpickle.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c60137492-5764-4D1D-97FE-8F1C0511776E@thelastpickle.com%3e"/>
<id>urn:uuid:%3c60137492-5764-4D1D-97FE-8F1C0511776E@thelastpickle-com%3e</id>
<updated>2013-06-20T09:23:54Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&gt; 1) Is there any implication in running nodetool repair immediately after bringing a new&#010;node up (before key migration process is completed) ?&#010;&gt;         Will it cause some race conditions ? Or will it result in some part of the space&#010;never be reclaimed ?&#010;Repair will only be concerned with data that the node is a replica for. And cleanup is only&#010;concerned with data that the node is no longer a replica for. &#010;AFAIK they should be able to run concurrently, but I would avoid it in case there are some&#010;edge cases. &#010;&#010;&gt; 2) How can I figure out the status of key migration in Cassandra?&#010;Not sure what you mean by key migration. &#010;If you are talking about a token move or node bootstrap you can get some idea from nodetool&#010;compactionstats and nodetool netstats. &#010;&#010;FWIW I think nodetool cleanup is less aggressive than repair. Repair reads all the data and&#010;creates a hash, cleanup just reads it from one file and writes a new file dropping rows that&#010;no longer belong. It probably uses less CPU than compaction as it does not merge row fragments.&#010;&#010;&#010;Cheers&#010;&#010;-----------------&#010;Aaron Morton&#010;Freelance Cassandra Consultant&#010;New Zealand&#010;&#010;@aaronmorton&#010;http://www.thelastpickle.com&#010;&#010;On 19/06/2013, at 10:48 AM, Emalayan Vairavanathan &lt;svemalayan@yahoo.com&gt; wrote:&#010;&#010;&gt; Thank you all.&#010;&gt; &#010;&gt; I have two more question.&#010;&gt; &#010;&gt; 1) Is there any implication in running nodetool repair immediately after bringing a new&#010;node up (before key migration process is completed) ?&#010;&gt;         Will it cause some race conditions ? 
Or will it result in some part of the space&#010;never be reclaimed ?&#010;&gt; &#010;&gt; 2) How can I figure out the status of key migration in Cassandra?&#010;&gt; &#010;&gt; Thank you&#010;&gt; Emalayan &#010;&gt; &#010;&gt; From: Richard Low &lt;richard@wentnet.com&gt;&#010;&gt; To: user@cassandra.apache.org; Emalayan Vairavanathan &lt;svemalayan@yahoo.com&gt; &#010;&gt; Sent: Tuesday, 18 June 2013 12:11 AM&#010;&gt; Subject: Re: [Cassandra] Expanding a Cassandra cluster&#010;&gt; &#010;&gt; On 10 June 2013 22:00, Emalayan Vairavanathan &lt;svemalayan@yahoo.com&gt; wrote:&#010;&gt; &#010;&gt;                    b) Will Cassandra automatically take care of removing obsolete keys&#010;in future ?&#010;&gt; &#010;&gt; In a future version Cassandra should automatically clean up for you:&#010;&gt; &#010;&gt; https://issues.apache.org/jira/browse/CASSANDRA-5051&#010;&gt; &#010;&gt; Right now though you have to run cleanup eventually or the space will never be reclaimed.&#010;&gt; &#010;&gt; Richard.&#010;&gt; &#010;&gt; &#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: &quot;SQL&quot; Injection C* (via CQL &amp; Thrift)</title>
<author><name>aaron morton &lt;aaron@thelastpickle.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cA6EFF366-919C-4DD9-BEF4-68B96069519A@thelastpickle.com%3e"/>
<id>urn:uuid:%3cA6EFF366-919C-4DD9-BEF4-68B96069519A@thelastpickle-com%3e</id>
<updated>2013-06-20T09:15:49Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&gt; As for the thrift side (i.e. using Hector or Astyanax), anyone have a crafty way to inject&#010;something?&#010;&#010;The only thing I've ever heard of coming close was a thrift bug that allowed a malformed request&#010;to crash the server. But that was a while ago https://issues.apache.org/jira/browse/CASSANDRA-475&#010;&#010;Cheers&#010;&#010;-----------------&#010;Aaron Morton&#010;Freelance Cassandra Consultant&#010;New Zealand&#010;&#010;@aaronmorton&#010;http://www.thelastpickle.com&#010;&#010;On 19/06/2013, at 1:46 AM, Brian O'Neill &lt;bone@alumni.brown.edu&gt; wrote:&#010;&#010;&gt; &#010;&gt; Perfect.  Thanks Sylvain.  That is exactly the input I was looking for, and I agree completely.&#010;&gt; (t's easy enough to protect against)&#010;&gt; &#010;&gt; As for the thrift side (i.e. using Hector or Astyanax), anyone have a crafty way to inject&#010;something?&#010;&gt; &#010;&gt; At first glance, it doesn't appear possible, but I'm not 100% confident making that assertion.&#010;&#010;&gt; &#010;&gt; -brian&#010;&gt; &#010;&gt; ---&#010;&gt; Brian O'Neill&#010;&gt; Lead Architect, Software Development&#010;&gt; Health Market Science&#010;&gt; The Science of Better Results&#010;&gt; 2700 Horizon Drive • King of Prussia, PA • 19406&#010;&gt; M: 215.588.6024 • @boneill42  •  &#010;&gt; healthmarketscience.com&#010;&gt; &#010;&gt; This information transmitted in this email message is for the intended recipient only&#010;and may contain confidential and/or privileged material. If you received this email in error&#010;and are not the intended recipient, or the person responsible to deliver it to the intended&#010;recipient, please contact the sender at the email above and delete this email and any attachments&#010;and destroy any copies thereof. 
Any review, retransmission, dissemination, copying or other&#010;use of, or taking any action in reliance upon, this information by persons or entities other&#010;than the intended recipient is strictly prohibited.&#010;&gt;  &#010;&gt; &#010;&gt; &#010;&gt; From: Sylvain Lebresne &lt;sylvain@datastax.com&gt;&#010;&gt; Reply-To: &lt;user@cassandra.apache.org&gt;&#010;&gt; Date: Tuesday, June 18, 2013 8:51 AM&#010;&gt; To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Subject: Re: "SQL" Injection C* (via CQL &amp; Thrift)&#010;&gt; &#010;&gt; If you're not careful, then "CQL injection" is possible.&#010;&gt; &#010;&gt; Say you naively build your query with&#010;&gt;   "UPDATE foo SET col='" + user_input + "' WHERE key = 'k'"&#010;&gt; then if user_input is "foo' AND col2='bar", your user will have overwritten a column&#010;it shouldn't have been able to. And something equivalent in a BATCH statement could allow&#010;to overwrite/delete some random row in some random table.&#010;&gt; &#010;&gt; Now CQL being much more restricted than SQL (no subqueries, no generic transaction, ...),&#010;the extent of what you can do with a CQL injection is way smaller than in SQL. 
But you do&#010;have to be careful.&#010;&gt; &#010;&gt; As far as the Datastax java driver is concerned, you can fairly easily protect yourself&#010;by using either:&#010;&gt; 1) prepared statements: if the user input is a prepared variable, there is nothing the&#010;user can do (it's "equivalent" to the thrift situation).&#010;&gt; 2) using the query builder: it will escape quotes in the strings you provided, thus&#010;avoiding injection.&#010;&gt; &#010;&gt; So I would say that injections are definitely possible if you concatenate strings too&#010;naively, but I don't think preventing them is very hard.&#010;&gt; &#010;&gt; --&#010;&gt; Sylvain&#010;&gt; &#010;&gt; &#010;&gt; On Tue, Jun 18, 2013 at 2:02 PM, Brian O'Neill &lt;bone@alumni.brown.edu&gt; wrote:&#010;&gt;&gt; &#010;&gt;&gt; Mostly for fun, I wanted to throw this out there...&#010;&gt;&gt; &#010;&gt;&gt; We are undergoing a security audit for our platform (C* + Elastic Search + Storm).&#010; One component of that audit is susceptibility to SQL injection.  I was wondering if anyone&#010;has attempted to construct a SQL injection attack against Cassandra?  Is it even possible?&#010;&gt;&gt; &#010;&gt;&gt; I know the code paths fairly well, but...&#010;&gt;&gt; Does there exist a path in the code whereby user data gets interpreted, which could&#010;be exploited to perform user operations?&#010;&gt;&gt; &#010;&gt;&gt; From the Thrift side of things, I've always felt safe.  Data is opaque.  Serializers&#010;are used to convert it to Bytes, and C* doesn't ever really do anything with the data.&#010;&gt;&gt; &#010;&gt;&gt; In examining the CQL java-driver, it looks like there might be a bit more exposure&#010;to injection.  
(or even CQL over Thrift)  I haven't dug into the code yet, but dependent on&#010;which flavor of the API you are using, you may be including user data in your statements.&#010; &#010;&gt;&gt; &#010;&gt;&gt; Does anyone know if the CQL java-driver does anything to protect against injection?&#010; Or is it possible to say that the syntax is strict enough that any embedded operations in&#010;data would not parse?&#010;&gt;&gt; &#010;&gt;&gt; just some food for thought...&#010;&gt;&gt; I'll be digging into this over the next couple weeks.  If people are interested,&#010;I can throw a blog post out there with the findings.&#010;&gt;&gt; &#010;&gt;&gt; -brian&#010;&gt;&gt; &#010;&gt;&gt; -- &#010;&gt;&gt; Brian ONeill&#010;&gt;&gt; Lead Architect, Health Market Science (http://healthmarketscience.com)&#010;&gt;&gt; mobile:215.588.6024&#010;&gt;&gt; blog: http://brianoneill.blogspot.com/&#010;&gt;&gt; twitter: @boneill42&#010;&gt; &#010;&#010;&#010;
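
The concatenation problem described in this thread can be sketched in a few lines.
This is a minimal illustration using plain string handling only, with no Cassandra
driver involved. The table "foo" and the columns "col" and "col2" come from the
example in the thread; the helper names are hypothetical. The injected input uses a
comma rather than AND, on the assumption that SET assignments in CQL are
comma-separated. Doubling single quotes is how CQL string literals escape a quote;
prepared statements, as recommended in the thread, remain the more robust option.

```python
# Illustrative only: plain string handling, no Cassandra driver involved.
# Table "foo" and columns "col"/"col2" come from the example in the thread;
# the helper names below are hypothetical.

def naive_update(user_input):
    # Vulnerable: user input is pasted straight into the statement text.
    return "UPDATE foo SET col='" + user_input + "' WHERE key = 'k'"


def quote_cql_string(value):
    # CQL string literals escape a single quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def escaped_update(user_input):
    # Safer: the whole input stays inside one string literal.
    return "UPDATE foo SET col=" + quote_cql_string(user_input) + " WHERE key = 'k'"


malicious = "foo', col2='bar"
print(naive_update(malicious))    # col2 is now assigned by the attacker
print(escaped_update(malicious))  # the input is stored as a single value of col
```

In the first statement the input closes the literal early and adds a second
assignment; in the second the doubled quotes keep everything inside the value of col.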
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Get fragments of big files (videos)</title>
<author><name>Simon Majou &lt;simon@majou.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEoZr1nRVx-vK_iC-Ahi+ZnnZOssagzzLwFy_ziByxHmu1u10A@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEoZr1nRVx-vK_iC-Ahi+ZnnZOssagzzLwFy_ziByxHmu1u10A@mail-gmail-com%3e</id>
<updated>2013-06-20T08:54:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks Serge&#010;&#010;Simon&#010;&#010;&#010;On Thu, Jun 20, 2013 at 10:48 AM, Serge Fonville&#010;&lt;serge.fonville@gmail.com&gt; wrote:&#010;&gt; Also, after a quick Google.&#010;&gt;&#010;&gt; http://wiki.apache.org/cassandra/CassandraLimitations states values cannot&#010;&gt; exceed 2GB, it also answers your offset question&#010;&gt;&#010;&gt; HTH&#010;&gt; Kind regards/met vriendelijke groet,&#010;&gt;&#010;&gt; Serge Fonville&#010;&gt;&#010;&gt; http://www.sergefonville.nl&#010;&gt;&#010;&gt; Convince Microsoft!&#010;&gt; They need to add TRUNCATE PARTITION in SQL Server&#010;&gt; https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table&#010;&gt;&#010;&gt;&#010;&gt; 2013/6/20 Sachin Sinha &lt;sinha.sachin@gmail.com&gt;&#010;&gt;&gt;&#010;&gt;&gt; Fragment them into rows; that will help.&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; On 20 June 2013 09:43, Simon Majou &lt;simon@majou.org&gt; wrote:&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Hello,&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; If I store a video in a column, how can I get a fragment of it&#010;&gt;&gt;&gt; without having to download it entirely ? Is there a way to give an&#010;&gt;&gt;&gt; offset on a column ?&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Do I have to fragment it over a lot of small fixed-size columns ? Is&#010;&gt;&gt;&gt; there any disadvantage in doing so ? For example fragment a 10GB file&#010;&gt;&gt;&gt; into 1 000 columns of 10 MB ?&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Simon&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Get fragments of big files (videos)</title>
<author><name>Serge Fonville &lt;serge.fonville@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAOAS_+KLpgLhA0ba16=MD-0m8hFLedycx9UTv_q7yYfYfp6MvA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAOAS_+KLpgLhA0ba16=MD-0m8hFLedycx9UTv_q7yYfYfp6MvA@mail-gmail-com%3e</id>
<updated>2013-06-20T08:48:53Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Also, after a quick Google.&#010;&#010;http://wiki.apache.org/cassandra/CassandraLimitations states values cannot&#010;exceed 2GB, it also answers your offset question&#010;&#010;HTH&#010;Kind regards/met vriendelijke groet,&#010;&#010;Serge Fonville&#010;&#010;http://www.sergefonville.nl&#010;&#010;Convince Microsoft!&#010;They need to add TRUNCATE PARTITION in SQL Server&#010;https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table&#010;&#010;&#010;2013/6/20 Sachin Sinha &lt;sinha.sachin@gmail.com&gt;&#010;&#010;&gt; Fragment them into rows; that will help.&#010;&gt;&#010;&gt;&#010;&gt; On 20 June 2013 09:43, Simon Majou &lt;simon@majou.org&gt; wrote:&#010;&gt;&#010;&gt;&gt; Hello,&#010;&gt;&gt;&#010;&gt;&gt; If I store a video in a column, how can I get a fragment of it&#010;&gt;&gt; without having to download it entirely ? Is there a way to give an&#010;&gt;&gt; offset on a column ?&#010;&gt;&gt;&#010;&gt;&gt; Do I have to fragment it over a lot of small fixed-size columns ? Is&#010;&gt;&gt; there any disadvantage in doing so ? For example fragment a 10GB file&#010;&gt;&gt; into 1 000 columns of 10 MB ?&#010;&gt;&gt;&#010;&gt;&gt; Simon&#010;&gt;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Get fragments of big files (videos)</title>
<author><name>Sachin Sinha &lt;sinha.sachin@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAKwXaRq_LaZLSofa2LePSNM6V-xiQAMpUFPh5S+zBjWxsUWhOA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAKwXaRq_LaZLSofa2LePSNM6V-xiQAMpUFPh5S+zBjWxsUWhOA@mail-gmail-com%3e</id>
<updated>2013-06-20T08:46:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Fragment them into rows; that will help.&#010;&#010;&#010;On 20 June 2013 09:43, Simon Majou &lt;simon@majou.org&gt; wrote:&#010;&#010;&gt; Hello,&#010;&gt;&#010;&gt; If I store a video in a column, how can I get a fragment of it&#010;&gt; without having to download it entirely ? Is there a way to give an&#010;&gt; offset on a column ?&#010;&gt;&#010;&gt; Do I have to fragment it over a lot of small fixed-size columns ? Is&#010;&gt; there any disadvantage in doing so ? For example fragment a 10GB file&#010;&gt; into 1 000 columns of 10 MB ?&#010;&gt;&#010;&gt; Simon&#010;&gt;&#010;&#010;
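
A rough sketch of the offset arithmetic behind the fragmenting suggestion above,
assuming the file is cut into fixed-size chunks stored one per row (or column) and
keyed by chunk index. The 10 MiB chunk size mirrors the figure in the question and is
not a recommendation; the function name and the storage layout are hypothetical, and
nothing here talks to Cassandra. It only works out which chunks a byte range touches,
so that a reader fetches those chunks instead of the whole file.

```python
# Hypothetical layout: one row (or column) per fixed-size chunk, keyed by
# (file_id, chunk_index). CHUNK_SIZE is an assumed value, not a recommendation.
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Return (chunk_index, start_in_chunk, end_in_chunk) for each chunk
    that overlaps the byte range [offset, offset + length)."""
    if length == 0:
        return []
    end = offset + length            # exclusive end of the requested range
    first = offset // chunk_size     # index of the chunk holding the first byte
    last = (end - 1) // chunk_size   # index of the chunk holding the last byte
    pieces = []
    for index in range(first, last + 1):
        chunk_start = index * chunk_size
        lo = max(offset, chunk_start) - chunk_start
        hi = min(end, chunk_start + chunk_size) - chunk_start
        pieces.append((index, lo, hi))
    return pieces


# Which chunks hold 1 KiB starting 25 MiB into the file?
print(chunks_for_range(offset=25 * 1024 * 1024, length=1024))
```

Each tuple names one chunk to fetch and the slice of it to keep, so a seek into the
middle of a video costs one or two chunk reads rather than a full download.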
</pre>
</div>
</content>
</entry>
<entry>
<title>Get fragments of big files (videos)</title>
<author><name>Simon Majou &lt;simon@majou.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEoZr1mFrGkz6-DBB6NoeaD_K9tU7+eGqbwn7njJzwV+2zDh-Q@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEoZr1mFrGkz6-DBB6NoeaD_K9tU7+eGqbwn7njJzwV+2zDh-Q@mail-gmail-com%3e</id>
<updated>2013-06-20T08:43:40Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,&#010;&#010;If I store a video in a column, how can I get a fragment of it&#010;without having to download it entirely ? Is there a way to give an&#010;offset on a column ?&#010;&#010;Do I have to fragment it over a lot of small fixed-size columns ? Is&#010;there any disadvantage in doing so ? For example fragment a 10GB file&#010;into 1 000 columns of 10 MB ?&#010;&#010;Simon&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Reduce Cassandra GC</title>
<author><name>Joel Samuelsson &lt;samuelsson.joel@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEMs6zQn9yC0DaU0tma-92zhsHB_D381jWs-P_VXTnVRQ19NVw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEMs6zQn9yC0DaU0tma-92zhsHB_D381jWs-P_VXTnVRQ19NVw@mail-gmail-com%3e</id>
<updated>2013-06-20T07:27:23Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
12.3 GB data per node (only one node).&#010;16GB RAM.&#010;In a virtual environment with the CPU specified as "8 cores", average CPU use&#010;is close to 0% (basically no load, around 12 requests / sec, mostly from&#010;OpsCenter).&#010;Average memory use is 4GB. Around 1GB heap used by Cassandra (out of 4GB).&#010;&#010;&#010;2013/6/19 Mohit Anchlia &lt;mohitanchlia@gmail.com&gt;&#010;&#010;&gt; How much data do you have per node?&#010;&gt; How much RAM per node?&#010;&gt; How much CPU per node?&#010;&gt; What is the avg CPU and memory usage?&#010;&gt;&#010;&gt; On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson &lt;&#010;&gt; samuelsson.joel@gmail.com&gt; wrote:&#010;&gt;&#010;&gt;&gt;  My Cassandra ps info:&#010;&gt;&gt;&#010;&gt;&gt; root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user&#010;&gt;&gt; cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile&#010;&gt;&gt; /var/run/cassandra.pid -errfile &amp;1 -outfile /var/log/cassandra/output.log&#010;&gt;&gt; -cp&#010;&gt;&gt; 
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar&#010;&gt;&gt; -Dlog4j.configuration=log4j-server.properties&#010;&gt;&gt; -Dlog4j.defaultInitOverride=true&#010;&gt;&gt; -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof&#010;&gt;&gt; -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea&#010;&gt;&gt; -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities&#010;&gt;&gt; -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M&#010;&gt;&gt; -XX:+HeapDumpOnOutOfMemoryError 
-Xss180k -XX:+UseParNewGC&#010;&gt;&gt; -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8&#010;&gt;&gt; -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75&#010;&gt;&gt; -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB&#010;&gt;&gt; -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199&#010;&gt;&gt; -Dcom.sun.management.jmxremote.ssl=false&#010;&gt;&gt; -Dcom.sun.management.jmxremote.authenticate=false&#010;&gt;&gt; org.apache.cassandra.service.CassandraDaemon&#010;&gt;&gt; 103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc&#010;&gt;&gt; -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile&#010;&gt;&gt; /var/run/cassandra.pid -errfile &amp;1 -outfile /var/log/cassandra/output.log&#010;&gt;&gt; -cp&#010;&gt;&gt; /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:
/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar&#010;&gt;&gt; -Dlog4j.configuration=log4j-server.properties&#010;&gt;&gt; -Dlog4j.defaultInitOverride=true&#010;&gt;&gt; -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof&#010;&gt;&gt; -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea&#010;&gt;&gt; -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities&#010;&gt;&gt; -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M&#010;&gt;&gt; -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC&#010;&gt;&gt; -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8&#010;&gt;&gt; -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75&#010;&gt;&gt; -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB&#010;&gt;&gt; -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199&#010;&gt;&gt; -Dcom.sun.management.jmxremote.ssl=false&#010;&gt;&gt; -Dcom.sun.management.jmxremote.authenticate=false&#010;&gt;&gt; org.apache.cassandra.service.CassandraDaemon&#010;&gt;&gt;&#010;&gt;&gt; Is it normal to have two processes like this?&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>[Cassandra] Running node tool cleanup</title>
<author><name>Emalayan Vairavanathan &lt;svemalayan@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c1371711690.71597.YahooMailNeo@web162002.mail.bf1.yahoo.com%3e"/>
<id>urn:uuid:%3c1371711690-71597-YahooMailNeo@web162002-mail-bf1-yahoo-com%3e</id>
<updated>2013-06-20T07:01:30Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi All,&#010;&#010;1) What will happen if I run nodetool cleanup immediately after bringing a new node up (i.e.&#010;before the key migration process is completed) ?&#010;&#010;        Will it cause some race conditions ? Or will it result in some part of the space&#010;never being reclaimed ?&#010;&#010;2) After adding a new machine, how can I make sure that the key migration is completed ? Should&#010;I run nodetool netstats on all the nodes ? Is there any better way ?&#010;&#010;Thank you&#010;Emalayan &#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Unit Testing Cassandra</title>
<author><name>Shahab Yunus &lt;shahab.yunus@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEo-6+S-NxCfzDJ1NxVGNdGVu+9dBS60nbf5xgRHmK5Hka1isw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEo-6+S-NxCfzDJ1NxVGNdGVu+9dBS60nbf5xgRHmK5Hka1isw@mail-gmail-com%3e</id>
<updated>2013-06-20T02:25:11Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks Edward, Ben and Dean for the pointers. Yes, I am using Java and&#010;these sound promising for unit testing, at least.&#010;&#010;Regards,&#010;Shahab&#010;&#010;&#010;On Wed, Jun 19, 2013 at 9:58 AM, Edward Capriolo &lt;edlinuxguru@gmail.com&gt; wrote:&#010;&#010;&gt; You really do not need much in Java; you can use the embedded server.&#010;&gt; Hector wraps a simple class around this called EmbeddedServerHelper&#010;&gt;&#010;&gt;&#010;&gt; On Wednesday, June 19, 2013, Ben Boule &lt;Ben_Boule@rapid7.com&gt; wrote:&#010;&gt; &gt; Hi Shahab,&#010;&gt; &gt;&#010;&gt; &gt; Cassandra-Unit has been helpful for us for running unit tests without&#010;&gt; requiring a real cassandra instance to be running.   We only use this to&#010;&gt; test our "DAO" code which interacts with the Cassandra client.  It&#010;&gt; basically starts up an embedded instance of cassandra and fools your&#010;&gt; client/driver into using it.  It uses a non-standard port and you just need&#010;&gt; to make sure you can set the port as a parameter into your client code.&#010;&gt; &gt;&#010;&gt; &gt; https://github.com/jsevellec/cassandra-unit&#010;&gt; &gt;&#010;&gt; &gt; One important thing is to either clear out the keyspace in between tests&#010;&gt; or carefully separate your data so different tests don't collide with each&#010;&gt; other in the embedded database.&#010;&gt; &gt;&#010;&gt; &gt; Setup/tear down time is pretty reasonable.&#010;&gt; &gt;&#010;&gt; &gt; Ben&#010;&gt; &gt; ________________________________&#010;&gt; &gt; From: Shahab Yunus [shahab.yunus@gmail.com]&#010;&gt; &gt; Sent: Wednesday, June 19, 2013 8:46 AM&#010;&gt; &gt; To: user@cassandra.apache.org&#010;&gt; &gt; Subject: Re: Unit Testing Cassandra&#010;&gt; &gt;&#010;&gt; &gt; Thanks Stephen for your reply and explanation. My bad that I mixed those&#010;&gt; up and wasn't clear enough. 
Yes, I have different 2 requests/questions.&#010;&gt; &gt; 1) One is for the unit testing.&#010;&gt; &gt; 2) Second (in which I am more interested in) is for performance&#010;&gt; (stress/load) testing. Let us keep integration aside for now.&#010;&gt; &gt; I do see some stuff out there but wanted to know recommendations from&#010;&gt; the community given their experience.&#010;&gt; &gt; Regards,&#010;&gt; &gt; Shahab&#010;&gt; &gt;&#010;&gt; &gt; On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly &lt;&#010;&gt; stephen.alan.connolly@gmail.com&gt; wrote:&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; Unit testing means testing in isolation the smallest part.&#010;&gt; &gt;&gt; Unit tests should not take more than a few milliseconds to set up and&#010;&gt; verify their assertions.&#010;&gt; &gt;&gt; As such, if your code is not factored well for testing, you would&#010;&gt; typically use mocking (either by hand, or with mocking libraries) to mock&#010;&gt; out the bits not under test.&#010;&gt; &gt;&gt; Extensive use of mocks is usually a smell of code that is not well&#010;&gt; designed *for testing*&#010;&gt; &gt;&gt; If you intend to test components integrated together... That is&#010;&gt; integration testing.&#010;&gt; &gt;&gt; If you intend to test performance of the whole or significant parts of&#010;&gt; the whole... That is performance testing.&#010;&gt; &gt;&gt; When searching for the above, you will not get much luck if you are&#010;&gt; looking for them in the context of "unit testing" as those things are&#010;&gt; *outside the scope of unit testing"&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; On Wednesday, 19 June 2013, Shahab Yunus wrote:&#010;&gt; &gt;&gt;&gt;&#010;&gt; &gt;&gt;&gt; Hello,&#010;&gt; &gt;&gt;&gt;&#010;&gt; &gt;&gt;&gt; Can anyone suggest a good/popular Unit Test tools/frameworks/utilities&#010;&gt; out&#010;&gt; &gt;&gt;&gt; there for unit testing Cassandra stores? I am looking for testing from&#010;&gt; performance/load and monitoring perspective. 
I am using 1.2.&#010;&gt; &gt;&gt;&gt;&#010;&gt; &gt;&gt;&gt; Thanks a lot.&#010;&gt; &gt;&gt;&gt;&#010;&gt; &gt;&gt;&gt; Regards,&#010;&gt; &gt;&gt;&gt; Shahab&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; --&#010;&gt; &gt;&gt; Sent from my phone&#010;&gt; &gt;&#010;&gt; &gt; This electronic message contains information which may be confidential&#010;&gt; or privileged. The information is intended for the use of the individual or&#010;&gt; entity named above. If you are not the intended recipient, be aware that&#010;&gt; any disclosure, copying, distribution or use of the contents of this&#010;&gt; information is prohibited. If you have received this electronic&#010;&gt; transmission in error, please notify us by e-mail at (&#010;&gt; postmaster@rapid7.com) immediately.&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Data not fully replicated with 2 nodes and replication factor 2</title>
<author><name>Wei Zhu &lt;wz1975@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c1371684924.29557.GenericBBA@web160905.mail.bf1.yahoo.com%3e"/>
<id>urn:uuid:%3c1371684924-29557-GenericBBA@web160905-mail-bf1-yahoo-com%3e</id>
<updated>2013-06-19T23:35:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Rob, &#010;Thanks. &#010;I was not aware of that. So we can avoid repair if there is no hardware failure...I found&#010;a blog: &#010;&#010;http://www.datastax.com/dev/blog/modern-hinted-handoff &#010;&#010;-Wei &#010;&#010;----- Original Message -----&#010;&#010;From: "Robert Coli" &lt;rcoli@eventbrite.com&gt; &#010;To: user@cassandra.apache.org, "Wei Zhu" &lt;wz1975@yahoo.com&gt; &#010;Sent: Wednesday, June 19, 2013 12:58:45 PM &#010;Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 &#010;&#010;On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu &lt;wz1975@yahoo.com&gt; wrote: &#010;&gt; I think hints are only stored when the other node is down, not on the &#010;&gt; dropped mutations. (Correct me if I am wrong, actually it's not a bad idea &#010;&gt; to store hints for dropped mutations and replay them later?) &#010;&#010;This used to be the way it worked pre-1.0... &#010;&#010;https://issues.apache.org/jira/browse/CASSANDRA-2034 &#010;&#010;In modern cassandra, anything but a successful ack from a coordinated &#010;write results in a hint on the coordinator. &#010;&#010;&gt; To solve your issue, as I mentioned, either do nodetool repair, or increase &#010;&gt; your consistency level. By the way, you probably write faster than your &#010;&gt; cluster can handle if you see that many dropped mutations. &#010;&#010;If his hints are ultimately delivered, OP should not "need" repair to &#010;be consistent. &#010;&#010;=Rob &#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Performance Difference between Cassandra version</title>
<author><name>Franc Carter &lt;franc.carter@sirca.org.au&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAB5AukF0cZQivE-geV+NQqRqvrGCjC+c-ZNT8+kCqRuACimnDw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAB5AukF0cZQivE-geV+NQqRqvrGCjC+c-ZNT8+kCqRuACimnDw@mail-gmail-com%3e</id>
<updated>2013-06-19T23:24:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal &lt;jamalraihan@gmail.com&gt; wrote:&#010;&#010;&gt; I am trying to see whether there will be any performance difference&#010;&gt; between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly?&#010;&gt;&#010;&gt; Has anyone seen any major performance difference?&#010;&gt;&#010;&#010;We are part way through a performance comparison between 1.0.9 with Size&#010;Tiered Compaction and 1.2.4 with Leveled Compaction - for our use case it&#010;looks like a significant performance improvement on the read side.  We are&#010;finding compaction lags when we do very large bulk loads, but for us this&#010;is an initialisation task and that's a reasonable trade-off&#010;&#010;cheers&#010;&#010;-- &#010;&#010;*Franc Carter* | Systems architect | Sirca Ltd&#010; &lt;marc.zianideferranti@sirca.org.au&gt;&#010;&#010;franc.carter@sirca.org.au | www.sirca.org.au&#010;&#010;Tel: +61 2 8355 2514&#010;&#010;Level 4, 55 Harrington St, The Rocks NSW 2000&#010;&#010;PO Box H58, Australia Square, Sydney NSW 1215&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Performance Difference between Cassandra version</title>
<author><name>Raihan Jamal &lt;jamalraihan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAH0jRU87d4-LW4ppzTS+rjWoKTaRx07jrGB8ZpBNoCkUNWUtZw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAH0jRU87d4-LW4ppzTS+rjWoKTaRx07jrGB8ZpBNoCkUNWUtZw@mail-gmail-com%3e</id>
<updated>2013-06-19T23:18:39Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I am trying to see whether there will be any performance difference between&#010;Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly?&#010;&#010;Has anyone seen any major performance difference?&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>error on startup: unable to find sufficient sources for streaming range</title>
<author><name>Faraaz Sareshwala &lt;fsareshwala@quantcast.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c20130619223555.GA11344@quantcast.com%3e"/>
<id>urn:uuid:%3c20130619223555-GA11344@quantcast-com%3e</id>
<updated>2013-06-19T22:36:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;&#010;I couldn't find any information on the following error so I apologize if it has&#010;already been discussed.&#010;&#010;On some of my nodes, I'm getting the following exception when cassandra starts&#010;up:&#010;&#010;2013-06-19 22:17:39.480414500 Exception encountered during startup: unable to find sufficient&#010;sources for streaming range (-4250921392403750427,-4250887922781325324]&#010;2013-06-19 22:17:39.482733500 ERROR Exception in thread Thread[StorageServiceShutdownHook,5,main]&#010;(CassandraDaemon.java:org.apache.cassandra.service.CassandraDaemon$1:175)&#010;2013-06-19 22:17:39.482735500 java.lang.NullPointerException&#010;2013-06-19 22:17:39.482735500   at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)&#010;2013-06-19 22:17:39.482736500   at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:362)&#010;2013-06-19 22:17:39.482736500   at org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)&#010;2013-06-19 22:17:39.482751500   at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:513)&#010;2013-06-19 22:17:39.482752500   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)&#010;2013-06-19 22:17:39.482752500   at java.lang.Thread.run(Thread.java:662)&#010;&#010;Can someone point me to more information about what could cause this error?&#010;&#010;Faraaz&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Data not fully replicated with 2 nodes and replication factor 2</title>
<author><name>Robert Coli &lt;rcoli@eventbrite.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEDUwd0jFmz+xmg6raMniHm9_f-2+ndWAjqi2Ge5n6xw-c5QfA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEDUwd0jFmz+xmg6raMniHm9_f-2+ndWAjqi2Ge5n6xw-c5QfA@mail-gmail-com%3e</id>
<updated>2013-06-19T19:58:45Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu &lt;wz1975@yahoo.com&gt; wrote:&#010;&gt; I think hints are only stored when the other node is down, not on the&#010;&gt; dropped mutations. (Correct me if I am wrong, actually it's not a bad idea&#010;&gt; to store hints for dropped mutations and replay them later?)&#010;&#010;This used to be the way it worked pre-1.0...&#010;&#010;https://issues.apache.org/jira/browse/CASSANDRA-2034&#010;&#010;In modern cassandra, anything but a successful ack from a coordinated&#010;write results in a hint on the coordinator.&#010;&#010;&gt; To solve your issue, as I mentioned, either do nodetool repair, or increase&#010;&gt; your consistency level.  By the way, you probably write faster than your&#010;&gt; cluster can handle if you see that many dropped mutations.&#010;&#010;If his hints are ultimately delivered, OP should not "need" repair to&#010;be consistent.&#010;&#010;=Rob&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Date range queries</title>
<author><name>David McNelis &lt;dmcnelis@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCACy0uxkXQuVqB=511VKc2ndZYmbx-BF8qPUZ3DFBaNY7V_HZGA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCACy0uxkXQuVqB=511VKc2ndZYmbx-BF8qPUZ3DFBaNY7V_HZGA@mail-gmail-com%3e</id>
<updated>2013-06-19T19:28:57Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
So, if you want to grab by the created_at and occasionally limit by&#010;question id, that is why you'd use created_at.&#010;&#010;The way the primary keys work is the first part of the primary key is the&#010;partition key; that field essentially identifies the single Cassandra row.&#010; The second key is the order preserving key, so you can sort by that key.&#010; If you have a third piece, then that is the secondary order preserving key.&#010;&#010;The reason you'd want to do (user_id, created_at, question_id) is because&#010;when you do a query on the keys, you MUST use the preceding pieces of&#010;the primary key.  So in your case, you could not do a query with just&#010;user_id and question_id with the user-created-question key.  Alternatively&#010;if you went with (user_id, question_id, created_at), you would not be able&#010;to include a range of created_at unless you were also filtering on the&#010;question_id.&#010;&#010;Does that make sense?&#010;&#010;As for the large rows, 10k is unlikely to cause you too many issues (unless&#010;the answer is potentially a big blob of text).  Newer versions of cassandra&#010;deal with a lot of things in far, far, superior ways to &lt; 1.0.&#010;&#010;For a really good primer on keys in CQL and how to potentially avoid hot&#010;rows, read this article:&#010;http://thelastpickle.com/2013/01/11/primary-keys-in-cql/  Aaron did a great&#010;job of laying out the subtleties of primary keys in CQL.&#010;&#010;&#010;On Wed, Jun 19, 2013 at 2:21 PM, Christopher J. Bottaro &lt;&#010;cjbottaro@academicworks.com&gt; wrote:&#010;&#010;&gt; Interesting, thank you for the reply.&#010;&gt;&#010;&gt; Two questions though...&#010;&gt;&#010;&gt; Why should created_at come before question_id in the primary key?  
In&#010;&gt; other words, why (user_id, created_at, question_id) instead of (user_id,&#010;&gt; question_id, created_at)?&#010;&gt;&#010;&gt; Given this setup, all a user's answers (all 10k) will be stored in a&#010;&gt; single C* (internal, not cql) row?  I thought having "fat" or "big" rows&#010;&gt; was bad.  I worked with Cassandra 0.6 at my previous job and given the&#010;&gt; nature of our work, we would sometimes generate these "fat" rows... at&#010;&gt; which point Cassandra would basically shit the bed.&#010;&gt;&#010;&gt; Thanks for the help.&#010;&gt;&#010;&gt;&#010;&gt; On Wed, Jun 19, 2013 at 12:26 PM, David McNelis &lt;dmcnelis@gmail.com&gt;wrote:&#010;&gt;&#010;&gt;&gt; I think you'd just be better served with just a little different primary&#010;&gt;&gt; key.&#010;&gt;&gt;&#010;&gt;&gt; If your primary key was (user_id, created_at)  or (user_id, created_at,&#010;&gt;&gt; question_id), then you'd be able to run the above query without a problem.&#010;&gt;&gt;&#010;&gt;&gt; This will mean that the entire pantheon of a specific user_id will be&#010;&gt;&gt; stored as a 'row' (in the old style C* vernacular), and then the&#010;&gt;&gt; information would be ordered by the 2nd piece of the primary key (or 2nd,&#010;&gt;&gt; then 3rd if you included question_id).&#010;&gt;&gt;&#010;&gt;&gt; You would certainly want to include any field that makes a record unique&#010;&gt;&gt; in the primary key.  Another thing to note is that if a field is part of&#010;&gt;&gt; the primary key you can not create a secondary index on that field.  You&#010;&gt;&gt; can work around that by storing the field twice, but you might want to&#010;&gt;&gt; rethink your structure if you find yourself doing that often.&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. 
Bottaro &lt;&#010;&gt;&gt; cjbottaro@academicworks.com&gt; wrote:&#010;&gt;&gt;&#010;&gt;&gt;&gt; Hello,&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; We are considering using Cassandra and I want to make sure our use case&#010;&gt;&gt;&gt; fits Cassandra's strengths.  We have the table like:&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; answers&#010;&gt;&gt;&gt; -------&#010;&gt;&gt;&gt; user_id | question_id | result | created_at&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Where our most common query will be something like:&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; SELECT * FROM answers WHERE user_id = 123 AND created_at &gt; '01/01/2012'&#010;&gt;&gt;&gt; AND created_at &lt; '01/01/2013'&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Sometimes we will also limit by a question_id or a list of question_ids.&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Secondary indexes will be created on user_id and question_id.  We expect&#010;&gt;&gt;&gt; the upper bound of number of answers for a given user to be around 10,000.&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Now my understanding of how Cassandra will run the aforementioned query&#010;&gt;&gt;&gt; is that it will load all the answers for a given user into memory using the&#010;&gt;&gt;&gt; secondary index, then scan over that set filtering based on the dates.&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Considering that that will be our most used query and it will happen&#010;&gt;&gt;&gt; very often, is this a bad use case for Cassandra?&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Thanks for the help.&#010;&gt;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&#010;&#010;
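The key ordering discussed in this thread could be sketched in CQL roughly as
follows. The column types are assumptions (the thread only gives the column
names), and the date literals use an ISO-style format rather than the
'01/01/2012' form from the original question.

```cql
-- Assumed column types; only the column names come from the thread.
CREATE TABLE answers (
  user_id     int,
  created_at  timestamp,
  question_id int,
  result      text,
  PRIMARY KEY (user_id, created_at, question_id)
);

-- user_id is the partition key, created_at the first clustering column,
-- so this is a range read inside a single partition.
SELECT * FROM answers
 WHERE user_id = 123
   AND created_at &gt; '2012-01-01'
   AND created_at &lt; '2013-01-01';
```

With this layout, a query that restricts user_id and question_id but skips
created_at would be expected to be rejected, which is the trade-off described
above.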
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Date range queries</title>
<author><name>&quot;Christopher J. Bottaro&quot; &lt;cjbottaro@academicworks.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAAw6nKsTBxaY=Ae0-j3W6bTE+gOszJibJr6+MghRzP8KFxcfsg@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAAw6nKsTBxaY=Ae0-j3W6bTE+gOszJibJr6+MghRzP8KFxcfsg@mail-gmail-com%3e</id>
<updated>2013-06-19T19:21:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Interesting, thank you for the reply.&#010;&#010;Two questions though...&#010;&#010;Why should created_at come before question_id in the primary key?  In other&#010;words, why (user_id, created_at, question_id) instead of (user_id,&#010;question_id, created_at)?&#010;&#010;Given this setup, all a user's answers (all 10k) will be stored in a single&#010;C* (internal, not cql) row?  I thought having "fat" or "big" rows was bad.&#010; I worked with Cassandra 0.6 at my previous job and given the nature of our&#010;work, we would sometimes generate these "fat" rows... at which point&#010;Cassandra would basically shit the bed.&#010;&#010;Thanks for the help.&#010;&#010;&#010;On Wed, Jun 19, 2013 at 12:26 PM, David McNelis &lt;dmcnelis@gmail.com&gt; wrote:&#010;&#010;&gt; I think you'd just be better served with just a little different primary&#010;&gt; key.&#010;&gt;&#010;&gt; If your primary key was (user_id, created_at)  or (user_id, created_at,&#010;&gt; question_id), then you'd be able to run the above query without a problem.&#010;&gt;&#010;&gt; This will mean that the entire pantheon of a specific user_id will be&#010;&gt; stored as a 'row' (in the old style C* vernacular), and then the&#010;&gt; information would be ordered by the 2nd piece of the primary key (or 2nd,&#010;&gt; then 3rd if you included question_id).&#010;&gt;&#010;&gt; You would certainly want to include any field that makes a record unique&#010;&gt; in the primary key.  Another thing to note is that if a field is part of&#010;&gt; the primary key you can not create a secondary index on that field.  You&#010;&gt; can work around that by storing the field twice, but you might want to&#010;&gt; rethink your structure if you find yourself doing that often.&#010;&gt;&#010;&gt;&#010;&gt; On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. 
Bottaro &lt;&#010;&gt; cjbottaro@academicworks.com&gt; wrote:&#010;&gt;&#010;&gt;&gt; Hello,&#010;&gt;&gt;&#010;&gt;&gt; We are considering using Cassandra and I want to make sure our use case&#010;&gt;&gt; fits Cassandra's strengths.  We have the table like:&#010;&gt;&gt;&#010;&gt;&gt; answers&#010;&gt;&gt; -------&#010;&gt;&gt; user_id | question_id | result | created_at&#010;&gt;&gt;&#010;&gt;&gt; Where our most common query will be something like:&#010;&gt;&gt;&#010;&gt;&gt; SELECT * FROM answers WHERE user_id = 123 AND created_at &gt; '01/01/2012'&#010;&gt;&gt; AND created_at &lt; '01/01/2013'&#010;&gt;&gt;&#010;&gt;&gt; Sometimes we will also limit by a question_id or a list of question_ids.&#010;&gt;&gt;&#010;&gt;&gt; Secondary indexes will be created on user_id and question_id.  We expect&#010;&gt;&gt; the upper bound of number of answers for a given user to be around 10,000.&#010;&gt;&gt;&#010;&gt;&gt; Now my understanding of how Cassandra will run the aforementioned query&#010;&gt;&gt; is that it will load all the answers for a given user into memory using the&#010;&gt;&gt; secondary index, then scan over that set filtering based on the dates.&#010;&gt;&gt;&#010;&gt;&gt; Considering that that will be our most used query and it will happen very&#010;&gt;&gt; often, is this a bad use case for Cassandra?&#010;&gt;&gt;&#010;&gt;&gt; Thanks for the help.&#010;&gt;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>Francisco Andrades Grassi &lt;bigjocker@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c66CC12CA-098F-4DCC-A101-FBB5C063A4CC@gmail.com%3e"/>
<id>urn:uuid:%3c66CC12CA-098F-4DCC-A101-FBB5C063A4CC@gmail-com%3e</id>
<updated>2013-06-19T19:16:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;&#010;I believe what he's recommending is:&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (counter, ts)&#010;)&#010;&#010;That way counter will be your partitioning key, and all the rows that have the same counter&#010;value will be clustered (stored as a single wide row sorted by the ts value). In this scenario&#010;the query:&#010;&#010; where counter = 'test' and ts &gt; minTimeuuid('2013-06-18 16:23:00') and ts &lt; minTimeuuid('2013-06-18&#010;16:24:00');&#010;&#010;would actually be a sequential read on a wide row on a single node.&#010;&#010;--&#010;Francisco Andrades Grassi&#010;www.bigjocker.com&#010;@bigjocker&#010;&#010;On Jun 19, 2013, at 12:17 PM, "Ryan, Brent" &lt;BRyan@cvent.com&gt; wrote:&#010;&#010;&gt; Tyler,&#010;&gt; &#010;&gt; You're recommending this schema instead, correct?&#010;&gt; &#010;&gt; CREATE TABLE count3 (&#010;&gt;   counter text,&#010;&gt;   ts timeuuid,&#010;&gt;   key1 text,&#010;&gt;   value int,&#010;&gt;   PRIMARY KEY (ts, counter)&#010;&gt; )&#010;&gt; &#010;&gt; I believe I tried this as well and ran into similar problems but I'll try it again. 
&#010;I'm using the "ByteOrderedPartitioner" if that helps with the latest version of DSE community&#010;edition which I believe is Cassandra 1.2.3.&#010;&gt; &#010;&gt; &#010;&gt; Thanks,&#010;&gt; Brent&#010;&gt; &#010;&gt; &#010;&gt; From: Tyler Hobbs &lt;tyler@datastax.com&gt;&#010;&gt; Reply-To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Date: Wednesday, June 19, 2013 11:00 AM&#010;&gt; To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Subject: Re: timeuuid and cql3 query&#010;&gt; &#010;&gt; &#010;&gt; On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&gt; wrote:&#010;&gt; &#010;&gt; CREATE TABLE count3 (&#010;&gt;   counter text,&#010;&gt;   ts timeuuid,&#010;&gt;   key1 text,&#010;&gt;   value int,&#010;&gt;   PRIMARY KEY ((counter, ts))&#010;&gt; )&#010;&gt; &#010;&gt; Instead of doing a composite partition key, remove a set of parens and let ts be your&#010;clustering key.  That will cause cql rows to be stored in sorted order by the ts column (for&#010;a given value of "counter") and allow you to do the kind of query you're looking for.&#010;&gt; &#010;&gt; &#010;&gt; -- &#010;&gt; Tyler Hobbs&#010;&gt; DataStax&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Data not fully replicated with 2 nodes and replication factor 2</title>
<author><name>Wei Zhu &lt;wz1975@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c1371667382.72724.GenericBBA@web160904.mail.bf1.yahoo.com%3e"/>
<id>urn:uuid:%3c1371667382-72724-GenericBBA@web160904-mail-bf1-yahoo-com%3e</id>
<updated>2013-06-19T18:43:02Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You have a lot of Dropped Mutations which means those writes might not go through. Since you&#010;have CL.ONE as write consistency, your client doesn't see the exception if write fails only&#010;on one node. &#010;I think hints are only stored when the other node is down, not on the dropped mutations. (Correct&#010;me if I am wrong, actually it's not a bad idea to store hints for dropped mutations and replay&#010;them later?) &#010;&#010;To solve your issue, as I mentioned, either do nodetool repair, or increase your consistency&#010;level. By the way, you probably write faster than your cluster can handle if you see that&#010;many dropped mutations. &#010;&#010;-Wei &#010;&#010;----- Original Message -----&#010;&#010;From: "James Lee" &lt;James.Lee@metaswitch.com&gt; &#010;To: user@cassandra.apache.org &#010;Sent: Wednesday, June 19, 2013 2:22:39 AM &#010;Subject: RE: Data not fully replicated with 2 nodes and replication factor 2 &#010;&#010;The test tool I am using catches any exceptions on the original writes and resubmits the write&#010;request until it's successful (bailing out after 5 failures). So for each key Cassandra has&#010;reported a successful write. 
&#010;&#010;&#010;Nodetool says the following - I'm guessing the pending hinted handoff is the interesting bit?&#010;&#010;&#010;comet-mvs01:/dsc-cassandra-1.2.2# ./bin/nodetool tpstats &#010;Pool Name Active Pending Completed Blocked All time blocked &#010;ReadStage 0 0 35445 0 0 &#010;RequestResponseStage 0 0 1535171 0 0 &#010;MutationStage 0 0 3038941 0 0 &#010;ReadRepairStage 0 0 2695 0 0 &#010;ReplicateOnWriteStage 0 0 0 0 0 &#010;GossipStage 0 0 2898 0 0 &#010;AntiEntropyStage 0 0 0 0 0 &#010;MigrationStage 0 0 245 0 0 &#010;MemtablePostFlusher 0 0 1260 0 0 &#010;FlushWriter 0 0 633 0 212 &#010;MiscStage 0 0 0 0 0 &#010;commitlog_archiver 0 0 0 0 0 &#010;InternalResponseStage 0 0 0 0 0 &#010;HintedHandoff 1 1 0 0 0 &#010;&#010;Message type Dropped &#010;RANGE_SLICE 0 &#010;READ_REPAIR 0 &#010;BINARY 0 &#010;READ 0 &#010;MUTATION 60427 &#010;_TRACE 0 &#010;REQUEST_RESPONSE 0 &#010;&#010;&#010;Looking at the hints column family in the system keyspace, I see one row with a large number&#010;of columns. Presumably that along with the nodetool output above suggests there are hinted&#010;handoffs pending? How long should I expect these to remain for? &#010;&#010;Ah, actually now that I re-run the command it seems that nodetool now reports that hint as&#010;completed and there are no hints left in the system keyspace on either node. I'm still seeing&#010;failures to read the data I'm expecting though, as before. Note that I've run this with a&#010;smaller data set (2M rows, 1GB data total) for this latest test. 
&#010;&#010;Thanks, &#010;James &#010;&#010;&#010;-----Original Message----- &#010;From: Robert Coli [mailto:rcoli@eventbrite.com] &#010;Sent: 18 June 2013 19:45 &#010;To: user@cassandra.apache.org &#010;Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 &#010;&#010;On Tue, Jun 18, 2013 at 11:36 AM, Wei Zhu &lt;wz1975@yahoo.com&gt; wrote: &#010;&gt; Cassandra doesn't do async replication like HBase does. You can run &#010;&gt; nodetool repair to ensure consistency. &#010;&#010;While this answer is true, it is somewhat non-responsive to the OP. &#010;&#010;If the OP didn't see a timeout exception, the theoretical worst case is that he should have&#010;hints stored for writes that initially failed to replicate. His nodes should not be failing GC&#010;with a total data size of 5gb on an 8gb heap, so those hints should deliver quite quickly.&#010;After 30 minutes those hints should certainly be delivered. &#010;&#010;@OP : do you see hints being stored? does nodetool tpstats indicate dropped messages? &#010;&#010;=Rob &#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Joining distinct clusters with the same schema together</title>
<author><name>Eric Stevens &lt;mightye@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAORswtwziieYYkkbkdyEP5gWZJMHg5dkq1TvCUEqqDTOYgf2og@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAORswtwziieYYkkbkdyEP5gWZJMHg5dkq1TvCUEqqDTOYgf2og@mail-gmail-com%3e</id>
<updated>2013-06-19T18:36:45Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&gt;&#010;&gt; On its face my answer is "not... really"? What do you view yourself as&#010;&gt; getting with this technique versus using built in replication? As an&#010;&gt; example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM&#010;&gt; consistency level operations?&#010;&#010;&#010;Doing replication manually sounds like a recipe for the DCs eventually&#010;getting subtly out of sync with each other.  If a connection goes down&#010;between DCs, and you are taking data at both, how will you catch each&#010;other up?  C* already offers that resolution for you, and you'd have to&#010;work pretty hard to reproduce it for no obvious benefit that I can see.&#010;&#010;For minimum effort, definitely rely on Cassandra's well-tested codebase for&#010;this.&#010;&#010;&#010;&#010;&#010;On Wed, Jun 19, 2013 at 2:27 PM, Robert Coli &lt;rcoli@eventbrite.com&gt; wrote:&#010;&#010;&gt; On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala&#010;&gt; &lt;fsareshwala@quantcast.com&gt; wrote:&#010;&gt; &gt; Each datacenter will have a cassandra cluster with a separate set of seeds&#010;&gt; &gt; specific to that datacenter. However, the cluster name will be the same.&#010;&gt; &gt;&#010;&gt; &gt; Question 1: is this enough to guarantee that the three datacenters will have&#010;&gt; &gt; distinct cassandra clusters as well? Or will one node in datacenter A still&#010;&gt; &gt; somehow be able to join datacenter B's ring?&#010;&gt;&#010;&gt; If they have network connectivity and the same cluster name, they are&#010;&gt; the same logical cluster. 
However if your nodes share tokens and you&#010;&gt; have auto_bootstrap=yes (the implicit default) the second node you&#010;&gt; attempt to start will refuse to start because you are trying to&#010;&gt; bootstrap it into the range of a live node.&#010;&gt;&#010;&gt; &gt; For now, we are planning on using our own relay mechanism to transfer&#010;&gt; &gt; data changes from one datacenter to another.&#010;&gt;&#010;&gt; Are you planning to use the streaming commitlog functionality for&#010;&gt; this? Not sure how you would capture all changes otherwise, except&#010;&gt; having your app just write the same thing to multiple places? Unless&#010;&gt; data timestamps are identical between clusters, otherwise identical&#010;&gt; data will not merge properly, as cassandra uses data timestamps to&#010;&gt; merge.&#010;&gt;&#010;&gt; &gt; Question 2: is this a sane strategy?&#010;&gt;&#010;&gt; On its face my answer is "not... really"? What do you view yourself as&#010;&gt; getting with this technique versus using built in replication? As an&#010;&gt; example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM&#010;&gt; consistency level operations?&#010;&gt;&#010;&gt; &gt; Question 3: eventually, we want to turn all these cassandra clusters&#010;&gt; into one&#010;&gt; &gt; large multi-datacenter cluster. What's the best practice to do this?&#010;&gt; Should I&#010;&gt; &gt; just add nodes from all datacenters to the list of seeds and let&#010;&gt; cassandra&#010;&gt; &gt; resolve differences? 
Is there another way I don't know about?&#010;&gt;&#010;&gt; If you are using NetworkTopologyStrategy and have the same cluster&#010;&gt; name for your isolated clusters, all you need to do is :&#010;&gt;&#010;&gt; 1) configure NTS to store replicas on a per-datacenter basis&#010;&gt; 2) ensure that your nodes are in different logical data centers (by&#010;&gt; default, all nodes are in DC1/rack1)&#010;&gt; 3) ensure that clusters are able to reach each other&#010;&gt; 4) ensure that tokens do not overlap between clusters (the common&#010;&gt; technique with manual token assignment is that each node gets a range&#010;&gt; which is off-by-one)&#010;&gt; 5) ensure that all nodes seed lists contain (recommended) 3 seeds from&#010;&gt; each DC&#010;&gt; 6) rolling restart (so the new seed list is picked up)&#010;&gt; 7) repair ("should" only be required if writes have not replicated via&#010;&gt; your out of band mechanism)&#010;&gt;&#010;&gt; Vnodes change the picture slightly because the chance of your clusters&#010;&gt; having conflicting tokens increases with the number of token ranges&#010;&gt; you have.&#010;&gt;&#010;&gt; =Rob&#010;&gt;&#010;&#010;
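Step 1 of the quoted list (storing replicas on a per-datacenter basis with
NetworkTopologyStrategy) might look roughly like the following in CQL. The
keyspace name, the data center names, and the replica counts are all
hypothetical; the data center names would have to match whatever the
configured snitch reports for each node.

```cql
-- Hypothetical keyspace and data center names; 3 replicas per DC is an assumed value.
ALTER KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC_A': 3,
    'DC_B': 3
  };
```

Changing the replication settings does not move existing data by itself, which
is why the list ends with a repair.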
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Heap is not released and streaming hangs at 0%</title>
<author><name>Wei Zhu &lt;wz1975@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c1371666998.53260.GenericBBA@web160905.mail.bf1.yahoo.com%3e"/>
<id>urn:uuid:%3c1371666998-53260-GenericBBA@web160905-mail-bf1-yahoo-com%3e</id>
<updated>2013-06-19T18:36:38Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
If you want, you can try to force the GC through Jconsole. Memory-&gt;Perform GC. &#010;&#010;It theoretically triggers a full GC, though when that actually happens depends on the JVM. &#010;&#010;-Wei &#010;&#010;----- Original Message -----&#010;&#010;From: "Robert Coli" &lt;rcoli@eventbrite.com&gt; &#010;To: user@cassandra.apache.org &#010;Sent: Tuesday, June 18, 2013 10:43:13 AM &#010;Subject: Re: Heap is not released and streaming hangs at 0% &#010;&#010;On Tue, Jun 18, 2013 at 10:33 AM, srmore &lt;comomore@gmail.com&gt; wrote: &#010;&gt; But then shouldn't the JVM GC it eventually? I can still see Cassandra alive &#010;&gt; and kicking but looks like the heap is locked up even after the traffic is &#010;&gt; long stopped. &#010;&#010;No, when GC system fails this hard it is often a permanent failure &#010;which requires a restart of the JVM. &#010;&#010;&gt; nodetool -h localhost flush didn't do much good. &#010;&#010;This adds support to the idea that your heap is too full, and not full &#010;of memtables. &#010;&#010;You could try nodetool -h localhost invalidatekeycache, but that &#010;probably will not free enough memory to help you. &#010;&#010;=Rob &#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Joining distinct clusters with the same schema together</title>
<author><name>Robert Coli &lt;rcoli@eventbrite.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEDUwd0Mg7BhT9a4mfcv_2AQef7pueOWMbo38mbqP6-Cg1fqbQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEDUwd0Mg7BhT9a4mfcv_2AQef7pueOWMbo38mbqP6-Cg1fqbQ@mail-gmail-com%3e</id>
<updated>2013-06-19T18:27:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala&#010;&lt;fsareshwala@quantcast.com&gt; wrote:&#010;&gt; Each datacenter will have a cassandra cluster with a separate set of seeds&#010;&gt; specific to that datacenter. However, the cluster name will be the same.&#010;&gt;&#010;&gt; Question 1: is this enough to guarantee that the three datacenters will have&#010;&gt; distinct cassandra clusters as well? Or will one node in datacenter A still&#010;&gt; somehow be able to join datacenter B's ring?&#010;&#010;If they have network connectivity and the same cluster name, they are&#010;the same logical cluster. However, if your nodes share tokens and you&#010;have auto_bootstrap=yes (the implicit default), the second node you&#010;attempt to start will refuse to start because you are trying to&#010;bootstrap it into the range of a live node.&#010;&#010;&gt; For now, we are planning on using our own relay mechanism to transfer&#010;&gt; data changes from one datacenter to another.&#010;&#010;Are you planning to use the streaming commitlog functionality for&#010;this? Not sure how you would capture all changes otherwise, except by&#010;having your app just write the same thing to multiple places. Unless&#010;data timestamps are identical between clusters, otherwise identical&#010;data will not merge properly, as cassandra uses data timestamps to&#010;merge.&#010;&#010;&gt; Question 2: is this a sane strategy?&#010;&#010;On its face my answer is "not... really"? What do you view yourself as&#010;getting with this technique versus using built in replication? As an&#010;example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM&#010;consistency level operations.&#010;&#010;&gt; Question 3: eventually, we want to turn all these cassandra clusters into one&#010;&gt; large multi-datacenter cluster. What's the best practice to do this? Should I&#010;&gt; just add nodes from all datacenters to the list of seeds and let cassandra&#010;&gt; resolve differences? 
Is there another way I don't know about?&#010;&#010;If you are using NetworkTopologyStrategy and have the same cluster&#010;name for your isolated clusters, all you need to do is :&#010;&#010;1) configure NTS to store replicas on a per-datacenter basis&#010;2) ensure that your nodes are in different logical data centers (by&#010;default, all nodes are in DC1/rack1)&#010;3) ensure that clusters are able to reach each other&#010;4) ensure that tokens do not overlap between clusters (the common&#010;technique with manual token assignment is that each node gets a range&#010;which is off-by-one)&#010;5) ensure that all nodes seed lists contain (recommended) 3 seeds from each DC&#010;6) rolling restart (so the new seed list is picked up)&#010;7) repair ("should" only be required if writes have not replicated via&#010;your out of band mechanism)&#010;&#010;Vnodes change the picture slightly because the chance of your clusters&#010;having conflicting tokens increases with the number of token ranges&#010;you have.&#010;&#010;=Rob&#010;&#010;
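Step 1 above can be sketched in CQL3 as follows. This is a minimal
illustration, assuming a keyspace called my_keyspace and data centers called
DC_A and DC_B; all three names and the replica counts are hypothetical, and
the data center names must match whatever your snitch reports.

```cql
-- Hypothetical keyspace and data center names; replace with your own.
-- Each entry after 'class' is "data center name: replicas kept in that DC".
ALTER KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC_A': 3,
    'DC_B': 3
  };
```

Changing the replication settings does not move existing data by itself,
which is why the repair in step 7 still matters.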
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>Sylvain Lebresne &lt;sylvain@datastax.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAKkz8Q2no6oUCbWnVeoMn_YMxfH0nKpQvtYm55jmVWA2QWXSWw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAKkz8Q2no6oUCbWnVeoMn_YMxfH0nKpQvtYm55jmVWA2QWXSWw@mail-gmail-com%3e</id>
<updated>2013-06-19T18:15:29Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
So part of it is a bug, namely&#010;https://issues.apache.org/jira/browse/CASSANDRA-5666. In summary, CQL3&#010;should not accept: ts &gt; minTimeuuid('2013-06-17 22:36:16') and ts &lt;&#010;minTimeuuid('2013-06-20 22:44:02'), because it does not know how to handle&#010;it properly. What it should support is token(ts) &gt;&#010;token(minTimeuuid('2013-06-17 22:36:16')) and token(ts) &lt;&#010;token(minTimeuuid('2013-06-20 22:44:02')). And that is different because&#010;tokens always sort by bytes, and comparing timeuuids by bytes does not&#010;yield a time-based ordering.&#010;&#010;Long story short, using a non-equality condition on the partition key (i.e. the&#010;first part of your primary key) is generally not advised. Or to put it&#010;another way, the use of the byte ordering partitioner is discouraged. But&#010;if you still want to use the ordering partitioner and do range queries on&#010;the partition key, do not use a timeuuid, because the ordering that the&#010;partitioner enforces will not be one that is meaningful (due to the timeuuid&#010;layout).&#010;&#010;--&#010;Sylvain&#010;&#010;&#010;&#010;On Wed, Jun 19, 2013 at 7:04 PM, Ryan, Brent &lt;BRyan@cvent.com&gt; wrote:&#010;&#010;&gt;  Note that it seems to work when you structure your schema in this&#010;&gt; example below, BUT this is a problem because all of my data will wind up&#010;&gt; hitting a single node in my cassandra cluster because the partitioning key&#010;&gt; is "counter" and that isn't unique enough.  
I was hoping that I wasn't&#010;&gt; going to need to build up my own "sharding" scheme as this blog talks about&#010;&gt; (http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra)&#010;&gt; because this becomes much harder for other clients to integrate with&#010;&gt; because they now need to know how my data is structured in order to get it&#010;&gt; out.&#010;&gt;&#010;&gt;  CREATE TABLE count5 (&#010;&gt;   counter text,&#010;&gt;   ts timeuuid,&#010;&gt;   key1 text,&#010;&gt;   value int,&#010;&gt;   PRIMARY KEY (counter, ts)&#010;&gt; ) WITH&#010;&gt;   bloom_filter_fp_chance=0.010000 AND&#010;&gt;   caching='KEYS_ONLY' AND&#010;&gt;   comment='' AND&#010;&gt;   dclocal_read_repair_chance=0.000000 AND&#010;&gt;   gc_grace_seconds=864000 AND&#010;&gt;   read_repair_chance=0.100000 AND&#010;&gt;   replicate_on_write='true' AND&#010;&gt;   populate_io_cache_on_flush='false' AND&#010;&gt;   compaction={'class': 'SizeTieredCompactionStrategy'} AND&#010;&gt;   compression={'sstable_compression': 'SnappyCompressor'};&#010;&gt;&#010;&gt;  cqlsh:Test&gt; select counter,dateof(ts),key1,value from count5 where&#010;&gt; counter = 'test' and ts &gt; minTimeuuid('2013-06-17 22:36:16') and ts &lt;&#010;&gt; minTimeuuid('2013-06-18 22:44:02');&#010;&gt;&#010;&gt;   counter | dateof(ts)               | key1 | value&#010;&gt; ---------+--------------------------+------+-------&#010;&gt;     test | 2013-06-18 22:43:53-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:54-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:55-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:56-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:59-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:00-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:01-0400 |    1 |     1&#010;&gt;&#010;&gt;  cqlsh:Test&gt; select 
counter,dateof(ts),key1,value from count5 where&#010;&gt; counter = 'test' and ts &gt; minTimeuuid('2013-06-17 22:36:16') and ts &lt;&#010;&gt; minTimeuuid('2013-06-20 22:44:02');&#010;&gt;&#010;&gt;   counter | dateof(ts)               | key1 | value&#010;&gt; ---------+--------------------------+------+-------&#010;&gt;     test | 2013-06-18 22:43:53-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:54-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:55-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:56-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:43:59-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:00-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:01-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:02-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:02-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:03-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:04-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:05-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:44:06-0400 |    1 |     1&#010;&gt;&#010;&gt;&#010;&gt;   From: &lt;Ryan&gt;, Brent Ryan &lt;bryan@cvent.com&gt;&#010;&gt; Reply-To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Date: Wednesday, June 19, 2013 12:56 PM&#010;&gt;&#010;&gt; To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Subject: Re: timeuuid and cql3 query&#010;&gt;&#010;&gt;   Here's an example of that not working:&#010;&gt;&#010;&gt;  cqlsh:Test&gt; desc table count4;&#010;&gt;&#010;&gt;  CREATE TABLE count4 (&#010;&gt;   ts timeuuid,&#010;&gt;   counter text,&#010;&gt;   key1 text,&#010;&gt;   value int,&#010;&gt;   PRIMARY KEY (ts, counter)&#010;&gt; ) WITH&#010;&gt;   bloom_filter_fp_chance=0.010000 AND&#010;&gt;   caching='KEYS_ONLY' AND&#010;&gt;   comment='' 
AND&#010;&gt;   dclocal_read_repair_chance=0.000000 AND&#010;&gt;   gc_grace_seconds=864000 AND&#010;&gt;   read_repair_chance=0.100000 AND&#010;&gt;   replicate_on_write='true' AND&#010;&gt;   populate_io_cache_on_flush='false' AND&#010;&gt;   compaction={'class': 'SizeTieredCompactionStrategy'} AND&#010;&gt;   compression={'sstable_compression': 'SnappyCompressor'};&#010;&gt;&#010;&gt;  cqlsh:Test&gt; select counter,dateof(ts),key1,value from count4;&#010;&gt;&#010;&gt;   counter | dateof(ts)               | key1 | value&#010;&gt; ---------+--------------------------+------+-------&#010;&gt;     test | 2013-06-18 22:36:16-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:19-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:19-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:20-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:20-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:21-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:21-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:22-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:22-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:23-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:23-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:25-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:27-0400 |    1 |     1&#010;&gt;     test | 2013-06-18 22:36:28-0400 |    1 |     1&#010;&gt;&#010;&gt;  cqlsh:Statistics&gt; select counter,dateof(ts),key1,value from count4 where&#010;&gt; ts &gt; minTimeuuid('2013-06-17 22:36:16') and ts &lt; minTimeuuid('2013-06-19&#010;&gt; 22:36:20');&#010;&gt; Bad Request: 2 Start key must sort before (or equal to) finish key in your&#010;&gt; partitioner!&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;  Any ideas?  
Seems like a bug to me, right?&#010;&gt;&#010;&gt;  Brent&#010;&gt;&#010;&gt;   From: &lt;Ryan&gt;, Brent Ryan &lt;bryan@cvent.com&gt;&#010;&gt; Reply-To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Date: Wednesday, June 19, 2013 12:47 PM&#010;&gt; To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Subject: Re: timeuuid and cql3 query&#010;&gt;&#010;&gt;   Tyler,&#010;&gt;&#010;&gt;  You're recommending this schema instead, correct?&#010;&gt;&#010;&gt;  CREATE TABLE count3 (&#010;&gt;   counter text,&#010;&gt;   ts timeuuid,&#010;&gt;   key1 text,&#010;&gt;   value int,&#010;&gt;   PRIMARY KEY (ts, counter)&#010;&gt; )&#010;&gt;&#010;&gt;  I believe I tried this as well and ran into similar problems but I'll&#010;&gt; try it again.  I'm using the "ByteOrderedPartitioner" if that helps with&#010;&gt; the latest version of DSE community edition which I believe is Cassandra&#010;&gt; 1.2.3.&#010;&gt;&#010;&gt;&#010;&gt;  Thanks,&#010;&gt; Brent&#010;&gt;&#010;&gt;&#010;&gt;   From: Tyler Hobbs &lt;tyler@datastax.com&gt;&#010;&gt; Reply-To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Date: Wednesday, June 19, 2013 11:00 AM&#010;&gt; To: "user@cassandra.apache.org" &lt;user@cassandra.apache.org&gt;&#010;&gt; Subject: Re: timeuuid and cql3 query&#010;&gt;&#010;&gt;&#010;&gt; On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&gt; wrote:&#010;&gt;&#010;&gt;&gt;&#010;&gt;&gt;  CREATE TABLE count3 (&#010;&gt;&gt;   counter text,&#010;&gt;&gt;   ts timeuuid,&#010;&gt;&gt;   key1 text,&#010;&gt;&gt;   value int,&#010;&gt;&gt;   PRIMARY KEY ((counter, ts))&#010;&gt;&gt; )&#010;&gt;&gt;&#010;&gt;&#010;&gt; Instead of doing a composite partition key, remove a set of parens and let&#010;&gt; ts be your clustering key.  
That will cause cql rows to be stored in sorted&#010;&gt; order by the ts column (for a given value of "counter") and allow you to do&#010;&gt; the kind of query you're looking for.&#010;&gt;&#010;&gt;&#010;&gt; --&#010;&gt; Tyler Hobbs&#010;&gt; DataStax &lt;http://datastax.com/&gt;&#010;&gt;&#010;&#010;
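On the single-node concern in the quoted message: one common shape for the
"sharding" scheme it mentions is to add a time bucket to the partition key
while keeping ts as the clustering key. A minimal sketch, assuming a
hypothetical table count_by_day and a hypothetical day column filled in by
the client; neither appears in the thread.

```cql
-- Hypothetical table: "day" is an extra bucket column, not from the thread.
-- (counter, day) is the partition key; ts stays the clustering key, so rows
-- inside one bucket are still stored in time order.
CREATE TABLE count_by_day (
  counter text,
  day text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY ((counter, day), ts)
);

-- Range query inside a single bucket.
SELECT counter, dateof(ts), key1, value FROM count_by_day
 WHERE counter = 'test' AND day = '2013-06-18'
   AND ts &gt; minTimeuuid('2013-06-18 22:36:16')
   AND ts &lt; maxTimeuuid('2013-06-18 22:44:02');
```

The cost is the one already described in the thread: readers have to know
the bucketing rule and issue one query per day covered by the range.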
</pre>
</div>
</content>
</entry>
<entry>
<title>Joining distinct clusters with the same schema together</title>
<author><name>Faraaz Sareshwala &lt;fsareshwala@quantcast.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c20130619175045.GA2194@quantcast.com%3e"/>
<id>urn:uuid:%3c20130619175045-GA2194@quantcast-com%3e</id>
<updated>2013-06-19T17:50:46Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
My company is planning on deploying cassandra to three separate datacenters.&#010;Each datacenter will have a cassandra cluster with a separate set of seeds&#010;specific to that datacenter. However, the cluster name will be the same.&#010;&#010;Question 1: is this enough to guarantee that the three datacenters will have&#010;distinct cassandra clusters as well? Or will one node in datacenter A still&#010;somehow be able to join datacenter B's ring?&#010;&#010;Cassandra has cross datacenter replication and we plan to use that in the&#010;future. For now, we are planning on using our own relay mechanism to transfer&#010;data changes from one datacenter to another. Each cassandra cluster in each&#010;datacenter will have the same keyspaces and column families with the same&#010;schema. Datacenter A will send mutations over this relay to datacenter B which&#010;will replay the mutation in cassandra.  Therefore, datacenter A's cassandra&#010;cluster will look identical to datacenter B's cassandra cluster, but not through&#010;the cross datacenter replication that cassandra offers.&#010;&#010;Question 2: is this a sane strategy? We're trying to make the smallest possible&#010;change when deploying cassandra. Our plan is to slowly move our infrastructure&#010;over to relying more on cassandra once we can assess how it behaves with our&#010;workload.&#010;&#010;Question 3: eventually, we want to turn all these cassandra clusters into one&#010;large multi-datacenter cluster. What's the best practice to do this? Should I&#010;just add nodes from all datacenters to the list of seeds and let cassandra&#010;resolve differences? Is there another way I don't know about?&#010;&#010;Thank you,&#010;Faraaz&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: nodetool ring showing different 'Load' size</title>
<author><name>Robert Coli &lt;rcoli@eventbrite.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAEDUwd2jaqT3LvXBG3WgBuN7+WODi3mnyyLnThShRw9cOgYegQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAEDUwd2jaqT3LvXBG3WgBuN7+WODi3mnyyLnThShRw9cOgYegQ@mail-gmail-com%3e</id>
<updated>2013-06-19T17:26:27Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski &lt;michalm@opera.com&gt; wrote:&#010;&gt; You can also perform a major compaction via nodetool compact (for&#010;&gt; SizeTieredCompaction), but - again - you really should not do it unless&#010;&gt; you're really sure what you do, as it compacts all the SSTables together,&#010;&gt; which is not something you might want to achieve in most of the cases.&#010;&#010;If you do that and discover you did not want to :&#010;&#010;https://github.com/pcmanus/cassandra/tree/sstable_split&#010;&#010;Will enable you to split your monolithic sstable back into smaller sstables.&#010;&#010;=Rob&#010;PS - @pcmanus, here's that reminder we discussed @ summit to merge&#010;this tool into upstream! :D&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Date range queries</title>
<author><name>David McNelis &lt;dmcnelis@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCACy0uxnHCAoALuXc+XXA_L=qwx+yMuMW_qWEsjxbWbOKcUR8SA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCACy0uxnHCAoALuXc+XXA_L=qwx+yMuMW_qWEsjxbWbOKcUR8SA@mail-gmail-com%3e</id>
<updated>2013-06-19T17:26:27Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I think you'd be better served by a slightly different primary&#010;key.&#010;&#010;If your primary key was (user_id, created_at) or (user_id, created_at,&#010;question_id), then you'd be able to run your query without a problem.&#010;&#010;This will mean that all the data for a specific user_id will be&#010;stored as a 'row' (in the old style C* vernacular), and then the&#010;information would be ordered by the 2nd piece of the primary key (or 2nd,&#010;then 3rd if you included question_id).&#010;&#010;You would certainly want to include any field that makes a record unique in&#010;the primary key.  Another thing to note is that if a field is part of the&#010;primary key you cannot create a secondary index on that field.  You can&#010;work around that by storing the field twice, but you might want to rethink&#010;your structure if you find yourself doing that often.&#010;&#010;&#010;On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. Bottaro &lt;&#010;cjbottaro@academicworks.com&gt; wrote:&#010;&#010;&gt; Hello,&#010;&gt;&#010;&gt; We are considering using Cassandra and I want to make sure our use case&#010;&gt; fits Cassandra's strengths.  We have a table like:&#010;&gt;&#010;&gt; answers&#010;&gt; -------&#010;&gt; user_id | question_id | result | created_at&#010;&gt;&#010;&gt; Where our most common query will be something like:&#010;&gt;&#010;&gt; SELECT * FROM answers WHERE user_id = 123 AND created_at &gt; '01/01/2012'&#010;&gt; AND created_at &lt; '01/01/2013'&#010;&gt;&#010;&gt; Sometimes we will also limit by a question_id or a list of question_ids.&#010;&gt;&#010;&gt; Secondary indexes will be created on user_id and question_id.  
We expect&#010;&gt; the upper bound of number of answers for a given user to be around 10,000.&#010;&gt;&#010;&gt; Now my understanding of how Cassandra will run the aforementioned query is&#010;&gt; that it will load all the answers for a given user into memory using the&#010;&gt; secondary index, then scan over that set filtering based on the dates.&#010;&gt;&#010;&gt; Considering that that will be our most used query and it will happen very&#010;&gt; often, is this a bad use case for Cassandra?&#010;&gt;&#010;&gt; Thanks for the help.&#010;&gt;&#010;&#010;
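The primary key suggested at the top of this reply could look like the sketch
below. The column types are assumptions (the original post does not state
them), and the dates are written in ISO 8601 form, which CQL accepts for
timestamp literals.

```cql
-- Column types are hypothetical; only the column names come from the thread.
-- user_id is the partition key; created_at and question_id are clustering
-- columns, so one user's answers are stored together, ordered by created_at.
CREATE TABLE answers (
  user_id int,
  created_at timestamp,
  question_id int,
  result text,
  PRIMARY KEY (user_id, created_at, question_id)
);

-- Date range within a single user's partition; no secondary index involved.
SELECT * FROM answers
 WHERE user_id = 123
   AND created_at &gt; '2012-01-01'
   AND created_at &lt; '2013-01-01';
```

With this layout the query reads one contiguous slice of one partition
instead of filtering the user's answers after an index lookup.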
</pre>
</div>
</content>
</entry>
<entry>
<title>Date range queries</title>
<author><name>&quot;Christopher J. Bottaro&quot; &lt;cjbottaro@academicworks.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAAw6nKsqhMs0cOJyocAyOci6KRdhER9W6i3Sx8JEet_En==inw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAAw6nKsqhMs0cOJyocAyOci6KRdhER9W6i3Sx8JEet_En==inw@mail-gmail-com%3e</id>
<updated>2013-06-19T17:05:36Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,&#010;&#010;We are considering using Cassandra and I want to make sure our use case&#010;fits Cassandra's strengths.  We have a table like:&#010;&#010;answers&#010;-------&#010;user_id | question_id | result | created_at&#010;&#010;Where our most common query will be something like:&#010;&#010;SELECT * FROM answers WHERE user_id = 123 AND created_at &gt; '01/01/2012' AND&#010;created_at &lt; '01/01/2013'&#010;&#010;Sometimes we will also limit by a question_id or a list of question_ids.&#010;&#010;Secondary indexes will be created on user_id and question_id.  We expect&#010;the upper bound of number of answers for a given user to be around 10,000.&#010;&#010;Now my understanding of how Cassandra will run the aforementioned query is&#010;that it will load all the answers for a given user into memory using the&#010;secondary index, then scan over that set filtering based on the dates.&#010;&#010;Considering that that will be our most used query and it will happen very&#010;often, is this a bad use case for Cassandra?&#010;&#010;Thanks for the help.&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>&quot;Ryan, Brent&quot; &lt;BRyan@cvent.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCDE75C0A.2B7F%25bryan@cvent.com%3e"/>
<id>urn:uuid:%3cCDE75C0A-2B7F%25bryan@cvent-com%3e</id>
<updated>2013-06-19T17:04:47Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Note that it seems to work when you structure your schema in this example below, BUT this is&#010;a problem because all of my data will wind up hitting a single node in my cassandra cluster&#010;because the partitioning key is "counter" and that isn't unique enough.  I was hoping that&#010;I wasn't going to need to build up my own "sharding" scheme as this blog talks about (http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra)&#010;because this becomes much harder for other clients to integrate with because they now need&#010;to know how my data is structured in order to get it out.&#010;&#010;CREATE TABLE count5 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (counter, ts)&#010;) WITH&#010;  bloom_filter_fp_chance=0.010000 AND&#010;  caching='KEYS_ONLY' AND&#010;  comment='' AND&#010;  dclocal_read_repair_chance=0.000000 AND&#010;  gc_grace_seconds=864000 AND&#010;  read_repair_chance=0.100000 AND&#010;  replicate_on_write='true' AND&#010;  populate_io_cache_on_flush='false' AND&#010;  compaction={'class': 'SizeTieredCompactionStrategy'} AND&#010;  compression={'sstable_compression': 'SnappyCompressor'};&#010;&#010;cqlsh:Test&gt; select counter,dateof(ts),key1,value from count5 where counter = 'test' and&#010;ts &gt; minTimeuuid('2013-06-17 22:36:16') and ts &lt; minTimeuuid('2013-06-18 22:44:02');&#010;&#010; counter | dateof(ts)               | key1 | value&#010;---------+--------------------------+------+-------&#010;    test | 2013-06-18 22:43:53-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:54-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:55-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:56-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:59-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:00-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:01-0400 |    
1 |     1&#010;&#010;cqlsh:Test&gt; select counter,dateof(ts),key1,value from count5 where counter = 'test' and&#010;ts &gt; minTimeuuid('2013-06-17 22:36:16') and ts &lt; minTimeuuid('2013-06-20 22:44:02');&#010;&#010; counter | dateof(ts)               | key1 | value&#010;---------+--------------------------+------+-------&#010;    test | 2013-06-18 22:43:53-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:54-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:55-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:56-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:58-0400 |    1 |     1&#010;    test | 2013-06-18 22:43:59-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:00-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:01-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:02-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:02-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:03-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:04-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:05-0400 |    1 |     1&#010;    test | 2013-06-18 22:44:06-0400 |    1 |     1&#010;&#010;&#010;From: &lt;Ryan&gt;, Brent Ryan &lt;bryan@cvent.com&lt;mailto:bryan@cvent.com&gt;&gt;&#010;Reply-To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Date: Wednesday, June 19, 2013 12:56 PM&#010;To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Subject: Re: timeuuid and cql3 query&#010;&#010;Here's an example of that not working:&#010;&#010;cqlsh:Test&gt; desc table count4;&#010;&#010;CREATE TABLE count4 (&#010;  ts timeuuid,&#010;  counter text,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (ts, counter)&#010;) WITH&#010;  bloom_filter_fp_chance=0.010000 AND&#010;  caching='KEYS_ONLY' AND&#010;  
comment='' AND&#010;  dclocal_read_repair_chance=0.000000 AND&#010;  gc_grace_seconds=864000 AND&#010;  read_repair_chance=0.100000 AND&#010;  replicate_on_write='true' AND&#010;  populate_io_cache_on_flush='false' AND&#010;  compaction={'class': 'SizeTieredCompactionStrategy'} AND&#010;  compression={'sstable_compression': 'SnappyCompressor'};&#010;&#010;cqlsh:Test&gt; select counter,dateof(ts),key1,value from count4;&#010;&#010; counter | dateof(ts)               | key1 | value&#010;---------+--------------------------+------+-------&#010;    test | 2013-06-18 22:36:16-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:19-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:19-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:20-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:20-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:21-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:21-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:22-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:22-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:23-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:23-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:25-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:27-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:28-0400 |    1 |     1&#010;&#010;cqlsh:Statistics&gt; select counter,dateof(ts),key1,value from count4 where ts &gt; minTimeuuid('2013-06-17&#010;22:36:16') and ts &lt; minTimeuuid('2013-06-19 22:36:20');&#010;Bad Request: 2 Start key must sort before (or equal to) finish key in your partitioner!&#010;&#010;&#010;&#010;Any ideas?  
Seems like a bug to me, right?&#010;&#010;Brent&#010;&#010;From: &lt;Ryan&gt;, Brent Ryan &lt;bryan@cvent.com&lt;mailto:bryan@cvent.com&gt;&gt;&#010;Reply-To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Date: Wednesday, June 19, 2013 12:47 PM&#010;To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Subject: Re: timeuuid and cql3 query&#010;&#010;Tyler,&#010;&#010;You're recommending this schema instead, correct?&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (ts, counter)&#010;)&#010;&#010;I believe I tried this as well and ran into similar problems but I'll try it again.  I'm using&#010;the "ByteOrderedPartitioner" if that helps with the latest version of DSE community edition&#010;which I believe is Cassandra 1.2.3.&#010;&#010;&#010;Thanks,&#010;Brent&#010;&#010;&#010;From: Tyler Hobbs &lt;tyler@datastax.com&lt;mailto:tyler@datastax.com&gt;&gt;&#010;Reply-To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Date: Wednesday, June 19, 2013 11:00 AM&#010;To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Subject: Re: timeuuid and cql3 query&#010;&#010;&#010;On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&lt;mailto:BRyan@cvent.com&gt;&gt;&#010;wrote:&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY ((counter, ts))&#010;)&#010;&#010;Instead of doing a composite partition key, remove a set of parens and let ts be your clustering&#010;key.  
That will cause cql rows to be stored in sorted order by the ts column (for a given&#010;value of "counter") and allow you to do the kind of query you're looking for.&#010;&#010;&#010;--&#010;Tyler Hobbs&#010;DataStax&lt;http://datastax.com/&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>&quot;Ryan, Brent&quot; &lt;BRyan@cvent.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCDE75ACF.2B7A%25bryan@cvent.com%3e"/>
<id>urn:uuid:%3cCDE75ACF-2B7A%25bryan@cvent-com%3e</id>
<updated>2013-06-19T16:56:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Here's an example of that not working:&#010;&#010;cqlsh:Test&gt; desc table count4;&#010;&#010;CREATE TABLE count4 (&#010;  ts timeuuid,&#010;  counter text,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (ts, counter)&#010;) WITH&#010;  bloom_filter_fp_chance=0.010000 AND&#010;  caching='KEYS_ONLY' AND&#010;  comment='' AND&#010;  dclocal_read_repair_chance=0.000000 AND&#010;  gc_grace_seconds=864000 AND&#010;  read_repair_chance=0.100000 AND&#010;  replicate_on_write='true' AND&#010;  populate_io_cache_on_flush='false' AND&#010;  compaction={'class': 'SizeTieredCompactionStrategy'} AND&#010;  compression={'sstable_compression': 'SnappyCompressor'};&#010;&#010;cqlsh:Test&gt; select counter,dateof(ts),key1,value from count4;&#010;&#010; counter | dateof(ts)               | key1 | value&#010;---------+--------------------------+------+-------&#010;    test | 2013-06-18 22:36:16-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:18-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:19-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:19-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:20-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:20-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:21-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:21-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:22-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:22-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:23-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:23-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:25-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:27-0400 |    1 |     1&#010;    test | 2013-06-18 22:36:28-0400 |    1 |     1&#010;&#010;cqlsh:Statistics&gt; select counter,dateof(ts),key1,value from count4 where ts &gt; minTimeuuid('2013-06-17&#010;22:36:16') and ts &lt; minTimeuuid('2013-06-19 
22:36:20');&#010;Bad Request: 2 Start key must sort before (or equal to) finish key in your partitioner!&#010;&#010;&#010;&#010;Any ideas?  Seems like a bug to me, right?&#010;&#010;Brent&#010;&#010;From: &lt;Ryan&gt;, Brent Ryan &lt;bryan@cvent.com&lt;mailto:bryan@cvent.com&gt;&gt;&#010;Reply-To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Date: Wednesday, June 19, 2013 12:47 PM&#010;To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Subject: Re: timeuuid and cql3 query&#010;&#010;Tyler,&#010;&#010;You're recommending this schema instead, correct?&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (ts, counter)&#010;)&#010;&#010;I believe I tried this as well and ran into similar problems but I'll try it again.  
I'm using&#010;the "ByteOrderedPartitioner" if that helps with the latest version of DSE community edition&#010;which I believe is Cassandra 1.2.3.&#010;&#010;&#010;Thanks,&#010;Brent&#010;&#010;&#010;From: Tyler Hobbs &lt;tyler@datastax.com&lt;mailto:tyler@datastax.com&gt;&gt;&#010;Reply-To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Date: Wednesday, June 19, 2013 11:00 AM&#010;To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Subject: Re: timeuuid and cql3 query&#010;&#010;&#010;On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&lt;mailto:BRyan@cvent.com&gt;&gt;&#010;wrote:&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY ((counter, ts))&#010;)&#010;&#010;Instead of doing a composite partition key, remove a set of parens and let ts be your clustering&#010;key.  That will cause cql rows to be stored in sorted order by the ts column (for a given&#010;value of "counter") and allow you to do the kind of query you're looking for.&#010;&#010;&#010;--&#010;Tyler Hobbs&#010;DataStax&lt;http://datastax.com/&gt;&#010;&#010;
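One possible explanation for the "Start key must sort before (or equal to) finish key" error, sketched in Python. It assumes that ByteOrderedPartitioner compares the raw bytes of the partition key and that ts is the partition key, as in count4 above; the thread does not confirm that this is the cause. A version-1 UUID stores the lowest 32 bits of its timestamp first, so byte order and chronological order can disagree. The helper timeuuid_from_ticks is hypothetical and exists only for this illustration.

```python
import uuid


def timeuuid_from_ticks(ticks):
    """Build a version-1 style UUID from a count of 100 ns ticks.

    Hypothetical helper for illustration; clock sequence and node are zeroed.
    """
    time_low = ticks % 2**32                  # lowest 32 bits come first in the byte layout
    time_mid = (ticks >> 32) % 2**16          # next 16 bits
    time_hi_version = ((ticks >> 48) % 2**12) | 0x1000  # top 12 bits plus version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0, 0))


# Two timestamps one tick apart, chosen so that time_low wraps around.
earlier = timeuuid_from_ticks(2**32 - 1)
later = timeuuid_from_ticks(2**32)

print(later.time > earlier.time)    # chronological comparison: later is greater
print(earlier.bytes > later.bytes)  # raw byte comparison: earlier sorts after later
```

If this assumption holds, a byte-ordered range scan whose start and end are both timeuuids can see a start key that sorts after the finish key even though the dates are in the right order.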
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>&quot;Ryan, Brent&quot; &lt;BRyan@cvent.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCDE75872.2B74%25bryan@cvent.com%3e"/>
<id>urn:uuid:%3cCDE75872-2B74%25bryan@cvent-com%3e</id>
<updated>2013-06-19T16:47:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Tyler,&#010;&#010;You're recommending this schema instead, correct?&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY (ts, counter)&#010;)&#010;&#010;I believe I tried this as well and ran into similar problems but I'll try it again.  I'm using&#010;the "ByteOrderedPartitioner" if that helps with the latest version of DSE community edition&#010;which I believe is Cassandra 1.2.3.&#010;&#010;&#010;Thanks,&#010;Brent&#010;&#010;&#010;From: Tyler Hobbs &lt;tyler@datastax.com&lt;mailto:tyler@datastax.com&gt;&gt;&#010;Reply-To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Date: Wednesday, June 19, 2013 11:00 AM&#010;To: "user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;" &lt;user@cassandra.apache.org&lt;mailto:user@cassandra.apache.org&gt;&gt;&#010;Subject: Re: timeuuid and cql3 query&#010;&#010;&#010;On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&lt;mailto:BRyan@cvent.com&gt;&gt;&#010;wrote:&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY ((counter, ts))&#010;)&#010;&#010;Instead of doing a composite partition key, remove a set of parens and let ts be your clustering&#010;key.  That will cause cql rows to be stored in sorted order by the ts column (for a given&#010;value of "counter") and allow you to do the kind of query you're looking for.&#010;&#010;&#010;--&#010;Tyler Hobbs&#010;DataStax&lt;http://datastax.com/&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Reduce Cassandra GC</title>
<author><name>Mohit Anchlia &lt;mohitanchlia@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAOT3TWqypeyvtfHspG-Qgf9Awcuut71yjcUMFW766d4Jrdu+CQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAOT3TWqypeyvtfHspG-Qgf9Awcuut71yjcUMFW766d4Jrdu+CQ@mail-gmail-com%3e</id>
<updated>2013-06-19T16:34:43Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
How much data do you have per node?&#010;How much RAM per node?&#010;How much CPU per node?&#010;What is the avg CPU and memory usage?&#010;&#010;On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson &lt;samuelsson.joel@gmail.com&#010;&gt; wrote:&#010;&#010;&gt;  My Cassandra ps info:&#010;&gt;&#010;&gt; root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user&#010;&gt; cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile&#010;&gt; /var/run/cassandra.pid -errfile &amp;1 -outfile /var/log/cassandra/output.log&#010;&gt; -cp&#010;&gt; /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandr
a/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar&#010;&gt; -Dlog4j.configuration=log4j-server.properties&#010;&gt; -Dlog4j.defaultInitOverride=true&#010;&gt; -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof&#010;&gt; -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea&#010;&gt; -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities&#010;&gt; -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M&#010;&gt; -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC&#010;&gt; -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8&#010;&gt; -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75&#010;&gt; -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB&#010;&gt; -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199&#010;&gt; -Dcom.sun.management.jmxremote.ssl=false&#010;&gt; -Dcom.sun.management.jmxremote.authenticate=false&#010;&gt; org.apache.cassandra.service.CassandraDaemon&#010;&gt; 103      26792 26791 99 07:14 ?        
854015-22:02:22 /usr/bin/jsvc -user&#010;&gt; cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile&#010;&gt; /var/run/cassandra.pid -errfile &amp;1 -outfile /var/log/cassandra/output.log&#010;&gt; -cp&#010;&gt; /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar&#010;&gt; -Dlog4j.configuration=log4j-server.properties&#010;&gt; -Dlog4j.defaultInitOverride=true&#010;&gt; -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof&#010;&gt; -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea&#010;&gt; 
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities&#010;&gt; -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M&#010;&gt; -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC&#010;&gt; -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8&#010;&gt; -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75&#010;&gt; -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB&#010;&gt; -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199&#010;&gt; -Dcom.sun.management.jmxremote.ssl=false&#010;&gt; -Dcom.sun.management.jmxremote.authenticate=false&#010;&gt; org.apache.cassandra.service.CassandraDaemon&#010;&gt;&#010;&gt; Is it normal to have two processes like this?&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>&quot;Ryan, Brent&quot; &lt;BRyan@cvent.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cAC897E04-27C2-462B-B537-8AD1EA54BEA1@cvent.com%3e"/>
<id>urn:uuid:%3cAC897E04-27C2-462B-B537-8AD1EA54BEA1@cvent-com%3e</id>
<updated>2013-06-19T16:11:53Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I'm using the byte ordered partitioner.&#010;&#010;Sent from my iPhone&#010;&#010;On Jun 19, 2013, at 11:26 AM, "Sylvain Lebresne" &lt;sylvain@datastax.com&lt;mailto:sylvain@datastax.com&gt;&gt;&#010;wrote:&#010;&#010;You're using the ordered partitioner, right?&#010;&#010;&#010;On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia &lt;davide.anastasia@gmail.com&lt;mailto:davide.anastasia@gmail.com&gt;&gt;&#010;wrote:&#010;&#010;Hi Tyler,&#010;I am interested in this scenario as well: could you please elaborate further your answer?&#010;&#010;Thanks a lot,&#010;Davide&#010;&#010;On 19 Jun 2013 16:01, "Tyler Hobbs" &lt;tyler@datastax.com&lt;mailto:tyler@datastax.com&gt;&gt;&#010;wrote:&#010;&#010;On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&lt;mailto:BRyan@cvent.com&gt;&gt;&#010;wrote:&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY ((counter, ts))&#010;)&#010;&#010;Instead of doing a composite partition key, remove a set of parens and let ts be your clustering&#010;key.  That will cause cql rows to be stored in sorted order by the ts column (for a given&#010;value of "counter") and allow you to do the kind of query you're looking for.&#010;&#010;&#010;--&#010;Tyler Hobbs&#010;DataStax&lt;http://datastax.com/&gt;&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>Sylvain Lebresne &lt;sylvain@datastax.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAKkz8Q3eFPzgbNLTzRsKQzj64JWZS3k9GSg86q0CpOhJLoJMbw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAKkz8Q3eFPzgbNLTzRsKQzj64JWZS3k9GSg86q0CpOhJLoJMbw@mail-gmail-com%3e</id>
<updated>2013-06-19T15:11:28Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You're using the ordered partitioner, right?&#010;&#010;&#010;On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia &lt;&#010;davide.anastasia@gmail.com&gt; wrote:&#010;&#010;&gt; Hi Tyler,&#010;&gt; I am interested in this scenario as well: could you please elaborate&#010;&gt; further your answer?&#010;&gt;&#010;&gt; Thanks a lot,&#010;&gt; Davide&#010;&gt; On 19 Jun 2013 16:01, "Tyler Hobbs" &lt;tyler@datastax.com&gt; wrote:&#010;&gt;&#010;&gt;&gt;&#010;&gt;&gt; On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&gt; wrote:&#010;&gt;&gt;&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt;  CREATE TABLE count3 (&#010;&gt;&gt;&gt;   counter text,&#010;&gt;&gt;&gt;   ts timeuuid,&#010;&gt;&gt;&gt;   key1 text,&#010;&gt;&gt;&gt;   value int,&#010;&gt;&gt;&gt;   PRIMARY KEY ((counter, ts))&#010;&gt;&gt;&gt; )&#010;&gt;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; Instead of doing a composite partition key, remove a set of parens and&#010;&gt;&gt; let ts be your clustering key.  That will cause cql rows to be stored in&#010;&gt;&gt; sorted order by the ts column (for a given value of "counter") and allow&#010;&gt;&gt; you to do the kind of query you're looking for.&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; --&#010;&gt;&gt; Tyler Hobbs&#010;&gt;&gt; DataStax &lt;http://datastax.com/&gt;&#010;&gt;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>Davide Anastasia &lt;davide.anastasia@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCA+oKz53M3ZuQjtcHQUTrBardtcG16tmBj=oVhimuLvZSif7SJA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCA+oKz53M3ZuQjtcHQUTrBardtcG16tmBj=oVhimuLvZSif7SJA@mail-gmail-com%3e</id>
<updated>2013-06-19T15:06:59Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Tyler,&#010;I am interested in this scenario as well: could you please elaborate&#010;further your answer?&#010;&#010;Thanks a lot,&#010;Davide&#010;On 19 Jun 2013 16:01, "Tyler Hobbs" &lt;tyler@datastax.com&gt; wrote:&#010;&#010;&gt;&#010;&gt; On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&gt; wrote:&#010;&gt;&#010;&gt;&gt;&#010;&gt;&gt;  CREATE TABLE count3 (&#010;&gt;&gt;   counter text,&#010;&gt;&gt;   ts timeuuid,&#010;&gt;&gt;   key1 text,&#010;&gt;&gt;   value int,&#010;&gt;&gt;   PRIMARY KEY ((counter, ts))&#010;&gt;&gt; )&#010;&gt;&gt;&#010;&gt;&#010;&gt; Instead of doing a composite partition key, remove a set of parens and let&#010;&gt; ts be your clustering key.  That will cause cql rows to be stored in sorted&#010;&gt; order by the ts column (for a given value of "counter") and allow you to do&#010;&gt; the kind of query you're looking for.&#010;&gt;&#010;&gt;&#010;&gt; --&#010;&gt; Tyler Hobbs&#010;&gt; DataStax &lt;http://datastax.com/&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: token() function in CQL3 (1.2.5)</title>
<author><name>Tyler Hobbs &lt;tyler@datastax.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAAam9su=50U==mZJK3QLRXnLEA15Hr97z-oJYJTEqZau0TahAw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAAam9su=50U==mZJK3QLRXnLEA15Hr97z-oJYJTEqZau0TahAw@mail-gmail-com%3e</id>
<updated>2013-06-19T15:06:18Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Wed, Jun 19, 2013 at 7:47 AM, Ben Boule &lt;Ben_Boule@rapid7.com&gt; wrote:&#010;&#010;&gt;  Can anyone explain this to me?  I have been looking through the source&#010;&gt; code but can't seem to find the answer.&#010;&gt;&#010;&gt; The documentation mentions using the token() function to change a value&#010;&gt; into it's token for use in queries.   It always mentions it as taking a&#010;&gt; single parameter:&#010;&gt;&#010;&gt; SELECT * FROM posts WHERE token(userid) &gt; token('tom') AND token(userid) &lt; token('bob')&#010;&gt;&#010;&gt;&#010;&gt; However on my 1.2.5 node I am getting the following error:&#010;&gt;&#010;&gt; e.x.&#010;&gt;&#010;&gt; create table foo (&#010;&gt;     organization text,&#010;&gt;     type text,&#010;&gt;     time timestamp,&#010;&gt;     id uuid,&#010;&gt;     primary key ((organization, type, time), id))&#010;&gt;&#010;&gt; select * from foo where organization = 'companyA' and type = 'typeB' and&#010;&gt; token(time) &lt; token('somevalue') and token(time) &gt; token('othervalue')&#010;&gt;&#010;&gt; Bad Request: Invalid number of arguments in call to function token: 3&#010;&gt; required but 1 provided&#010;&gt;&#010;&gt; What are the other two parameters?  We don't currently use the token&#010;&gt; function but I was experimenting seeing if I could move the time into the&#010;&gt; partition key for a table like this to better distribute the rows.  But I&#010;&gt; can't seem to figure out how to get token() working.&#010;&gt;&#010;&#010;token() acts on the entire partition key, which for you is (organization,&#010;type, time), hence the 3 required values.&#010;&#010;In order to better distribute the rows, I suggest using a time bucket as&#010;part of the partition key.  
For example, you might use only the date&#010;portion of the timestamp as the time bucket.&#010;&#010;These posts talk about doing something similar with the Thrift API, but&#010;they will probably still be helpful:&#010;- http://rubyscale.com/2011/basic-time-series-with-cassandra/&#010;- http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra&#010;&#010;-- &#010;Tyler Hobbs&#010;DataStax &lt;http://datastax.com/&gt;&#010;&#010;
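A minimal sketch of the time-bucket idea in Python, assuming a one-day bucket. The function names day_bucket and partition_key are hypothetical and not part of any Cassandra driver; the point is only that every row written on the same day for the same (organization, type) gets the same partition key, while the full timestamp can stay in the clustering columns.

```python
from datetime import datetime


def day_bucket(ts):
    """Return only the date portion of a timestamp, e.g. '2013-06-19'."""
    return ts.strftime("%Y-%m-%d")


def partition_key(organization, type_, ts):
    """Compose the partition key from organization, type and the day bucket."""
    return (organization, type_, day_bucket(ts))


morning = datetime(2013, 6, 19, 7, 47)
evening = datetime(2013, 6, 19, 22, 5)
next_day = datetime(2013, 6, 20, 0, 1)

print(partition_key("companyA", "typeB", morning))
print(partition_key("companyA", "typeB", morning) == partition_key("companyA", "typeB", evening))
print(partition_key("companyA", "typeB", morning) == partition_key("companyA", "typeB", next_day))
```

A coarser or finer bucket (hour, week) is a trade-off between partition size and the number of partitions a time-range read has to touch.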
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: timeuuid and cql3 query</title>
<author><name>Tyler Hobbs &lt;tyler@datastax.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAAam9ssMHw+oCX3=zmRam541MZ28ud7+vcd0aco38Yp_HOtYXQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAAam9ssMHw+oCX3=zmRam541MZ28ud7+vcd0aco38Yp_HOtYXQ@mail-gmail-com%3e</id>
<updated>2013-06-19T15:00:47Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent &lt;BRyan@cvent.com&gt; wrote:&#010;&#010;&gt;&#010;&gt;  CREATE TABLE count3 (&#010;&gt;   counter text,&#010;&gt;   ts timeuuid,&#010;&gt;   key1 text,&#010;&gt;   value int,&#010;&gt;   PRIMARY KEY ((counter, ts))&#010;&gt; )&#010;&gt;&#010;&#010;Instead of doing a composite partition key, remove a set of parens and let&#010;ts be your clustering key.  That will cause cql rows to be stored in sorted&#010;order by the ts column (for a given value of "counter") and allow you to do&#010;the kind of query you're looking for.&#010;&#010;&#010;-- &#010;Tyler Hobbs&#010;DataStax &lt;http://datastax.com/&gt;&#010;&#010;
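The effect of making ts a clustering key can be modelled roughly in Python. This is a toy in-memory model, not how Cassandra is implemented: the class ClusteredTable and its methods are hypothetical, and plain integers stand in for timeuuid values. Each value of "counter" owns one partition, rows inside a partition are kept sorted by ts, and a time-range query becomes a slice of that sorted list.

```python
import bisect
from collections import defaultdict


class ClusteredTable:
    """Toy model: one sorted list of (ts, value) rows per partition key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, counter, ts, value):
        # insort keeps the partition ordered by ts (the clustering key)
        bisect.insort(self._partitions[counter], (ts, value))

    def slice(self, counter, start, end):
        """Return rows of one partition with ts strictly between start and end."""
        rows = self._partitions.get(counter, [])
        lo = bisect.bisect_right(rows, (start, float("inf")))  # skip rows at ts == start
        hi = bisect.bisect_left(rows, (end,))                   # stop before ts == end
        return rows[lo:hi]


table = ClusteredTable()
for ts in (30, 10, 20):
    table.insert("test", ts, 1)
table.insert("other", 15, 9)

print(table.slice("test", 10, 30))
print(table.slice("test", 0, 100))
```

Because the range is resolved inside a single partition, the partitioner's ordering of partition keys does not enter into the comparison of ts values.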
</pre>
</div>
</content>
</entry>
<entry>
<title>DC dedicated to Hadoop jobs</title>
<author><name>&lt;cscetbon.ext@orange.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3c21085_1371652925_51C1C33D_21085_3618_1_6CDB0EBC-DF96-4A46-84AB-E501340611E0@orange.com%3e"/>
<id>urn:uuid:%3c21085_1371652925_51C1C33D_21085_3618_1_6CDB0EBC-DF96-4A46-84AB-E501340611E0@orange-com%3e</id>
<updated>2013-06-19T14:42:04Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;&#010;Our Hadoop jobs will only do READs, and we want to restrict those reads to this dedicated DC even&#010;if performance is bad.&#010;&#010;What can we do to achieve this goal?&#010;- Set dynamic_snitch_badness_threshold to 0.98 on this DC's nodes? Can nodes in different DCs have&#010;different dynamic_snitch_badness_threshold values?&#010;- Set the consistency level to LOCAL_QUORUM? However, if we have more than one replica we'll do more&#010;reads.&#010;&#010;Thanks&#010;-- &#010;Cyril SCETBON&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: vnodes ready for production ?</title>
<author><name>Jim Ancona &lt;jim@anconafamily.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAKYY9AJ5OXjQa6WDKYXSEJC6JBN37KYA_4HQXkkMNo88HixZ2Q@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAKYY9AJ5OXjQa6WDKYXSEJC6JBN37KYA_4HQXkkMNo88HixZ2Q@mail-gmail-com%3e</id>
<updated>2013-06-19T14:04:09Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Tue, Jun 18, 2013 at 4:04 AM, aaron morton &lt;aaron@thelastpickle.com&gt; wrote:&#010;&gt;&gt; Even more if we could automate some up-scale thanks to AWS alarms, It&#010;&gt;&gt; would be awesome.&#010;&gt;&#010;&gt; I saw a demo for Priam (https://github.com/Netflix/Priam) doing that at&#010;&gt; netflix in March, not sure if it's public yet.&#010;&gt;&#010;&gt;&gt; Are the vnodes feature and the tokens =&gt;vnodes transition safe enough to&#010;&gt;&gt; go live with vnodes ?&#010;&gt;&#010;&gt; There have been some issues, search the user list for shuffle and as always&#010;&gt; test.&#010;&gt;&#010;&gt;&gt; Any advice about vnodes ?&#010;&gt;&#010;&gt; They are in use out there. It's a sizable change so it would be good idea to&#010;&gt; build a test system for running shuffle and testing your application. There&#010;&gt; have been some issues with repair and range scans (including hadoop&#010;&gt; integration.)&#010;&#010;Also, in his presentation at last week's Summit, Eric Evans suggested&#010;not using shuffle. As an alternative he suggested removing and&#010;replacing nodes one-by-one.&#010;&#010;Jim&#010;&#010;&gt;&#010;&gt; Cheers&#010;&gt;&#010;&gt; -----------------&#010;&gt; Aaron Morton&#010;&gt; Freelance Cassandra Consultant&#010;&gt; New Zealand&#010;&gt;&#010;&gt; @aaronmorton&#010;&gt; http://www.thelastpickle.com&#010;&gt;&#010;&gt; On 18/06/2013, at 7:04 PM, Alain RODRIGUEZ &lt;arodrime@gmail.com&gt; wrote:&#010;&gt;&#010;&gt; Any insights on vnodes, one month after my original post ?&#010;&gt;&#010;&gt;&#010;&gt; 2013/5/16 Alain RODRIGUEZ &lt;arodrime@gmail.com&gt;&#010;&gt;&gt;&#010;&gt;&gt; Hi,&#010;&gt;&gt;&#010;&gt;&gt; Adding vnodes is a big improvement to Cassandra, specifically because we&#010;&gt;&gt; have a fluctuating load on our Cassandra depending on the week, and it is&#010;&gt;&gt; quite annoying to add some nodes for one week or two, move tokens and then&#010;&gt;&gt; having to remove them and then move tokens again. 
Even more if we could&#010;&gt;&gt; automate some up-scale thanks to AWS alarms, It would be awesome.&#010;&gt;&gt;&#010;&gt;&gt; We don't use vnodes yet because Opscenter did not support this feature and&#010;&gt;&gt; because we need to have a reliable production. Now Opscenter handles vnodes.&#010;&gt;&gt;&#010;&gt;&gt; Are the vnodes feature and the tokens =&gt;vnodes transition safe enough to&#010;&gt;&gt; go live with vnodes ?&#010;&gt;&gt;&#010;&gt;&gt; What would be the transition process ?&#010;&gt;&gt;&#010;&gt;&gt; Does someone auto-scale his Cassandra cluster ?&#010;&gt;&gt;&#010;&gt;&gt; Any advice about vnodes ?&#010;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Unit Testing Cassandra</title>
<author><name>Edward Capriolo &lt;edlinuxguru@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCAENxBwwFzhMaM-GLuTAzDz1d1uWNGqMCD_R1abdYBhtaYqv1Og@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAENxBwwFzhMaM-GLuTAzDz1d1uWNGqMCD_R1abdYBhtaYqv1Og@mail-gmail-com%3e</id>
<updated>2013-06-19T13:58:37Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You really do not need much in Java; you can use the embedded server. Hector&#010;wraps a simple class around this, called EmbeddedServerHelper.&#010;&#010;On Wednesday, June 19, 2013, Ben Boule &lt;Ben_Boule@rapid7.com&gt; wrote:&#010;&gt; Hi Shabab,&#010;&gt;&#010;&gt; Cassandra-Unit has been helpful for us for running unit tests without&#010;requiring a real cassandra instance to be running.   We only use this to&#010;test our "DAO" code which interacts with the Cassandra client.  It&#010;basically starts up an embedded instance of cassandra and fools your&#010;client/driver into using it.  It uses a non-standard port and you just need&#010;to make sure you can set the port as a parameter into your client code.&#010;&gt;&#010;&gt; https://github.com/jsevellec/cassandra-unit&#010;&gt;&#010;&gt; One important thing is to either clear out the keyspace in between tests&#010;or carefully separate your data so different tests don't collide with each&#010;other in the embedded database.&#010;&gt;&#010;&gt; Setup/tear down time is pretty reasonable.&#010;&gt;&#010;&gt; Ben&#010;&gt; ________________________________&#010;&gt; From: Shahab Yunus [shahab.yunus@gmail.com]&#010;&gt; Sent: Wednesday, June 19, 2013 8:46 AM&#010;&gt; To: user@cassandra.apache.org&#010;&gt; Subject: Re: Unit Testing Cassandra&#010;&gt;&#010;&gt; Thanks Stephen for you reply and explanation. My bad that I mixed those&#010;up and wasn't clear enough. Yes, I have different 2 requests/questions.&#010;&gt; 1) One is for the unit testing.&#010;&gt; 2) Second (in which I am more interested in) is for performance&#010;(stress/load) testing. 
Let us keep integration aside for now.&#010;&gt; I do see some stuff out there but wanted to know recommendations from the&#010;community given their experience.&#010;&gt; Regards,&#010;&gt; Shahab&#010;&gt;&#010;&gt; On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly &lt;&#010;stephen.alan.connolly@gmail.com&gt; wrote:&#010;&gt;&gt;&#010;&gt;&gt; Unit testing means testing in isolation the smallest part.&#010;&gt;&gt; Unit tests should not take more than a few milliseconds to set up and&#010;verify their assertions.&#010;&gt;&gt; As such, if your code is not factored well for testing, you would&#010;typically use mocking (either by hand, or with mocking libraries) to mock&#010;out the bits not under test.&#010;&gt;&gt; Extensive use of mocks is usually a smell of code that is not well&#010;designed *for testing*&#010;&gt;&gt; If you intend to test components integrated together... That is&#010;integration testing.&#010;&gt;&gt; If you intend to test performance of the whole or significant parts of&#010;the whole... That is performance testing.&#010;&gt;&gt; When searching for the above, you will not get much luck if you are&#010;looking for them in the context of "unit testing" as those things are&#010;*outside the scope of unit testing"&#010;&gt;&gt;&#010;&gt;&gt; On Wednesday, 19 June 2013, Shahab Yunus wrote:&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Hello,&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Can anyone suggest a good/popular Unit Test tools/frameworks/utilities&#010;out&#010;&gt;&gt;&gt; there for unit testing Cassandra stores? I am looking for testing from&#010;performance/load and monitoring perspective. I am using 1.2.&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Thanks a lot.&#010;&gt;&gt;&gt;&#010;&gt;&gt;&gt; Regards,&#010;&gt;&gt;&gt; Shahab&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; --&#010;&gt;&gt; Sent from my phone&#010;&gt;&#010;&gt; This electronic message contains information which may be confidential or&#010;privileged. 
The information is intended for the use of the individual or&#010;entity named above. If you are not the intended recipient, be aware that&#010;any disclosure, copying, distribution or use of the contents of this&#010;information is prohibited. If you have received this electronic&#010;transmission in error, please notify us by e-mail at (postmaster@rapid7.com)&#010;immediately.&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>timeuuid and cql3 query</title>
<author><name>&quot;Ryan, Brent&quot; &lt;BRyan@cvent.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201306.mbox/%3cCDE7259C.2B1E%25bryan@cvent.com%3e"/>
<id>urn:uuid:%3cCDE7259C-2B1E%25bryan@cvent-com%3e</id>
<updated>2013-06-19T13:08:45Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I'm experimenting with a data model that will need to ingest a lot of data that must be&#010;queryable by time.  In the example below, I want to be able to run a query like "select&#010;* from count3 where counter = 'test' and ts &gt; minTimeuuid('2013-06-18 16:23:00') and ts&#010;&lt; minTimeuuid('2013-06-18 16:24:00');".  However, in certain cases this query fails with&#010;the error "Bad Request: Start key must sort before (or equal to) finish key in your partitioner!".&#010; It's not clear to me why this happens or what the issue is; it seems like a bug.&#010;&#010;Here's the table:&#010;&#010;CREATE TABLE count3 (&#010;  counter text,&#010;  ts timeuuid,&#010;  key1 text,&#010;  value int,&#010;  PRIMARY KEY ((counter, ts))&#010;)&#010;&#010;It has data like so:&#010;&#010;cqlsh:Statistics&gt; select counter,dateof(ts),key1,value from count3;&#010;&#010; counter | dateof(ts)               | key1 | value&#010;---------+--------------------------+------+-------&#010;    test | 2013-06-18 16:23:25-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:28-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:28-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:28-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:29-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:29-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:29-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:30-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:30-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:31-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:31-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:31-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:32-0400 |    1 |     1&#010;    test | 2013-06-18 16:23:32-0400 |    1 |     1&#010;&#010;&#010;DOESN'T WORK:&#010;    cqlsh:Statistics&gt; select * from count3 where counter = 'test' and ts &gt; minTimeuuid('2013-06-18&#010;16:23:00') and ts &lt; minTimeuuid('2013-06-18 16:24:00');&#010;Bad Request: 
Start key must sort before (or equal to) finish key in your partitioner!&#010;&#010;WORKS FINE:&#010;cqlsh:Statistics&gt; select * from count3 where counter = 'test' and ts &gt; minTimeuuid('2013-06-18&#010;16:23:25') and ts &lt; minTimeuuid('2013-06-18 16:23:31');&#010;&#010; counter | ts                                   | key1 | value&#010;---------+--------------------------------------+------+-------&#010;    test | edee0df0-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | ef9a5e60-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | efccb900-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | effb1c00-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | f0284680-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | f05b8b80-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | f08c5f80-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | f0c6f780-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;    test | f1018f80-d854-11e2-ac46-cba9e55f995d |    1 |     1&#010;&#010;&#010;Thanks,&#010;Brent&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
</feed>
