The tokens were different from the production cluster's, and on closer inspection a lot of data wasn't queryable (as expected, I suppose). I set the tokens and everything seems OK now.

Auto bootstrap was false, so no issues there.

Thanks for the insight, Shyamal! It's good to finally have this up and running.


On Sun, Oct 2, 2011 at 8:29 PM, Shyamal Prasad <shyamal@member.fsf.org> wrote:
>>>>> "Eric" == Eric Czech <eric@nextbigsound.com> writes:

   Eric> Yeah, that's not a mapping I'd like to maintain either -- as an
   Eric> experiment, I copied production sstables to the analysis
   Eric> cluster and ran brisk/cassandra without specifying an initial
   Eric> token (after deleting the LocationInfo* files and renaming the
   Eric> cluster).

Based on my understanding this will allow everything to start up, yes.

   Eric>  As far as I can tell, everything is running normally but I'm
   Eric> not sure how the cluster chose tokens for the nodes given that
   Eric> I didn't specify them after just dropping the raw sstables
   Eric> in.  I can still read data as usual from the column families
   Eric> that were copied but I'm not sure how not specifying the
   Eric> tokens affects everything.

Did you check the ring to see what tokens you got for the analysis
cluster? I would be surprised if you got the same ring configuration as
production.

   Eric>  Is some of my data just unreachable now because the tokens
   Eric> weren't manually defined?

I suspect your data is messed up, but the best way to find out is to
examine the ring (use nodetool ring). If it is the same as your
production cluster, you are good to go.
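
A minimal sketch of that comparison, assuming you have saved the ring
listing from each cluster as text. The helper names (ring_tokens,
same_ring) are made up for illustration, and the parsing rests on an
assumption about the listing layout: each node row starts with an IPv4
address and ends with the token as a plain integer. Adjust it if your
version prints something different.

```python
import re

# Assumption: a node row starts with an IPv4 address and its last
# whitespace-separated field is the node's token as a decimal integer.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def ring_tokens(ring_output):
    """Return the sorted integer tokens found in a saved ring listing."""
    tokens = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and any row that does not look like a node.
        if len(fields) >= 2 and IPV4.match(fields[0]) and fields[-1].isdigit():
            tokens.append(int(fields[-1]))
    return sorted(tokens)


def same_ring(production_output, analysis_output):
    """True if both listings contain exactly the same set of tokens."""
    return ring_tokens(production_output) == ring_tokens(analysis_output)
```

This only compares the token values, not which address owns which token,
since the addresses will differ between the two clusters anyway.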

Also, did you set your (non seed) nodes in the analysis cluster to auto
bootstrap or not? That impacts what happens.


   Eric>  This doesn't appear to be the case but is this something you
   Eric> have tried too or do you understand the storage / topology
   Eric> logic well enough to know that this isn't a viable strategy?

No and No. I have been reading the code. Line 497 of
org.apache.cassandra.service.StorageService.java on trunk is a good
place to start since what happens depends somewhat on your specific
cassandra.yaml settings (specifically auto bootstrap).

I would be betting you are getting random tokens (look for "Generated
random token..." in your log). Don't trust me, read the code. I have all
of two weeks of experience with this stuff (and it's not quite my day
job to be doing it either :-)
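
If grepping by hand gets tedious across several nodes, here is a small
sketch of the same check. It only looks for the "Generated random token"
text quoted above. The function name is made up, and the log file
location is whatever your install uses (passed on the command line here),
not something this code knows about.

```python
import sys

# The message text to look for, as quoted above.
MARKER = "Generated random token"


def random_token_lines(log_lines):
    """Return the log lines that mention a randomly generated token."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]


if __name__ == "__main__":
    # Usage: python find_random_token.py /path/to/your/system.log
    # (the path is an assumption; use wherever your node writes its log)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for hit in random_token_lines(log):
                print(hit)
```

No output means the marker never appeared in that log, which would
suggest the node did not pick a random token on that run.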

Bottom line: I think you need to fix the seeds for your use case.

Cheers!
Shyamal