>>>>> "Eric" == Eric Czech <firstname.lastname@example.org> writes:
Eric> We're exploring a data processing procedure where we snapshot
Eric> our production cluster data and move that data to a new
Eric> cluster for analysis but I'm having some strange issues where
Eric> the analysis cluster is still somehow aware of the production
Eric> cluster (i.e. the production cluster ring is trying to include
Eric> nodes from the other cluster with the same token).
Are you using the same cluster name for both clusters? If so, I would
suggest that you don't.
Eric> The seed addresses in cassandra.yaml definitely prohibit this
Eric> type of intersection between the two clusters so I'm guessing
Eric> that it has something to do with the information in the system
I'm sure you will get a more knowledgeable answer from people who have
been doing this for a while, but I have to ask: are you copying over the
LocationInfo* SSTables from the snapshot to the analysis cluster?
The LocationInfo CF can record the endpoints in your production cluster.
From the little I've read of the code (StorageService.java and
SystemTable.java) it is possible (likely?) that endpoints from your
production cluster will get added to your analysis cluster's Gossiper on
startup. If you are using the same cluster name, well, there you have it.
Eric> Is there any way to duplicate raw sstables in an effort to
Eric> "copy" a cluster such that the copied cluster has a different
Eric> name? I know this usually results in a "saved cluster name X
Eric> != Y" sort of error but it looks like we need to find some
Eric> sort of way to do this logical separation.
Copying the raw tables and ignoring/deleting the
data/system/LocationInfo* files has worked for me. But I have to add the
disclaimer that I'm definitely a Cassandra newbie!
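
In case it helps, here is a rough sketch of the copy step in Python. It
is only an illustration, not something I have run against a real cluster.
It assumes the snapshot has already been arranged as a data directory
with one subdirectory per keyspace (so the files in question sit under
system/), and the function name copy_snapshot and its arguments are made
up for this example.

```python
import fnmatch
import shutil
from pathlib import Path

# Assumption: files in the system keyspace matching this pattern hold the
# saved endpoint information from the source cluster.
EXCLUDE_PATTERN = "LocationInfo*"


def copy_snapshot(src_data_dir, dst_data_dir, exclude=EXCLUDE_PATTERN):
    """Copy every file under src_data_dir into dst_data_dir, skipping files
    in the top-level 'system' directory whose names match `exclude`.

    Returns the relative paths of the files that were skipped.
    """
    src = Path(src_data_dir)
    dst = Path(dst_data_dir)
    skipped = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        # Only files under system/ are filtered; everything else is copied.
        if rel.parts[0] == "system" and fnmatch.fnmatch(path.name, exclude):
            skipped.append(rel)
            continue
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
    return skipped
```

You would call it with your own paths, for example
copy_snapshot("/path/to/snapshot/data", "/path/to/analysis/data") (both
paths are placeholders), with Cassandra stopped on the target node, and
then check the returned list to see which files were left out.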