Thanks a lot Ben, I actually managed to make it work by erasing the SimpleDB entries Priam uses to keep track of instances... I had also pulled the latest commit from the repo, not sure if it helped or not.
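In case it helps anyone else, this is more or less what I did to clear them; just a rough sketch with boto 2.x, assuming Priam's default SimpleDB domain name ("InstanceIdentity") and that the domain lives in us-east-1:

# Rough sketch (boto 2.x): delete the entries Priam keeps in SimpleDB so it
# forgets stale/dead instances. "InstanceIdentity" is, as far as I know, the
# default domain name; adjust the domain and region if yours differ.
# Note: this clears the entries for every cluster stored in that domain.
import boto.sdb

conn = boto.sdb.connect_to_region('us-east-1')
domain = conn.get_domain('InstanceIdentity')

for item in domain.select('select * from `InstanceIdentity`'):
    print('deleting %s' % item.name)
    domain.delete_item(item)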

But your message made me curious about something... How do you add more Cassandra nodes on the fly? Just update the autoscaling properties? I saw instaclustr.com changes the instance type as the number of nodes increases (not sure why the price per instance also becomes higher in that case). I am guessing Priam uses the data backed up to S3 to restore a node's data on another instance, right?
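(If it is useful to compare notes: a quick way to see what Priam has shipped to S3 -- just a sketch with boto 2.x, where the bucket name is a placeholder and the key prefix comes from the upload line in the log below:)

# Sketch (boto 2.x): list what Priam has backed up to S3 for this cluster.
# "my-priam-backups" is a placeholder bucket name; the
# backup/<region>/<cluster>/<token>/... prefix mirrors the S3FileSystem
# upload line in the Priam log further down in this thread.
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-priam-backups')

for key in bucket.list(prefix='backup/us-east-1/dmp_cluster/'):
    print(key.name)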

Regards,



2013/2/28 Ben Bromhead <ben@relational.io>
Off the top of my head, I would check to make sure the Auto Scaling Group you created is restricted to a single Availability Zone. Also, Priam sets the number of EC2 instances it expects based on the maximum instance count you set on your scaling group (it did this last time I checked a few months ago; its behaviour may have changed).

So I would make sure the desired, min and max instance counts for your scaling group are all the same, make sure your ASG is restricted to a single availability zone (e.g. us-east-1b), and then (if you are able to and there is no data in your cluster) delete all the SimpleDB entries Priam has created, and possibly also clear out the Cassandra data directory.
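For the scaling group part, something along these lines should do it; a rough sketch with boto 2.x, using the group name and AZ from your Priam log (and 3 instances, since your log shows the ASG query returning 3):

# Rough sketch (boto 2.x): pin the ASG to a single AZ and make min, max and
# desired equal, so the instance count Priam expects matches what is running.
# Group name and AZ taken from the Priam log (dmp_cluster-useast1b / us-east-1b).
import boto.ec2.autoscale

conn = boto.ec2.autoscale.connect_to_region('us-east-1')
group = conn.get_all_groups(names=['dmp_cluster-useast1b'])[0]

group.availability_zones = ['us-east-1b']
group.min_size = 3
group.max_size = 3
group.desired_capacity = 3
group.update()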

Other than that, I see you've raised it as an issue on the Priam project page, so see what they say ;)

Cheers

Ben

On Thu, Feb 28, 2013 at 3:40 AM, Marcelo Elias Del Valle <mvallebr@gmail.com> wrote:
One additional important piece of info: I checked here and the seeds really do seem to be different on each node. The command returns ip2 on the first node and ip1,ip1 on the second node.
Any idea why? It's probably what is causing Cassandra to die, right?


2013/2/27 Marcelo Elias Del Valle <mvallebr@gmail.com>
Hello Ben, thanks for your willingness to help,

2013/2/27 Ben Bromhead <ben@instaclustr.com>
Have you added the Priam java agent to Cassandra's JVM arguments (e.g. -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar), and does the web container running Priam have permission to write to the Cassandra config directory? Also, what do the Priam logs say?

I put the Priam log of the first node below. Yes, I have added priam-cass-extensions to the java args, and Priam IS actually writing to the Cassandra config dir.
 
If you want to get up and running quickly with Cassandra, AWS and Priam, check out www.instaclustr.com.
We deploy Cassandra under your AWS account and you have full root access to the nodes if you want to explore and play around, plus there is a free tier which is great for experimenting and trying Cassandra out.

That sounds really great. I am not sure if it would apply to our case (I will consider it, though), but some partners would benefit greatly from it, for sure! I will send them your link.

What priam says:

2013-02-27 14:14:58.0614 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-hostname returns: ec2-174-129-59-107.compute-1.amazon
2013-02-27 14:14:58.0615 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-ipv4 returns: 174.129.59.107
2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-id returns: i-88b32bfb
2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-type returns: c1.medium
2013-02-27 14:14:59.0614 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration REGION set to us-east-1, ASG Name set to dmp_cluster-useast1b
2013-02-27 14:14:59.0746 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration appid used to fetch properties is: dmp_cluster
2013-02-27 14:14:59.0843 INFO pool-2-thread-1 org.quartz.simpl.SimpleThreadPool Job execution threads will use class loader of thread: pool-2-thread-1
2013-02-27 14:14:59.0861 INFO pool-2-thread-1 org.quartz.core.SchedulerSignalerImpl Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2013-02-27 14:14:59.0862 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler Quartz Scheduler v.1.7.3 created.
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.simpl.RAMJobStore RAMJobStore initialized.
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler version: 1.7.3
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler JobFactory set to: com.netflix.priam.scheduler.GuiceJobFactory@1b6a1c4
2013-02-27 14:15:00.0239 INFO pool-2-thread-1 com.netflix.priam.aws.AWSMembership Querying Amazon returned following instance in the ASG: us-east-1b --> i-8eb32bfd,i-88b32bfb
2013-02-27 14:15:01.0470 INFO Timer-0 org.quartz.utils.UpdateChecker New update(s) found: 1.8.5 [http://www.terracotta.org/kit/reflector?kitID=default&pageID=QuartzChangeLog]
2013-02-27 14:15:10.0925 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Found dead instances: i-d49a0da7
2013-02-27 14:15:11.0397 ERROR pool-2-thread-1 com.netflix.priam.aws.SDBInstanceFactory Conditional check failed. Attribute (instanceId) value exists
2013-02-27 14:15:11.0398 ERROR pool-2-thread-1 com.netflix.priam.utils.RetryableCallable Retry #1 for: Status Code: 409, AWS Service: AmazonSimpleDB, AWS Request ID: 96ca7ae5-f352-b13a-febd-8801d46fee83, AWS Error Code: ConditionalCheckFailed, AWS Error Message: Conditional check failed. Attribute (instanceId) value exists
2013-02-27 14:15:11.0686 INFO pool-2-thread-1 com.netflix.priam.aws.AWSMembership Querying Amazon returned following instance in the ASG: us-east-1b --> i-8eb32bfd,i-88b32bfb
2013-02-27 14:15:25.0258 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Found dead instances: i-d89a0dab
2013-02-27 14:15:25.0588 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Trying to grab slot 1808575601 with availability zone us-east-1b
2013-02-27 14:15:25.0732 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity My token: 56713727820156410577229101240436610842
2013-02-27 14:15:25.0732 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2013-02-27 14:15:25.0878 INFO pool-2-thread-1 org.apache.cassandra.db.HintedHandOffManager cluster_name: dmp_cluster
initial_token: null
hinted_handoff_enabled: true
max_hint_window_in_ms: 8
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authorizer: org.apache.cassandra.auth.AllowAllAuthorizer
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
key_cache_size_in_mb: null
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
row_cache_provider: SerializingCacheProvider
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: com.netflix.priam.cassandra.extensions.NFSeedProvider
  parameters:
  - seeds: 127.0.0.1
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
memtable_flush_queue_size: 4
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: null
start_native_transport: false
native_transport_port: 9042
start_rpc: true
rpc_address: null
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 128
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 8
compaction_preheat_key_cache: true
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
server_encryption_options:
  internode_encryption: none
  keystore: conf/.keystore
  keystore_password: cassandra
  truststore: conf/.truststore
  truststore_password: cassandra
client_encryption_options:
  enabled: false
  keystore: conf/.keystore
  keystore_password: cassandra
internode_compression: all
inter_dc_tcp_nodelay: true
auto_bootstrap: true
memtable_total_space_in_mb: 1024
stream_throughput_outbound_megabits_per_sec: 400
num_tokens: 1

2013-02-27 14:15:25.0884 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Starting cassandra server ....Join ring=true
2013-02-27 14:15:25.0915 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Starting cassandra server ....
2013-02-27 14:15:30.0013 INFO http-bio-8080-exec-1 com.netflix.priam.aws.AWSMembership Query on ASG returning 3 instances
2013-02-27 14:15:31.0726 INFO http-bio-8080-exec-2 com.netflix.priam.aws.AWSMembership Query on ASG returning 3 instances
2013-02-27 14:15:37.0360 INFO DefaultQuartzScheduler_Worker-5 com.netflix.priam.aws.S3FileSystem Uploading to backup/us-east-1/dmp_cluster/56713727820156410577229101240436610842/201302271415/SST/system/local/system-local-ib-1-CompressionInfo.db with chunk size 10485760



Best regards, 
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr



--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr



--
Ben Bromhead

Co-founder



--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr