I have upgraded 2 nodes of a 12-node test cluster from 1.1.10 to 1.2.3. While tailing C*'s system.log during startup, I observed a series of SSTable batch load messages and skipping sstable due to bloom filter debug messages, which is normal for startup. But when startup reached the step of loading saved key caches, it got stuck forever. I/O wait stayed high in the CPU graph and I/O ops were sent to disk, but C* never got past loading the key cache file. The saved key cache file was about 75MB on one node and 125MB on the other, and they were for different CFs.
The CPU I/O wait stayed constantly at about 40% while system.log was stuck at loading one saved key cache file. I have marked that on the graph above. The workaround was to delete the saved cache files, after which things loaded fine (see marked Normal Startup).
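For anyone who wants to try the same workaround without losing the files, here is a rough sketch that moves the saved key caches aside instead of deleting them. It is only an illustration: the function name `move_saved_key_caches`, the `*KeyCache*` file name pattern, and the backup directory are my assumptions, and the saved caches location is whatever `saved_caches_directory` is set to in cassandra.yaml.

```python
import shutil
from pathlib import Path


def move_saved_key_caches(saved_caches_dir, backup_dir, pattern="*KeyCache*"):
    """Move saved key cache files out of saved_caches_dir.

    Returns the names of the files that were moved, sorted.
    The "*KeyCache*" pattern is an assumption about how the files are named;
    check the directory listing before relying on it.
    """
    src = Path(saved_caches_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the backup dir if needed

    moved = []
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            # Move rather than delete, so the file can be attached to a ticket later.
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

This should only be run while the node is stopped, for example as `move_saved_key_caches("/var/lib/cassandra/saved_caches", "/var/tmp/saved_caches_backup")`, where both paths are examples and need adjusting to the node's configuration. The node then starts with a cold key cache and rebuilds it over time.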
These machines are m1.xlarge EC2 instances, and the issue happened on both upgraded nodes. It did not happen during a test upgrade from 1.1.6 to 1.2.2 using the same snapshot.
Should I raise a JIRA?