I am researching various on-disk hash tables and B-trees.
While researching, I had some thoughts about Cassandra SSTables that I would like to verify here.
1. A Cassandra SSTable is written with sequential disk I/O when it is created, i.e. the disk head writes it from beginning to end. Assuming the disk is not fragmented, the SSTable occupies consecutive disk sectors.
2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and other "stuff" did not rule the SSTable out, and assuming the key is located in this single SSTable), it does NOT use sequential I/O. It probably reads a hash-table slot or a similar index structure first, then does another disk seek to get the value (and probably the key). If there is a key collision, additional seeks will probably be needed.
3. Once the data (e.g. the row) is located, the entire row is read sequentially. (Once again, I assume a single, well-compacted SSTable.) If the disk is not fragmented, the row's data also lies on consecutive disk sectors.
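
To make the three points concrete, here is a minimal Python sketch of the access pattern I have in mind. It is only a toy model of my assumptions, not Cassandra's actual on-disk format: `write_table` and `read_row` are hypothetical names, an in-memory `io.BytesIO` stands in for the data file, and a plain dict stands in for whatever index structure is really used.

```python
import io
import struct


def write_table(rows):
    """Point 1: write all rows in key order, strictly front to back."""
    data = io.BytesIO()  # stand-in for the data file (assumption)
    index = {}           # key -> byte offset of the row (assumption)
    for key in sorted(rows):
        value = rows[key]
        index[key] = data.tell()                   # remember where the row starts
        data.write(struct.pack(">I", len(value)))  # 4-byte big-endian length prefix
        data.write(value)                          # row bytes follow immediately
    return data, index


def read_row(data, index, key):
    """Points 2 and 3: index lookup, one seek, then one sequential read."""
    offset = index.get(key)  # index lookup (would be its own disk access)
    if offset is None:
        return None          # key is not in this table
    data.seek(offset)        # random access: jump to the start of the row
    (length,) = struct.unpack(">I", data.read(4))
    return data.read(length)  # sequential read of the entire row
```

For example, `data, index = write_table({"b": b"two", "a": b"one"})` followed by `read_row(data, index, "a")` does exactly one seek into the data and then reads the row in one piece.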
Am I wrong?