Thanks very much. Precisely answers my questions. :-)

2010/4/26 Schubert Zhang <zsongbo@gmail.com>
Please refer to the code:

org.apache.cassandra.db.ColumnFamilyStore

    public String getFlushPath()
    {
        long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() * 1024*1024; // 2* adds room for keys, column indexes
        String location = DatabaseDescriptor.getDataFileLocationForTable(table_, guessedSize);
        if (location == null)
            throw new RuntimeException("Insufficient disk space to flush");
        return new File(location, getTempSSTableFileName()).getAbsolutePath();
    }

and we can go through org.apache.cassandra.config.DatabaseDescriptor:

    public static String getDataFileLocationForTable(String table, long expectedCompactedFileSize)
    {
      long maxFreeDisk = 0;
      int maxDiskIndex = 0;
      String dataFileDirectory = null;
      String[] dataDirectoryForTable = getAllDataFileLocationsForTable(table);

      for ( int i = 0 ; i < dataDirectoryForTable.length ; i++ )
      {
        File f = new File(dataDirectoryForTable[i]);
        if( maxFreeDisk < f.getUsableSpace())
        {
          maxFreeDisk = f.getUsableSpace();
          maxDiskIndex = i;
        }
      }
      // Load factor of 0.9 we do not want to use the entire disk that is too risky.
      maxFreeDisk = (long)(0.9 * maxFreeDisk);
      if( expectedCompactedFileSize < maxFreeDisk )
      {
        dataFileDirectory = dataDirectoryForTable[maxDiskIndex];
        currentIndex = (maxDiskIndex + 1 )%dataDirectoryForTable.length ;
      }
      else
      {
        currentIndex = maxDiskIndex;
      }
      return dataFileDirectory;
    }

So DataFileDirectories is meant for multiple disks or disk partitions: each flush goes to whichever directory reports the most usable space.
I think your storage01, storage02 and storage03 are on the same disk or disk partition.
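
To make the selection rule easier to see, here is a minimal sketch of the same logic reduced to a pure function over the free-space figures the directories report. The class name DirectoryPicker, the method pick and the -1 return value are hypothetical and only for illustration; they are not Cassandra's API, and the round-robin currentIndex bookkeeping is left out. It assumes that directories on one partition report identical usable space, in which case the strict comparison always keeps the first directory.

```java
public class DirectoryPicker {
    // Fraction of the free space treated as usable, mirroring the 0.9 factor in the quoted code.
    static final double LOAD_FACTOR = 0.9;

    /**
     * Returns the index of the directory with the most usable space,
     * or -1 if even that directory cannot hold expectedSize bytes.
     * (Hypothetical helper; the quoted code returns a path or null instead.)
     */
    static int pick(long[] usableBytes, long expectedSize) {
        long maxFree = 0;
        int maxIndex = 0;
        for (int i = 0; i < usableBytes.length; i++) {
            // Strict comparison: on a tie the earlier index is kept.
            if (maxFree < usableBytes[i]) {
                maxFree = usableBytes[i];
                maxIndex = i;
            }
        }
        long budget = (long) (LOAD_FACTOR * maxFree);
        return expectedSize < budget ? maxIndex : -1;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Same partition: every directory reports the same free space, so index 0 wins.
        System.out.println(pick(new long[] {30 * gb, 30 * gb, 30 * gb}, gb));
        // Separate disks: the disk with the most free space wins.
        System.out.println(pick(new long[] {30 * gb, 90 * gb, 90 * gb}, gb));
        // Not enough room anywhere: no directory is returned.
        System.out.println(pick(new long[] {gb, gb, gb}, 2 * gb));
    }
}
```

With the sample values in main, the three calls print 0, 1 and -1. On real disks you would fill the array from File.getUsableSpace() for each configured directory.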


2010/4/26 Roland Hänel <roland@haenel.me>

I have a configuration like this:

  <DataFileDirectories>
      <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
      <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
      <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
  </DataFileDirectories>

After loading a big chunk of data into Cassandra, I end up with some 70GB in the first directory, and only about 10GB each in the second and third. All rows are quite small, so it's not just a few big rows that contain the majority of the data.

Does Cassandra have the ability to 'see' the maximum available space in these directories? I'm asking myself this question since my limit is 100GB, and the first directory is approaching this limit...

And wouldn't it be better if Cassandra tried to 'load-balance' the files across the directories, since this would give better (read) performance when the directories are on different disks (which is the case for me)?

Any help is appreciated.

Roland