cassandra-user mailing list archives

From Schubert Zhang <zson...@gmail.com>
Subject Re: Can Cassandra make real use of several DataFileDirectories?
Date Mon, 26 Apr 2010 10:38:42 GMT
Please refer to the code:

org.apache.cassandra.db.ColumnFamilyStore

    public String getFlushPath()
    {
        long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() * 1024*1024; // 2* adds room for keys, column indexes
        String location = DatabaseDescriptor.getDataFileLocationForTable(table_, guessedSize);
        if (location == null)
            throw new RuntimeException("Insufficient disk space to flush");
        return new File(location, getTempSSTableFileName()).getAbsolutePath();
    }

and then follow the call into org.apache.cassandra.config.DatabaseDescriptor:

    public static String getDataFileLocationForTable(String table, long expectedCompactedFileSize)
    {
      long maxFreeDisk = 0;
      int maxDiskIndex = 0;
      String dataFileDirectory = null;
      String[] dataDirectoryForTable = getAllDataFileLocationsForTable(table);

      for ( int i = 0 ; i < dataDirectoryForTable.length ; i++ )
      {
        File f = new File(dataDirectoryForTable[i]);
        if( maxFreeDisk < f.getUsableSpace())
        {
          maxFreeDisk = f.getUsableSpace();
          maxDiskIndex = i;
        }
      }
      // Load factor of 0.9: we do not want to use the entire disk, that is too risky.
      maxFreeDisk = (long)(0.9 * maxFreeDisk);
      if( expectedCompactedFileSize < maxFreeDisk )
      {
        dataFileDirectory = dataDirectoryForTable[maxDiskIndex];
        currentIndex = (maxDiskIndex + 1 )%dataDirectoryForTable.length ;
      }
      else
      {
        currentIndex = maxDiskIndex;
      }
      return dataFileDirectory;
    }

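So DataFileDirectories is intended for multiple disks or disk partitions.
Each flush goes to the directory with the most usable space, and because the
comparison is a strict '<', a tie leaves the first directory selected.
I think your storage01, storage02 and storage03 are on the same disk or disk
partition, so they all report the same free space and the first one keeps
being chosen.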
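
To see why directories that share a partition would all lose to the first
one, here is a minimal sketch of the selection loop above with the free-space
values passed in as plain numbers. The class and method names (DataDirPicker,
pickIndex) and the -1 return value are my own stand-ins for illustration, not
Cassandra code.

```java
class DataDirPicker {
    /**
     * Returns the index of the directory the loop would choose, or -1 when
     * even the emptiest directory lacks room (standing in for the null return
     * in the quoted method). usableSpace[i] plays the role of
     * new File(dir[i]).getUsableSpace().
     */
    public static int pickIndex(long[] usableSpace, long expectedSize) {
        long maxFree = 0;
        int maxIndex = 0;
        for (int i = 0; i < usableSpace.length; i++) {
            // Strict '<': on a tie the earlier directory keeps the slot.
            if (maxFree < usableSpace[i]) {
                maxFree = usableSpace[i];
                maxIndex = i;
            }
        }
        // Same 0.9 load factor as the quoted code.
        long budget = (long) (0.9 * maxFree);
        return expectedSize < budget ? maxIndex : -1;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Three directories on one partition report identical free space.
        System.out.println(pickIndex(new long[] {30 * gb, 30 * gb, 30 * gb}, gb));
        // Three separate disks: the emptiest one wins.
        System.out.println(pickIndex(new long[] {30 * gb, 90 * gb, 60 * gb}, gb));
        // No directory has enough room once the load factor is applied.
        System.out.println(pickIndex(new long[] {gb, gb, gb}, gb));
    }
}
```

With equal free space everywhere the first call selects index 0 every time,
which would match the pattern of one directory filling up much faster than
the others.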


2010/4/26 Roland Hänel <roland@haenel.me>

> I have a configuration like this:
>
>   <DataFileDirectories>
>       <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
>       <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
>       <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
>   </DataFileDirectories>
>
> After loading a big chunk of data into Cassandra, I end up with some 70GB
> in the first directory, and only about 10GB in the second and third one. All
> rows are quite small, so it's not just some big rows that contain the
> majority of data.
>
> Does Cassandra have the ability to 'see' the maximum available space in
> these directories? I'm asking myself this question since my limit is 100GB,
> and the first directory is approaching this limit...
>
> And, wouldn't it be better if Cassandra tried to 'load-balance' the files
> inside the directories because this will result in better (read) performance
> if the directories are on different disks (which is the case for me)?
>
> Any help is appreciated.
>
> Roland
>
>
