cassandra-user mailing list archives

From Roland Hänel <rol...@haenel.me>
Subject Re: Can Cassandra make real use of several DataFileDirectories?
Date Mon, 26 Apr 2010 11:39:48 GMT
Thanks very much. Precisely answers my questions. :-)

2010/4/26 Schubert Zhang <zsongbo@gmail.com>

> Please refer to the code:
>
> org.apache.cassandra.db.ColumnFamilyStore
>
>     public String getFlushPath()
>     {
>         long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() * 1024*1024; // 2* adds room for keys, column indexes
>         String location = DatabaseDescriptor.getDataFileLocationForTable(table_, guessedSize);
>         if (location == null)
>             throw new RuntimeException("Insufficient disk space to flush");
>         return new File(location, getTempSSTableFileName()).getAbsolutePath();
>     }
>
> and then look at org.apache.cassandra.config.DatabaseDescriptor:
>
>     public static String getDataFileLocationForTable(String table, long expectedCompactedFileSize)
>     {
>       long maxFreeDisk = 0;
>       int maxDiskIndex = 0;
>       String dataFileDirectory = null;
>       String[] dataDirectoryForTable = getAllDataFileLocationsForTable(table);
>
>       for ( int i = 0 ; i < dataDirectoryForTable.length ; i++ )
>       {
>         File f = new File(dataDirectoryForTable[i]);
>         if( maxFreeDisk < f.getUsableSpace())
>         {
>           maxFreeDisk = f.getUsableSpace();
>           maxDiskIndex = i;
>         }
>       }
>       // Load factor of 0.9 we do not want to use the entire disk that is too risky.
>       maxFreeDisk = (long)(0.9 * maxFreeDisk);
>       if( expectedCompactedFileSize < maxFreeDisk )
>       {
>         dataFileDirectory = dataDirectoryForTable[maxDiskIndex];
>         currentIndex = (maxDiskIndex + 1 )%dataDirectoryForTable.length ;
>       }
>       else
>       {
>         currentIndex = maxDiskIndex;
>       }
>       return dataFileDirectory;
>     }
>
> So DataFileDirectories is meant for multiple disks or disk partitions:
> each flush goes to the directory that reports the most usable space.
> I think your storage01, storage02 and storage03 are on the same disk or
> disk partition.
>
>
> 2010/4/26 Roland Hänel <roland@haenel.me>
>
>> I have a configuration like this:
>>
>>   <DataFileDirectories>
>>       <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
>>       <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
>>       <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
>>   </DataFileDirectories>
>>
>> After loading a big chunk of data into Cassandra, I end up with some 70GB
>> in the first directory, and only about 10GB in the second and third one. All
>> rows are quite small, so it's not just some big rows that contain the
>> majority of data.
>>
>> Does Cassandra have the ability to 'see' the maximum available space in
>> these directories? I'm asking myself this question since my limit is 100GB,
>> and the first directory is approaching this limit...
>>
>> And, wouldn't it be better if Cassandra tried to 'load-balance' the files
>> inside the directories because this will result in better (read) performance
>> if the directories are on different disks (which is the case for me)?
>>
>> Any help is appreciated.
>>
>> Roland
>>
>>
>
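
The selection logic quoted above can be tried out in isolation. What follows
is a minimal Java sketch, not Cassandra's actual code: the class name
FlushDirectoryChooser, the choose method and the free-space figures are all
made up for illustration, and the real method calls File.getUsableSpace() on
each configured directory instead of reading a map. The sketch keeps the two
details that matter here, the strict "<" comparison and the 0.9 load factor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlushDirectoryChooser {
    // Same safety margin as in the quoted DatabaseDescriptor code.
    static final double LOAD_FACTOR = 0.9;

    /**
     * Hypothetical stand-in for getDataFileLocationForTable: picks the
     * directory with the most usable space, or returns null if the expected
     * file does not fit into 90% of that space.
     * The map must preserve configuration order (e.g. LinkedHashMap).
     */
    static String choose(Map<String, Long> usableBytesByDir, long expectedSize) {
        long maxFree = 0;
        String best = null;
        for (Map.Entry<String, Long> e : usableBytesByDir.entrySet()) {
            // Strict comparison: on a tie the earlier directory is kept.
            if (maxFree < e.getValue()) {
                maxFree = e.getValue();
                best = e.getKey();
            }
        }
        long budget = (long) (LOAD_FACTOR * maxFree);
        return expectedSize < budget ? best : null;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;

        // Assumed scenario 1: all three directories sit on one partition,
        // so they report identical usable space.
        Map<String, Long> samePartition = new LinkedHashMap<>();
        samePartition.put("/storage01/cassandra/data", 30 * gb);
        samePartition.put("/storage02/cassandra/data", 30 * gb);
        samePartition.put("/storage03/cassandra/data", 30 * gb);
        System.out.println(choose(samePartition, 1 * gb));

        // Assumed scenario 2: separate disks, the first one is fuller.
        Map<String, Long> separateDisks = new LinkedHashMap<>();
        separateDisks.put("/storage01/cassandra/data", 30 * gb);
        separateDisks.put("/storage02/cassandra/data", 90 * gb);
        separateDisks.put("/storage03/cassandra/data", 90 * gb);
        System.out.println(choose(separateDisks, 1 * gb));
    }
}
```

In the first scenario the tie is never broken, so the first directory listed
is chosen on every call; that would be consistent with most of the data
piling up in /storage01 if the three paths really share a partition. In the
second scenario the sketch picks /storage02, the first of the two emptier
disks.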
