cassandra-commits mailing list archives

From "Dan Kinder (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-14229) Separate data drive for Index.db files
Date Tue, 13 Feb 2018 00:31:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Kinder updated CASSANDRA-14229:
-----------------------------------
    Summary: Separate data drive for Index.db files  (was: Separate data drive for smaller SSTable files)

> Separate data drive for Index.db files
> --------------------------------------
>
>                 Key: CASSANDRA-14229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14229
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Local Write-Read Paths
>            Reporter: Dan Kinder
>            Priority: Minor
>
> For datasets whose active set of keys far exceeds RAM, it would be quite useful to be
> able to put certain SSTable files (e.g. *-Index.db) on a separate, faster drive (or drives)
> than the data, e.g. indexes on SSD and data on HDD. This is particularly valuable when keys
> are much smaller than values. Also, as RAM continues to get more expensive, users who currently
> optimize by having large key caches may not need to buy as much of it.
> Our use case is a large dataset like this one. Storing all the data on SSD is cost-prohibitive,
> and the reads are extremely random (effectively every key is in the active set), so we don't
> have enough RAM to cache it. (I did try using a massive key cache, 64GB, and saw strange
> behavior anyway: the irqbalancer process pegged the CPU and the whole thing badly underperformed.
> An investigation for another day.)
> At the moment our only resolution is to buy enough HDDs to handle 2 seeks per read: 1
> for the index and 1 for the data. Having the indexes on SSD would speed this up considerably,
> and in practice would require us to purchase only a small number of SSDs and about 1/2 the
> number of HDDs.
> One user suggested lvmcache, which could work. I'd like to hear whether it would really
> work optimally, whether lvmcache would really keep the right blocks on the faster volume, and
> how reliable it is at the task.
> Note: I asked about this on the mailing list and it was suggested I create a JIRA.
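
The seek arithmetic in the description (2 HDD seeks per read with everything on HDD, 1 once the index lives on SSD) can be illustrated with a rough sizing sketch. The read rate and per-drive IOPS below are hypothetical figures chosen only for illustration, not numbers from the issue, and the model deliberately ignores caching, bloom filters, reads that touch several SSTables, and storage capacity limits.

```python
import math


def hdds_needed(reads_per_sec: float, hdd_seeks_per_read: int, hdd_iops: float) -> int:
    """Rough count of HDDs needed to sustain a purely random read load.

    Assumes every read costs `hdd_seeks_per_read` seeks on HDD and that each
    drive sustains `hdd_iops` random seeks per second.
    """
    return math.ceil(reads_per_sec * hdd_seeks_per_read / hdd_iops)


# Hypothetical workload figures (assumptions, not from the issue).
READS_PER_SEC = 10_000
HDD_IOPS = 100

# Index and data both on HDD: 1 seek for the index + 1 seek for the data.
all_on_hdd = hdds_needed(READS_PER_SEC, 2, HDD_IOPS)

# Index on SSD: only the data seek lands on HDD.
index_on_ssd = hdds_needed(READS_PER_SEC, 1, HDD_IOPS)

print(f"HDDs with index on HDD: {all_on_hdd}")
print(f"HDDs with index on SSD: {index_on_ssd}")
```

Under these assumptions the HDD count halves when the index seek moves to SSD, which matches the "about 1/2 the number of HDDs" estimate in the description.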



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

