cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From preetika tyagi <preetikaty...@gmail.com>
Subject Re: question on maximum disk seeks
Date Tue, 21 Mar 2017 16:34:51 GMT
Thank you Jan & Jeff for the responses. That was really useful.

Jan - I have one follow-up question. When the data is spread over more than
one SSTable in case of updates as you mentioned, we will need two seeks per
SSTable (one for partition index and another for SSTable itself). I'm
curious to know how partition index is structured internally. I was
assuming it to be a table with <key, disk offset> pairs. In case of an
update to the same key for several times, how it is recorded in the
partition index?

Thanks,
Preetika

On Mon, Mar 20, 2017 at 10:37 PM, <j.kesten@enercast.de> wrote:

> Hi,
>
>
>
> youre right – one seek with hit in the partition key cache and two if not.
>
>
>
> Thats the theory – but two thinge to mention:
>
>
>
> First, you need two seeks per sstable not per entire read. So if you data
> is spread over multiple sstables on disk you obviously need more then two
> reads. Think of often updated partition keys – in combination with memory
> preassure you can easily end up with maaany sstables (ok they will be
> compacted some time in the future).
>
>
>
> Second, there could be fragmentation on disk which leads to seeks during
> sequential reads.
>
>
>
> Jan
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *preetika tyagi <preetikatyagi@gmail.com>
> *Gesendet: *Montag, 20. März 2017 21:18
> *An: *user@cassandra.apache.org
> *Betreff: *question on maximum disk seeks
>
>
>
> I'm trying to understand the maximum number of disk seeks required in a
> read operation in Cassandra. I looked at several online articles including
> this one: https://docs.datastax.com/en/cassandra/3.0/
> cassandra/dml/dmlAboutReads.html
>
> As per my understanding, two disk seeks are required in the worst case.
> One is for reading the partition index and another is to read the actual
> data from the compressed partition. The index of the data in compressed
> partitions is obtained from the compression offset tables (which is stored
> in memory). Am I on the right track here? Will there ever be a case when
> more than 1 disk seek is required to read the data?
>
> Thanks,
>
> Preetika
>
>
>
>
>

Mime
View raw message