From: Kanwar Sangha
To: user@cassandra.apache.org
Subject: RE: High disk I/O during reads
Date: Fri, 22 Mar 2013 15:19:47 +0000

Sorry... I meant to ask "is your key spread across multiple sstables?" But with LCS, your reads should ideally be served from one sstable most of the time.

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: 22 March 2013 10:16
To: user@cassandra.apache.org
Subject: Re: High disk I/O during reads

Did you mean to ask "are 'all' your keys spread across all SSTables"? I am guessing at your intention.

I mean, I would very well hope my keys are spread across all sstables, or otherwise that sstable should not be there, as it has no keys in it ;).

And I know we had HUGE disk usage from the duplication in our sstables on size-tiered compaction... we never ran a major compaction, but after we switched to LCS we went from 300G to around 120G, which was nice. We only have 300 data-point posts per second, so not an extreme write load on 6 nodes, though each post causes reads to check authorization and such in our system.

Dean

From: Kanwar Sangha
Reply-To: "user@cassandra.apache.org"
Date: Friday, March 22, 2013 8:38 AM
To: "user@cassandra.apache.org"
Subject: RE: High disk I/O during reads

Are your keys spread across all SSTables? That will cause every sstable to be read, which will increase the I/O. What compaction are you using?
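One way to answer the sstables-per-key question directly is nodetool's per-CF histogram. A minimal sketch, assuming nodetool is on the path; the keyspace name is a placeholder, and the column family name is taken from the cfstats output later in the thread:

    # The "SSTables" column shows how many sstables each recent read touched;
    # with LCS most reads should land in the 1-2 bucket.
    nodetool -h localhost cfhistograms <keyspace> conversation_text_message

A long tail in the SSTables column means keys really are fragmented across many sstables, and every extra sstable touched is extra disk I/O per read.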
From: zodiak@fifth-aeon.net On Behalf Of Jon Scarborough
Sent: 21 March 2013 23:00
To: user@cassandra.apache.org
Subject: High disk I/O during reads

Hello,

We've had a 5-node C* cluster (version 1.1.0) running for several months. Up until now we've mostly been writing data, but now we're starting to service more read traffic. We're seeing far more disk I/O to service these reads than I would have anticipated.

The CF being queried consists of chat messages. Each row represents a conversation between two people. Each column represents a message. The column key is composite, consisting of the message date and a few other bits of information. The CF is using compression.

The query is looking for a maximum of 50 messages between two dates, in reverse order. Usually the two dates used as endpoints are 30 days ago and the current time. The query in Astyanax looks like this:

    ColumnList result = keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
        .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
        .getKey(conversationKey)
        .withColumnRange(
            textMessageSerializer.makeEndpoint(endDate, Equality.LESS_THAN).toBytes(),
            textMessageSerializer.makeEndpoint(startDate, Equality.GREATER_THAN_EQUALS).toBytes(),
            true,
            maxMessages)
        .execute()
        .getResult();

We're currently servicing around 30 of these queries per second.

Here's what the cfstats for the CF look like:

    Column Family: conversation_text_message
    SSTable count: 15
    Space used (live): 211762982685
    Space used (total): 211762982685
    Number of Keys (estimate): 330118528
    Memtable Columns Count: 68063
    Memtable Data Size: 53093938
    Memtable Switch Count: 9743
    Read Count: 4313344
    Read Latency: 118.831 ms.
    Write Count: 817876950
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 6055
    Bloom Filter False Ratio: 0.00260
    Bloom Filter Space Used: 686266048
    Compacted row minimum size: 87
    Compacted row maximum size: 14530764
    Compacted row mean size: 1186

On the C* nodes, iostat output like this is typical, and can spike to be much worse:

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.91    0.00    2.08   30.66    0.50   64.84

    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvdap1            0.13         0.00         1.07          0         16
    xvdb            474.20     13524.53        25.33     202868        380
    xvdc            469.87     13455.73        30.40     201836        456
    md0             972.13     26980.27        55.73     404704        836

Any thoughts on what could be causing read I/O to the disk from these queries?

Much thanks!

-Jon
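A rough cross-check of the numbers above, assuming the CF uses Cassandra's default 64 KB compression chunk size (the thread does not state the chunk_length_kb setting):

    26980 kB/s read on md0  /  ~30 queries/s            ~=  900 kB of disk reads per query
    15 sstables  x  64 kB   (one compressed chunk per sstable)  =  960 kB

A key fragmented across all 15 sstables, with at least one compressed chunk read and decompressed per sstable, lands almost exactly on the observed per-query I/O; the mean row size (~1.2 kB) on its own cannot account for it.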