From: Kanwar Sangha
To: user@cassandra.apache.org
Subject: RE: High disk I/O during reads
Date: Fri, 22 Mar 2013 15:19:47 +0000

Sorry... I meant to ask "is your key spread across multiple sstables?" But with LCS, your reads should ideally be served from one sstable most of the time.

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: 22 March 2013 10:16
To: user@cassandra.apache.org
Subject: Re: High disk I/O during reads

Did you mean to ask "are 'all' your keys spread across all SSTables"? I am guessing at your intention.

I mean, I would very well hope my keys are spread across all sstables, or otherwise that sstable should not be there, as it has no keys in it ;).

And I know we had HUGE disk usage from the duplication in our sstables on size-tiered compaction... we never ran a major compaction, but after we switched to LCS we went from 300G to around 120G, which was nice. We only have 300 data-point posts per second, so not an extreme write load on 6 nodes, though each post causes reads to check authorization and such in our system.

Dean

From: Kanwar Sangha
Reply-To: "user@cassandra.apache.org"
Date: Friday, March 22, 2013 8:38 AM
To: "user@cassandra.apache.org"
Subject: RE: High disk I/O during reads

Are your keys spread across all SSTables? That will cause every sstable to be read, which will increase the I/O. What compaction are you using?
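One way to answer the sstables-per-key question directly is nodetool's per-CF histogram. A minimal sketch, assuming nodetool is on the path; the keyspace name is a placeholder, and the column family name is taken from the cfstats output later in the thread:

    # The "SSTables" column shows how many sstables each recent read touched;
    # with LCS most reads should land in the 1-2 bucket.
    nodetool -h localhost cfhistograms <keyspace> conversation_text_message

A long tail in the SSTables column means keys really are fragmented across many sstables, and every extra sstable touched is extra disk I/O per read.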
From: zodiak@fifth-aeon.net On Behalf Of Jon Scarborough
Sent: 21 March 2013 23:00
To: user@cassandra.apache.org
Subject: High disk I/O during reads

Hello,

We've had a 5-node C* cluster (version 1.1.0) running for several months. Up until now we've mostly been writing data, but now we're starting to service more read traffic. We're seeing far more disk I/O to service these reads than I would have anticipated.

The CF being queried consists of chat messages. Each row represents a conversation between two people. Each column represents a message. The column key is composite, consisting of the message date and a few other bits of information. The CF is using compression.

The query is looking for a maximum of 50 messages between two dates, in reverse order. Usually the two dates used as endpoints are 30 days ago and the current time. The query in Astyanax looks like this:

    ColumnList result = keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
        .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
        .getKey(conversationKey)
        .withColumnRange(
            textMessageSerializer.makeEndpoint(endDate, Equality.LESS_THAN).toBytes(),
            textMessageSerializer.makeEndpoint(startDate, Equality.GREATER_THAN_EQUALS).toBytes(),
            true,
            maxMessages)
        .execute()
        .getResult();

We're currently servicing around 30 of these queries per second.

Here's what the cfstats for the CF look like:

    Column Family: conversation_text_message
    SSTable count: 15
    Space used (live): 211762982685
    Space used (total): 211762982685
    Number of Keys (estimate): 330118528
    Memtable Columns Count: 68063
    Memtable Data Size: 53093938
    Memtable Switch Count: 9743
    Read Count: 4313344
    Read Latency: 118.831 ms.
    Write Count: 817876950
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 6055
    Bloom Filter False Ratio: 0.00260
    Bloom Filter Space Used: 686266048
    Compacted row minimum size: 87
    Compacted row maximum size: 14530764
    Compacted row mean size: 1186

On the C* nodes, iostat output like this is typical, and can spike to be much worse:

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.91    0.00    2.08   30.66    0.50   64.84

    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvdap1            0.13         0.00         1.07          0         16
    xvdb            474.20     13524.53        25.33     202868        380
    xvdc            469.87     13455.73        30.40     201836        456
    md0             972.13     26980.27        55.73     404704        836

Any thoughts on what could be causing read I/O to the disk from these queries?

Much thanks!

-Jon
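A rough cross-check of the numbers above, assuming the CF uses Cassandra's default 64 KB compression chunk size (the thread does not state the chunk_length_kb setting):

    26980 kB/s read on md0  /  ~30 queries/s            ~=  900 kB of disk reads per query
    15 sstables  x  64 kB   (one compressed chunk per sstable)  =  960 kB

A key fragmented across all 15 sstables, with at least one compressed chunk read and decompressed per sstable, lands almost exactly on the observed per-query I/O; the mean row size (~1.2 kB) on its own cannot account for it.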