From: Thunder Stumpges
Date: Sun, 16 Feb 2014 20:13:20 -0800
To: "user@cassandra.apache.org"
Subject: Re: Where to I start to get to the bottom of this WriteTimeout issue?

If you are looking for write throughput and running on a VM, you could likely have IO issues with your virtual disks. Best practice is to put the write-ahead log (the commit log) on a separate disk from the data folder(s). I'm not sure if you have done this or what physical setup you have under the VM, but I would also examine your IO while you are doing this. Is there other load on the system, either read or write, while this is happening?
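
The commit-log placement Thunder describes can be checked from inside the VM. In cassandra.yaml the commit log location is `commitlog_directory` and the data locations are `data_file_directories`. A minimal sketch, assuming you pass those two settings in yourself (the function name is hypothetical), compares the device IDs that `os.stat` reports:

```python
import os
from typing import Sequence


def commitlog_shares_device(commitlog_dir: str, data_dirs: Sequence[str]) -> bool:
    """Return True if the commit log directory lives on the same filesystem
    device as any of the data directories."""
    # st_dev identifies the device holding each path; equal values mean the
    # commit log and the data files contend for the same (virtual) disk.
    commitlog_dev = os.stat(commitlog_dir).st_dev
    return any(os.stat(path).st_dev == commitlog_dev for path in data_dirs)
```

For example, `commitlog_shares_device("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"])` checks the packaged default locations; substitute your own values. A `False` only means different devices as the guest OS sees them: on a VM, two virtual disks may still sit on the same physical storage, which this check cannot detect.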

-Thunder

On Feb 16, 2014, at 8:04 PM, Erick Ramirez <erick@ramirez.com.au> wrote:

Jacob,

You are right in that increasing the timeout to 20,000 ms (20 seconds) is a real concern, as it just hides an underlying issue with your environment. Without additional information, I suspected that this could be due to the environment not being optimised.

These write timeouts can occur when the systems are under load or low on resources. My questions about memory stem from the possibility that your system(s) may be under load due to GC, which would point to the JVM running out of memory.

Have a look at the logs, as they will give you clues as to what is happening, and possibly the cause of the issue. And keep us posted. Thanks!
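
One way to test the GC theory against system.log is to pull out the long-pause lines. Cassandra releases of this era log collections through GCInspector at INFO level; the exact message wording used below is an assumption based on 1.2/2.0-era lines and may differ in your version, so adjust the pattern if nothing matches. A minimal sketch (the script and function names are hypothetical):

```python
import re
import sys
from typing import Iterable, List, Tuple

# ASSUMED message shape, based on 1.2/2.0-era GCInspector INFO lines (illustrative):
#   ... GCInspector.java (line NNN) GC for ConcurrentMarkSweep: 1500 ms for 1 collections, 750000000 used; max is 1046937600
# Adjust the pattern if your version words it differently.
GC_LINE = re.compile(
    r"GC for (?P<collector>[^:]+): (?P<ms>\d+) ms for \d+ collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def long_gc_pauses(lines: Iterable[str], min_ms: int = 200) -> List[Tuple[str, int, float]]:
    """Return (collector, pause_ms, heap_fraction_used) for every GCInspector
    line whose reported pause is at least min_ms milliseconds."""
    hits = []
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue
        pause_ms = int(match.group("ms"))
        if pause_ms >= min_ms:
            # Heap in use when the line was logged, as a fraction of the max heap.
            fraction = int(match.group("used")) / int(match.group("max"))
            hits.append((match.group("collector"), pause_ms, fraction))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gc_scan.py <path to system.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for collector, pause_ms, fraction in long_gc_pauses(log):
            print(f"{collector}: {pause_ms} ms, heap {fraction:.0%} in use")
```

Repeated multi-second ConcurrentMarkSweep pauses with the heap staying close to full would support the out-of-memory explanation; short ParNew pauses alone would not.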

Cheers,
Erick

On Mon, Feb 17, 2014 at 1:41 PM, Jacob Rhoden <jacob.rhoden@me.com> wrote:

Hi Erick,

On 17 Feb 2014, at 1:19 pm, Erick Ramirez <erick@ramirez.com.au> wrote:

Are you able to post log snippets around the time that the timeouts occur?

I have a suspicion you may be running out of heap memory and might need to tune your environment. The INFO entries in the log should indicate this.
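
Jacob's reply notes that he has no timestamp for the timeouts. Replicas drop writes they could not process within the timeout and report this periodically, so those reports give candidate times to pull log snippets from. The message wording below (log4j ISO8601 timestamp plus the 1.2/2.0-era MessagingService text) is an assumption; check it against your own system.log. A minimal sketch with a hypothetical function name:

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# ASSUMED line shape (illustrative):
#   INFO [ScheduledTasks:1] 2014-02-17 03:10:00,000 MessagingService.java (line NNN) 12 MUTATION messages dropped in last 5000ms
DROPPED = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?"
    r"(?P<count>\d+) MUTATION messages dropped"
)


def dropped_mutation_times(lines: Iterable[str]) -> List[Tuple[datetime, int]]:
    """Return (timestamp, dropped_count) for each dropped-MUTATION log line,
    i.e. the moments worth pulling surrounding log context from."""
    events = []
    for line in lines:
        match = DROPPED.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, int(match.group("count"))))
    return events
```

Feed it an open log file, e.g. `dropped_mutation_times(open("/var/log/cassandra/system.log"))`, where the path is only the usual packaged location and may differ on your nodes.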

I'm kicking off the load and not watching it, so I don't have a timestamp to see where it occurred. After some mucking around, I worked out that adding an extra zero to the following parameter on both nodes makes the problem go away:

write_request_timeout_in_ms: 20000

Whatever that parameter exactly controls, I'm pretty sure I don't want to keep a 20s write timeout :D but it allows my bulk loads to run for the time being.
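
As for what the parameter controls: `write_request_timeout_in_ms` is how long the coordinator node waits for replicas to acknowledge a write (enough of them to satisfy the request's consistency level) before reporting a write timeout to the client. Its default in cassandra.yaml of this era is 2000 ms, which matches the "extra zero" above. The toy model below is plain Python, not Cassandra code: threads stand in for replicas, and every name is hypothetical. It illustrates why a larger budget hides slow replicas rather than fixing them.

```python
import concurrent.futures
import time
from typing import Sequence


class SimulatedWriteTimeout(Exception):
    """Hypothetical stand-in for the WriteTimeout a client would see."""


def replica_write(delay_s: float) -> None:
    # A replica that takes delay_s seconds to apply and acknowledge a write.
    time.sleep(delay_s)


def coordinate_write(replica_delays_s: Sequence[float], required_acks: int, timeout_ms: int) -> int:
    """Wait up to timeout_ms for required_acks replica acknowledgements.
    Returns the number of acks received; raises SimulatedWriteTimeout otherwise."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replica_delays_s))
    futures = [pool.submit(replica_write, d) for d in replica_delays_s]
    acks = 0
    try:
        for _ in concurrent.futures.as_completed(futures, timeout=timeout_ms / 1000):
            acks += 1
            if acks >= required_acks:
                return acks
    except concurrent.futures.TimeoutError:
        pass
    finally:
        # Don't block on stragglers; late replicas may still apply the write,
        # just as a Cassandra timeout does not mean the write failed.
        pool.shutdown(wait=False)
    raise SimulatedWriteTimeout(f"{acks}/{required_acks} acks within {timeout_ms} ms")
```

With three replicas that each take 0.5 s and two acks required, `coordinate_write([0.5, 0.5, 0.5], 2, 200)` raises `SimulatedWriteTimeout`, while the same call with `timeout_ms=2000` returns 2. No replica got faster; the coordinator simply waited longer, which is the sense in which a 20,000 ms setting masks the underlying problem.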

The nodes are running on some test VMs with Xmx/Xms set at 1GB. So are you assuming that bulk counter row adding/incrementing can cause memory issues? How much memory do you need to allocate before this category of problem would disappear?
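
For comparison with the 1GB setting: when MAX_HEAP_SIZE is not set, the cassandra-env.sh shipped with 1.2/2.0 derives a default heap from system RAM. The sketch below transcribes the rule stated in that script's comments, max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)) for the max heap and min(100 MB per core, heap/4) for the young generation. Treat it as an assumption and verify against the copy in your installation, since the rule changed in later releases. It describes a default, not a sizing recommendation for counter-heavy bulk loads.

```python
from typing import Tuple


def default_heap_sizes_mb(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Return (max_heap_mb, young_gen_mb) following the rule described in the
    comments of the 1.2/2.0-era cassandra-env.sh."""
    # Integer arithmetic, as the shell script uses `expr`.
    half = system_memory_mb // 2
    quarter = half // 2
    # max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    max_heap = max(min(half, 1024), min(quarter, 8192))
    # Young generation: min(100 MB per core, 1/4 of the max heap).
    young_gen = min(100 * cpu_cores, max_heap // 4)
    return max_heap, young_gen
```

For example, `default_heap_sizes_mb(4096, 2)` gives `(1024, 200)`: a 1 GB heap is what the script would pick on a 4 GB, 2-core VM, and the heap only grows past 1 GB once the VM has more than 4 GB of RAM.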

Thanks,
Jacob
