From: rock zhang
To: "user@cassandra.apache.org"
Subject: Re: OOM when Adding host
Date: Mon, 10 Aug 2015 14:21:59 -0700

I logged the number of open files every 10 minutes; the last record is:

lsof -p $cassandraPID | wc -l

74728

lsof | wc -l
5887913       # this is a very large number; I don't know why.

After the OOM, the open file count dropped back to a few hundred (lsof | wc -l).
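
One possible cross-check for the lsof counts above, as a minimal sketch that assumes Linux with /proc mounted. lsof output can include entries that are not file descriptors (memory-mapped files, the working directory, and so on), so its line count may overstate descriptor usage. The function names count_open_fds and max_open_files are hypothetical, made up for this illustration; they read /proc/<pid>/fd and /proc/<pid>/limits directly.

```python
import os


def count_open_fds(pid="self"):
    """Count file descriptors held by a process (Linux only).

    Reads /proc/<pid>/fd, which has one entry per open descriptor.
    When pid is "self", the count includes the descriptor briefly
    used to read the directory, so it may be off by one.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid="self"):
    """Return the (soft, hard) 'Max open files' limits of a process.

    Values are returned as strings because a limit may be 'unlimited'.
    Returns None if the line is not found.
    """
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Line layout: Max open files <soft> <hard> files
                soft, hard = line.split()[3:5]
                return soft, hard
    return None


if __name__ == "__main__":
    # Inspect the current process; pass the Cassandra PID instead to
    # inspect that process (needs permission to read its /proc entry).
    print("open fds:", count_open_fds())
    print("limits (soft, hard):", max_open_files())
```

Comparing the descriptor count against the soft limit of the running process shows how close it actually is to "too many open files", since the limit in effect for a running process can differ from the one set in the shell.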




On Aug 10, 2015, at 9:59 AM, rock zhang <rock@alohar.com> wrote:

My Cassandra version is 2.1.4.

Thanks
Rock

On Aug 10, 2015, at 9:52 AM, rock zhang <rock@alohar.com> wrote:

Hi All,

Currently I have three hosts. The data is not balanced: one has 79GB and the other two have 300GB. When I added a new host, I first got a "too many open files" error, so I raised the open file limit from 100,000 to 1,000,000. Then I got an OOM error.

Should I change the limit to 200,000 instead of 1M? My memory is 33G; I am using EC2 c2*2xlarge. Ideally, even if the data is large, it should just be slower, not OOM. I don't understand why.

I actually get this error pretty often. I guess the reason is that my data is pretty large? If Cassandra tries to split the data evenly across all hosts, then it needs to copy around 200GB to the new host.

From my experience, an alternative way to solve this is to add the new host as a seed instead of using "Add host"; then data would not be moved, so no OOM. But I am not sure whether data will be lost or cannot be located.

Thanks
Rock


