From: Jussi P?öri
Reply-To: jussi@androidconsulting.com
Date: Wed, 28 Apr 2010 20:37:32 +0300
To: user@cassandra.apache.org
Subject: Re: Inserting files to Cassandra timeouts
I was thinking this too, but I don't think the overall insert volume is that big. The data is basically map data, and the files are map tiles, which I can easily make smaller. We currently use this data from multiple nodes (a grid), but we want to get rid of the file system hassle (basically Samba mounts). Reads are always done per file (column), which is why I think Cassandra would be a good fit. At least the read performance is more than good enough for us.

-jussi

Schubert Zhang wrote:
> I think your file (as a Cassandra column value) is too large.
> I also think Cassandra is not good at storing files.
>
> On Wed, Apr 28, 2010 at 10:24 PM, Jussi P?öri wrote:
>
> New try; the previous one went to the wrong place...
>
> Hi all,
>
> I'm trying to run a scenario of adding files from a specific folder
> to Cassandra. Right now I have 64 files (about 15-20 MB per file),
> about 1 GB of data overall.
> I'm able to insert around 40 files, but after that Cassandra
> goes into some GC loop and I finally get a timeout on the client.
> It does not run out of memory (OOM); it just jams.
>
> Here are the last entries in the log file:
> INFO [GC inspection] 2010-04-28 10:07:55,297 GCInspector.java (line 110) GC for ParNew: 232 ms, 25731128 reclaimed leaving 553241120 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:09:02,331 GCInspector.java (line 110) GC for ParNew: 2844 ms, 238909856 reclaimed leaving 1435582832 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:09:49,421 GCInspector.java (line 110) GC for ParNew: 30666 ms, 11185824 reclaimed leaving 1679795336 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:11:18,090 GCInspector.java (line 110) GC for ParNew: 895 ms, 17921680 reclaimed leaving 1589308456 used; max is 4108386304
>
> I think I must have something wrong in my configuration or
> in how I use Cassandra, because people here are inserting ten times
> more data and it works.
>
> Column family I'm using:
>
> Basically, I insert with the row key "Folder_name", the column name
> "file name", and the file content as the value.
> I tried with Hector (mainly) and directly with Thrift (insert and
> batch_mutate).
>
> In my case, the data does not need to be readable immediately after
> the insert, but I don't know if that helps in any way.
>
> My environment:
> Mac and/or Linux, tested on both
> Java 1.6.0_17
> Cassandra 0.6.1
>
> 60000
> 32
> 512
> 32
> 32
> 8
> 64
> 64
> 256
> 0.1
> 60
> 8
> 32
> batch
>
> 1.0
> 500
>
> JVM_OPTS=" \
>     -server \
>     -Xms3G \
>     -Xmx3G \
>     -XX:PermSize=512m \
>     -XX:MaxPermSize=800m \
>     -XX:MaxNewSize=256m \
>     -XX:NewSize=128m \
>     -XX:TargetSurvivorRatio=90 \
>     -XX:+AggressiveOpts \
>     -XX:+UseParNewGC \
>     -XX:+UseConcMarkSweepGC \
>     -XX:+CMSParallelRemarkEnabled \
>     -XX:+HeapDumpOnOutOfMemoryError \
>     -XX:SurvivorRatio=128 \
>     -XX:MaxTenuringThreshold=0 \
>     -XX:+DisableExplicitGC \
>     -Dcom.sun.management.jmxremote.port=8080 \
>     -Dcom.sun.management.jmxremote.ssl=false \
>     -Dcom.sun.management.jmxremote.authenticate=false"
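On Schubert's point that a 15-20 MB column value is too large, and Jussi's note that the tiles can easily be made smaller: one common workaround, sketched here as an assumption rather than anything proposed in the thread, is to split each file into fixed-size chunks stored as separate columns of the same row (row key = folder name). The class name `TileChunker`, the `fileName:NNNN` column-naming scheme and the chunk size are all hypothetical. The sketch only builds the column map in memory and makes no Hector or Thrift calls.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: splits one file into fixed-size chunks so that no single
// Cassandra column value is large. The returned map only stands in for the
// columns of one row (row key = folder name); no client calls are made here.
public class TileChunker {

    // Splits data into chunks of at most chunkSize bytes, named "fileName:NNNN".
    static SortedMap<String, byte[]> toColumns(String fileName, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        SortedMap<String, byte[]> columns = new TreeMap<>();
        int index = 0;
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            // Zero-padded index keeps lexical column order equal to chunk order
            String name = String.format(Locale.ROOT, "%s:%04d", fileName, index++);
            columns.put(name, Arrays.copyOfRange(data, offset, end));
        }
        return columns;
    }

    // Reassembles one file from its chunk columns, relying on sorted iteration order.
    static byte[] fromColumns(String fileName, SortedMap<String, byte[]> columns) {
        String prefix = fileName + ":";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> e : columns.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                byte[] chunk = e.getValue();
                out.write(chunk, 0, chunk.length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] tile = new byte[10];
        for (int i = 0; i < tile.length; i++) {
            tile[i] = (byte) i;
        }
        SortedMap<String, byte[]> cols = toColumns("tile_0_0.png", tile, 4);
        System.out.println(cols.keySet());
        // prints [tile_0_0.png:0000, tile_0_0.png:0001, tile_0_0.png:0002]
        System.out.println(Arrays.equals(tile, fromColumns("tile_0_0.png", cols)));
        // prints true
    }
}
```

On the read side, assuming an ASCII or UTF8 column comparator, the chunks of one file sort contiguously within the row, so they could plausibly be fetched with a single column slice and passed to `fromColumns`; the per-file read pattern Jussi describes would stay the same.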
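For reference, the young-generation layout implied by the quoted JVM_OPTS can be worked out as below. This is a sketch assuming HotSpot's sizing rule (young generation = eden + two survivor spaces, with eden : one survivor = SurvivorRatio : 1) and assuming the young generation has grown to `-XX:MaxNewSize=256m`; the class and method names are made up for illustration. Note also that with `-XX:MaxTenuringThreshold=0`, HotSpot promotes every object that survives a ParNew collection straight into the old generation, so a multi-megabyte file value that is still live during a young collection gets copied into the CMS old generation instead of aging in a survivor space.

```java
import java.util.Locale;

// Illustrative arithmetic only: derives eden and survivor sizes from the
// -XX:MaxNewSize and -XX:SurvivorRatio values quoted above.
public class YoungGenSizing {

    // Size of ONE survivor space: young = eden + 2 * survivor, eden = ratio * survivor.
    static double survivorMb(double youngGenMb, int survivorRatio) {
        return youngGenMb / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        double youngMb = 256; // from -XX:MaxNewSize=256m (assumes the young gen reached its max)
        int ratio = 128;      // from -XX:SurvivorRatio=128
        double survivor = survivorMb(youngMb, ratio);
        double eden = survivor * ratio;
        System.out.printf(Locale.ROOT, "eden ~%.1f MB, each survivor ~%.2f MB%n", eden, survivor);
        // prints: eden ~252.1 MB, each survivor ~1.97 MB
    }
}
```

Each survivor space comes out at roughly 2 MB, an order of magnitude smaller than a single 15-20 MB file value.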