From: Mario Micklisch
Date: Sun, 5 Jun 2011 13:08:15 +0200
Subject: Re: problems with many columns on a row
To: user@cassandra.apache.org

I tracked down the timestamp submission and everything was fine within the
PHP libraries.

The thrift php extension, however, seems to have an overflow: it was
setting timestamps with negative values (e.g. -1242277493). I disabled the
php extension, and as a result I now got correct microsecond timestamps:
1307270937122897
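
A minimal sketch of the suspected failure mode, assuming the extension
effectively stores the timestamp in a signed 32-bit integer (only the
1307270937122897 value comes from this thread; note also that the
2,147,442,124 figure quoted further down sits just under
2^31 - 1 = 2,147,483,647, which points the same way):

<?php
// Reinterpret a correct microsecond timestamp as a signed 32-bit
// integer, as a 32-bit-limited extension would effectively store it.
// Requires a 64-bit PHP build to run as written.
$us = 1307270937122897;            // microseconds since the Unix epoch
$low = $us & 0xFFFFFFFF;           // keep only the low 32 bits
if ($low >= 0x80000000) {
    $low -= 0x100000000;           // reinterpret the top bit as a sign
}
echo $low, "\n";                   // -1143662511: negative, the same
                                   // shape as the -1242277493 above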

I was using the latest version from http://www.apache.org/dist/thrift/0.6.1/
to build the extension, without using any special parameters (just
./configure --enable-gen-php=yes and make install).

Downloaded and re-compiled it, without any change. I also see no compile
parameters that might help.

Any advice on where to go from here?

Thanks so far!

Mario

2011/6/5 Mario Micklisch <mario.micklisch@hpm-kommunikation.de>
> Thanks for the feedback, Aaron!
>
> The schema of the CF is default; I only defined the name, the rest is
> default. Have a look:
>
> Keyspace: TestKS
>         Read Count: 65
>         Read Latency: 657.8047076923076 ms.
>         Write Count: 10756
>         Write Latency: 0.03237039791744143 ms.
>         Pending Tasks: 0
>                 Column Family: CFTest
>                 SSTable count: 1
>                 Space used (live): 25671740
>                 Space used (total): 51349233
>                 Memtable Columns Count: 54
>                 Memtable Data Size: 21375
>                 Memtable Switch Count: 1
>                 Read Count: 65
>                 Read Latency: 657.805 ms.
>                 Write Count: 10756
>                 Write Latency: 0.032 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 200000
>                 Key cache size: 11
>                 Key cache hit rate: 6.777150522609133E-4
>                 Row cache: disabled
>                 Compacted row minimum size: 125
>                 Compacted row maximum size: 654949
>                 Compacted row mean size: 287
>
> I am using phpcassa in the latest (0.8-compatible) version. I was also
> wondering about the timestamp details the CLI has shown. On my last test
> run I opened the cassandra-cli in the terminal and did some get requests
> there to see how the data changes while I fill in my random test data.
>
> The timestamp was something around 87,000,000 at first and then grew to
> 2,147,442,124 (1,464,439,894 in the earlier example) for the tested row.
> It looked suspicious, but since the data was not clean ASCII I was not
> so sure about that.
>
> I will check that now.
>
> What about the compact? Is this really because the OS volume is smaller
> than the data volume? There is plenty of space on the data volume; how
> can I make sure it is not using the OS volume for compaction?
>
> Cheers,
> Mario


> 2011/6/5 aaron morton <aaron@thelastpickle.com>
>
>> It is rarely a good idea to let the data disk get too far over 50%
>> utilisation. With so little free space the compaction process will have
>> trouble running: http://wiki.apache.org/cassandra/MemtableSSTable
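
(As rough arithmetic for why 50% matters, under the usual assumption that
a compaction must finish writing the merged SSTable before it can delete
its inputs: compacting SSTables that total S bytes transiently needs about
S extra bytes free, so a major compaction on a volume much past 50%
utilisation simply has nowhere to write its output.)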

>> As you are on the RC1 I would just drop the data and start again. If
>> you need to keep it, you can use multiple data directories as specified
>> in the cassandra.yaml file. See the data_file_directories setting. (The
>> recommendation is to use one data directory.)
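
A minimal cassandra.yaml sketch of that layout (the /mnt/data mount point
is hypothetical, not a path from this thread). Compaction writes its
output into the data directories, so keeping them, the commitlog, and the
saved caches on the large volume is what matters:

# cassandra.yaml (sketch -- directory names are illustrative)
data_file_directories:
    - /mnt/data/cassandra/data
commitlog_directory: /mnt/data/cassandra/commitlog
saved_caches_directory: /mnt/data/cassandra/saved_caches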

>> The exception looks pretty odd, something wacky with the column family
>> definition. Have you been changing the schema?
>>
>> For the delete problem, something looks odd about the timestamps you
>> are using. How was the data inserted?
>>
>> This is your data sample...
>>
>> [default@TestKS] get CFTest['44656661756c747c65333332356231342d373937392d313165302d613663382d3132333133633033336163347c5461626c65737c5765625369746573'];
>> => (column=count, value=3331353030, timestamp=1464439894)
>> => (column=split, value=3334, timestamp=1464439894)
>>
>> Timestamps are normally microseconds since the Unix epoch:
>> http://wiki.apache.org/cassandra/DataModel?highlight=%28timestamp%29
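
For reference, a minimal PHP sketch of producing a timestamp in that
format on the client side (the function name is illustrative, not
phpcassa API; gettimeofday() avoids the float rounding you would get
from microtime(true) * 1e6, and exact integer math needs a 64-bit PHP
build):

<?php
// Microseconds since the Unix epoch, the format the CLI writes.
function now_in_microseconds() {
    $tv = gettimeofday();          // array('sec' => ..., 'usec' => ...)
    return $tv['sec'] * 1000000 + $tv['usec'];
}
echo now_in_microseconds(), "\n";  // e.g. 1307248484615000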

>> This is what the CLI will use, e.g.:
>>
>> [default@dev] set data[ascii('foo')]['bar'] = 'baz';
>> Value inserted.
>> [default@dev] get data['foo'];
>> => (column=bar, value=62617a, timestamp=1307248484615000)
>> Returned 1 results.
>> [default@dev] del data['foo'];
>> row removed.
>> [default@dev] get data['foo'];
>> Returned 0 results.
>> [default@dev]


>> The higher numbers created by the client should still work, but I would
>> look into this first.

>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5 Jun 2011, at 10:09, Mario Micklisch wrote:

>> Yes, checked the log file, no errors there.
>>
>> With debug logging it confirms that it receives the write, and it is
>> also in the commitlog:

>> DEBUG 22:00:14,057 insert writing local RowMutation(keyspace='TestKS', key='44656661756c747c65333332356231342d373937392d313165302d613663382d3132333133633033336163347c5461626c65737c5765625369746573', modifications=[CFTest])
>> DEBUG 22:00:14,057 applying mutation of row 44656661756c747c65333332356231342d373937392d313165302d613663382d3132333133633033336163347c5461626c65737c5765625369746573


>> But running compact with nodetool triggered an error:
>>
>> ERROR [CompactionExecutor:8] 2011-06-04 21:47:44,021 CompactionManager.java (line 510) insufficient space to compact even the two smallest files, aborting
>> ERROR [CompactionExecutor:8] 2011-06-04 21:47:44,024 CompactionManager.java (line 510) insufficient space to compact even the two smallest files, aborting
>>
>> The data folder currently has a size of about 1 GB. There are 150 GB of
>> free disk space on the volume where I pointed all Cassandra directories,
>> but only 3.5 GB free on the operating system disk.
>>
>> Could this be the reason? How can I set the environment variables so
>> that it only uses the dedicated volume?


>> Trying to use sstable2json did not work (it throws an exception; am I
>> using the wrong parameter?):
>>
>> # sstable2json ./CFTest-g-40-Data.db
>> log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor).
>> log4j:WARN Please initialize the log4j system properly.
>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>> {
>> Exception in thread "main" java.lang.NullPointerException
>> at org.apache.cassandra.db.ColumnFamily.<init>(ColumnFamily.java:82)
>> at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:70)
>> at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:142)
>> at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:90)
>> at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:74)
>> at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:179)
>> at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
>> at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
>> at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:313)
>> at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:344)
>> at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:357)
>> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:415)
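
(For what it's worth: in this era the tool took the SSTable path plus
optional key filters -- roughly "sstable2json <sstable> [-k key] [-x key]
[-e]", though treat those flags as an assumption and check the tool's own
usage output. The NullPointerException above is thrown while decoding a
row, not while parsing arguments, so the parameter is probably not the
problem.)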



>> Cheers,
>> Mario

>> 2011/6/4 Jonathan Ellis <jbellis@gmail.com>
>>
>>> Did you check the server log for errors?
>>>
>>> See if the problem persists after running nodetool compact. If it
>>> does, use sstable2json to export the row in question.


