incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: huge commitlog
Date Mon, 26 Nov 2012 07:43:45 GMT
Can you please create a ticket for this on https://issues.apache.org/jira/browse/CASSANDRA ?

Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/11/2012, at 1:58 PM, Chuan-Heng Hsiao <hsiao.chuanheng@gmail.com> wrote:

> Hi Aaron,
> 
> Thank you very much for replying.
> 
> From the log, it seems the ERROR happens when trying to flush a
> memtable with a secondary index.
> (When inserting the data, I set the default value to '' for all
> pre-defined columns; it's for programming convenience.)
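A plausible mechanism for the assertion, given the '' defaults above: in Cassandra 1.1 a secondary index is stored as a hidden index column family whose row keys are the indexed values, so an indexed value of '' would produce an empty index row key, which SSTableWriter.beforeAppend rejects at flush time. A toy sketch of that failure mode (plain Python; all names are hypothetical, this is not Cassandra code):

```python
# Toy model: in the hidden index CF, the row key IS the indexed value,
# so an indexed column whose value is b'' yields an empty row key.

def index_row_key(indexed_value: bytes) -> bytes:
    # The index CF keys rows by the indexed column's value.
    return indexed_value

def before_append(key: bytes) -> None:
    # Mirrors the check in SSTableWriter.beforeAppend.
    assert len(key) > 0, "Keys must not be empty"

def flush_index_entry(indexed_value: bytes) -> None:
    before_append(index_row_key(indexed_value))

flush_index_entry(b"some-value")   # a normal value flushes fine
try:
    flush_index_entry(b"")         # the '' default becomes an empty key
except AssertionError as e:
    print(e)                       # -> Keys must not be empty
```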
> 
> The following is the log:
> 
> INFO [OptionalTasks:1] 2012-11-13 14:24:20,650 ColumnFamilyStore.java
> (line 659) Enqueuing flush of
> Memtable-(some_cf).(some_cf)_(some_idx)_idx_1@1216346401(485/8360
> serialized/live bytes, 24 ops)
> ERROR [FlushWriter:2123] 2012-11-13 14:24:20,650
> AbstractCassandraDaemon.java (line 135) Exception in thread
> Thread[FlushWriter:2123,5,main]
> java.lang.AssertionError: Keys must not be empty
>        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
>        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
>        at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
>        at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:722)
> 
> 
> INFO [FlushWriter:2125] 2012-11-13 14:24:20,651 Memtable.java (line
> 264) Writing Memtable-(some_cf).(some_cf)_(some_idx2)_idx_1@272356994(485/2426
> serialized/live bytes, 24 ops)
> ERROR [FlushWriter:2125] 2012-11-13 14:24:20,652
> AbstractCassandraDaemon.java (line 135) Exception in thread
> Thread[FlushWriter:2125,5,main]
> java.lang.AssertionError: Keys must not be empty
>        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
>        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
>        at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
>        at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:722)
> 
> Sincerely,
> Hsiao
> 
> 
> On Mon, Nov 26, 2012 at 3:52 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> I checked the log, and found some ERROR about network problems,
>> and some ERROR about "Keys must not be empty".
>> 
>> Do you have the full error stack ?
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hsiao.chuanheng@gmail.com>
>> wrote:
>> 
>> Hi Cassandra Devs,
>> 
>> After setting up the same configuration (and importing the same data)
>> on 3 VMs on the same machine instead of 3 physical machines,
>> so far I couldn't replicate the exploded-commitlog situation.
>> 
>> On my 4-physical-machine setting, everything seems to be
>> back to normal (commitlog size is less than the expected max setting)
>> after restarting the nodes.
>> 
>> This time the commitlog size of one node is set to 4G, and the
>> others are set to 8G.
>> 
>> A few days ago the 4G node's commitlog exploded to 5+G (the 8G nodes
>> remained at 8G).
>> I checked the log, and found some ERROR about network problems,
>> and some ERROR about "Keys must not be empty".
>> 
>> I suspect that besides the network problems,
>> the "Keys must not be empty" ERROR may be the main reason why
>> the commitlog continues growing.
>> (I've already ensured that the keys are not empty in my code,
>> so the problem may arise when syncing internally in Cassandra.)
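One way the two symptoms could connect (a toy model, not Cassandra's actual implementation): commitlog segments can only be recycled once every memtable holding their writes has been flushed, so a flush that keeps failing with "Keys must not be empty" would pin its segments forever and let the log outgrow any configured cap:

```python
# Toy model of commitlog segment retention. Each segment records which
# CFs still have unflushed writes in it; a segment is freed only when
# no CF is dirty in it. A permanently failing flush frees nothing.

SEGMENT_MB = 128  # assumed segment size, for illustration only

class CommitLog:
    def __init__(self):
        self.segments = []  # one set of dirty CF names per segment

    def write(self, cf: str) -> None:
        # In this toy model every write fills a fresh segment.
        self.segments.append({cf})

    def flush_ok(self, cf: str) -> None:
        # A successful flush marks the CF clean and frees empty segments.
        for seg in self.segments:
            seg.discard(cf)
        self.segments = [s for s in self.segments if s]

    def size_mb(self) -> int:
        return len(self.segments) * SEGMENT_MB

log = CommitLog()
for _ in range(10):
    log.write("cf_with_bad_index")
    # flush fails with "Keys must not be empty": nothing gets freed
print(log.size_mb())       # -> 1280, and it keeps growing past any cap

log.flush_ok("cf_with_bad_index")
print(log.size_mb())       # -> 0 once a flush finally succeeds
```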
>> 
>> I restarted the 4G node as an 8G node. Because there has been no heavy
>> traffic since then, I am not yet sure whether increasing the commitlog
>> size will solve or reduce this problem.
>> I'll keep you posted once the commitlog gets exploded again.
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
>> <hsiao.chuanheng@gmail.com> wrote:
>> 
>> I have RF = 3. Read/write consistency has already been set to TWO.
>> 
>> It did seem that the data were not yet consistent.
>> (There are some CFs that I expected to be empty after the operations,
>> but they still had some data, and the amount of data decreased on each
>> retry of getting all data from those CFs.)
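For reference, the usual overlap rule behind that expectation: with replication factor N, reads at consistency R and writes at W are guaranteed to intersect on at least one up-to-date replica when R + W > N, which TWO/TWO with RF = 3 satisfies (so the stale reads point at something else, e.g. dropped mutations during the errors). A one-line check:

```python
# Reads see the latest successful write whenever the read and write
# replica sets must overlap: R + W > N.
def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    return r + w > n

print(overlap_guaranteed(3, 2, 2))  # True: TWO/TWO with RF=3 overlaps
print(overlap_guaranteed(3, 1, 1))  # False: ONE/ONE can read stale data
```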
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tupshin@tupshin.com>
>> wrote:
>> 
>> What consistency level are you writing with? If you were writing with ANY,
>> try writing with a higher consistency level.
>> 
>> -Tupshin
>> 
>> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hsiao.chuanheng@gmail.com>
>> wrote:
>> 
>> 
>> Hi Aaron,
>> 
>> Thank you very much for replying.
>> 
>> The 700 CFs were created in the beginning (before any insertions).
>> 
>> I did not do anything with commitlog_archiving.properties, so I guess
>> I was not using commit log archiving.
>> 
>> What I did was a lot of insertions (and some deletions)
>> using another 4 machines with 32 processes in total.
>> (There are 4 nodes in my setup, so there are 8 machines in total.)
>> 
>> I did see huge logs in /var/log/cassandra after such a huge amount of
>> insertions.
>> Right now I can't tell whether a single insertion also causes huge
>> logs.
>> 
>> nodetool flush hung (maybe because of the 200G+ commitlog).
>> 
>> Because these machines are not in production (guaranteed no more
>> insertions/deletions),
>> I ended up restarting cassandra one node at a time, and the commitlog
>> shrank back to 4G. I am running repair on each node now.
>> 
>> I'll try to re-import and keep logs when the commitlog increases insanely
>> again.
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aaron@thelastpickle.com>
>> wrote:
>> 
>> I am wondering whether the huge commitlog size is the expected behavior
>> or
>> not?
>> 
>> Nope.
>> 
>> Did you notice the large log size during or after the inserts ?
>> If after did the size settle ?
>> Are you using commit log archiving ? (in commitlog_archiving.properties)
>> 
>> and around 700 mini column family (around 10M in data_file_directories)
>> 
>> Can you describe how you created the 700 CF's ?
>> 
>> and how can we reduce the size of commitlog?
>> 
>> As a work around nodetool flush should checkpoint the log.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hsiao.chuanheng@gmail.com>
>> wrote:
>> 
>> hi Cassandra Developers,
>> 
>> I am experiencing a huge commitlog size (200+G) after inserting a huge
>> amount of data.
>> It is a 4-node cluster with RF = 3, and currently each node has 200+G of
>> commit log (so around 1T of commit log in total).
>> 
>> The setting of commitlog_total_space_in_mb is default.
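For reference, that setting lives in cassandra.yaml. A sketch of the fragment (the value shown is an assumption for illustration, not necessarily the shipped default; check your own file):

```yaml
# cassandra.yaml -- cap the total commit log size (value in MB).
# Note the cap is only honoured while memtable flushes succeed;
# a flush that keeps failing pins its segments and the log can
# grow past it.
commitlog_total_space_in_mb: 4096
```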
>> 
>> I am using 1.1.6.
>> 
>> I did not do nodetool cleanup and nodetool flush yet, but
>> I did nodetool repair -pr for each column family.
>> 
>> There is 1 huge column family (around 68G in data_file_directories),
>> 18 mid-sized column families (around 1G in data_file_directories),
>> and around 700 small column families (around 10M in data_file_directories).
>> 
>> I am wondering whether the huge commitlog size is the expected behavior
>> or
>> not?
>> and how can we reduce the size of commitlog?
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> 

