incubator-cassandra-user mailing list archives

From Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com>
Subject Re: huge commitlog
Date Sat, 24 Nov 2012 15:14:26 GMT
Hi Cassandra Devs,

After applying the same settings (and importing the same data)
on 3 VMs on a single machine instead of 3 physical machines,
I have so far been unable to reproduce the exploded-commitlog situation.

On my 4-physical-machine setup, everything seems to be
back to normal (commitlog size is below the configured maximum)
after restarting the nodes.

This time the commitlog size limit is set to 4G on one node and to 8G
on the others.

A few days ago the commitlog on the 4G node grew to 5+G (the 8G nodes remained at 8G).
I checked the log and found some ERRORs about network problems,
and some ERRORs about "Keys must not be empty".

I suspect that besides the network problems,
the "Keys must not be empty" ERROR may be the main reason why
the commitlog continues growing.
(I've already ensured that keys are never empty in my code,
 so the problem may arise during internal syncing in Cassandra.)
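
One way to see how often each kind of error occurs is to tally the ERROR lines in the log. The following is only a minimal sketch: it assumes a plain-text log in which error lines contain the word `ERROR`, and the function name `count_errors` is hypothetical, not part of Cassandra.

```python
from collections import Counter

def count_errors(lines, needles=("Keys must not be empty",)):
    """Count ERROR lines, bucketed by the first needle they contain."""
    counts = Counter()
    for line in lines:
        if "ERROR" not in line:
            continue  # skip INFO/WARN/DEBUG lines
        for needle in needles:
            if needle in line:
                counts[needle] += 1
                break
        else:
            # ERROR line that matches none of the needles
            counts["other"] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle on a log under /var/log/cassandra can be passed in directly.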

I restarted the 4G node with an 8G limit. Because there has been no heavy
traffic since then, I am not yet sure whether increasing the commitlog size
will solve or reduce this problem.
I'll keep you posted once the commitlog explodes again.
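
To notice the next blow-up early, the on-disk size of the commitlog directory can be compared with the configured limit. This is a rough sketch, not a Cassandra tool: the helper names `dir_size_bytes` and `over_cap` are hypothetical, and the directory path and the limit in MB (as in `commitlog_total_space_in_mb`) must be supplied by the caller.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def over_cap(path, cap_mb):
    """Return True if the files under path exceed cap_mb megabytes."""
    return dir_size_bytes(path) > cap_mb * 1024 * 1024
```

For the 4G node, `over_cap("/var/lib/cassandra/commitlog", 4096)` would report whether the limit is exceeded, assuming that path is the configured commitlog directory.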

Sincerely,
Hsiao


On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
<hsiao.chuanheng@gmail.com> wrote:
> I have RF = 3. Read/Write consistency has already been set to TWO.
>
> It did seem that the data were not consistent yet.
> (There are some CFs that I expected to be empty after the operations, but they
>  still had some data, and the amount of data decreased each time I retried
>  reading all data from that CF.)
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tupshin@tupshin.com> wrote:
>> What consistency level are you writing with? If you were writing with ANY,
>> try writing with a higher consistency level.
>>
>> -Tupshin
>>
>> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hsiao.chuanheng@gmail.com>
>> wrote:
>>>
>>> Hi Aaron,
>>>
>>> Thank you very much for replying.
>>>
>>> The 700 CFs were created in the beginning (before any insertion.)
>>>
>>> I did not do anything with commitlog_archiving.properties, so I guess
>>> I was not using commit log archiving.
>>>
>>> What I did was a lot of insertions (and some deletions)
>>> using another 4 machines with 32 processes in total.
>>> (There are 4 nodes in my setup, so there are 8 machines in total.)
>>>
>>> I did see huge logs in /var/log/cassandra after such a huge number of
>>> insertions.
>>> Right now I can't tell whether a single insertion also causes huge
>>> logs.
>>>
>>> nodetool flush hung (maybe because of the 200G+ commitlog).
>>>
>>> Because these machines are not in production (guaranteed no more
>>> insertion/deletion),
>>> I ended up restarting Cassandra one node at a time, and the commitlog
>>> shrank back to
>>> 4G. I am running repair on each node now.
>>>
>>> I'll try to re-import and keep logs when the commitlog increases insanely
>>> again.
>>>
>>> Sincerely,
>>> Hsiao
>>>
>>>
>>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aaron@thelastpickle.com>
>>> wrote:
>>> > I am wondering whether the huge commitlog size is the expected behavior
>>> > or
>>> > not?
>>> >
>>> > Nope.
>>> >
>>> > Did you notice the large log size during or after the inserts ?
>>> > If after did the size settle ?
>>> > Are you using commit log archiving ? (in commitlog_archiving.properties)
>>> >
>>> > and around 700 mini column families (around 10M each in data_file_directories)
>>> >
>>> > Can you describe how you created the 700 CF's ?
>>> >
>>> > and how can we reduce the size of commitlog?
>>> >
>>> > As a workaround, nodetool flush should checkpoint the log.
>>> >
>>> > Cheers
>>> >
>>> > -----------------
>>> > Aaron Morton
>>> > Freelance Cassandra Developer
>>> > New Zealand
>>> >
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> >
>>> > On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hsiao.chuanheng@gmail.com>
>>> > wrote:
>>> >
>>> > hi Cassandra Developers,
>>> >
>>> > I am seeing a huge commitlog (200+G) after inserting a huge
>>> > amount of data.
>>> > It is a 4-node cluster with RF = 3, and currently each node has a 200+G
>>> > commitlog (so there is around 1T of commitlog in total).
>>> >
>>> > The setting of commitlog_total_space_in_mb is default.
>>> >
>>> > I am using 1.1.6.
>>> >
>>> > I have not run nodetool cleanup or nodetool flush yet, but
>>> > I did run nodetool repair -pr for each column family.
>>> >
>>> > There is 1 huge column family (around 68G in data_file_directories),
>>> > 18 mid-size column families (around 1G each in data_file_directories),
>>> > and around 700 mini column families (around 10M each in data_file_directories).
>>> >
>>> > I am wondering whether the huge commitlog size is the expected behavior
>>> > or
>>> > not?
>>> > and how can we reduce the size of commitlog?
>>> >
>>> > Sincerely,
>>> > Hsiao
>>> >
>>> >
