cassandra-user mailing list archives

From Roni Balthazar <ronibaltha...@gmail.com>
Subject Re: Many pending compactions
Date Wed, 18 Feb 2015 15:20:42 GMT
Which error do you get when you run repairs?
You need to run repair on your nodes within gc_grace_seconds (e.g.
weekly). They hold data that is not read frequently. You can run
"nodetool repair -pr" on all nodes. Since you do not have deletes, you
will not have trouble with that. If you had deletes, it would be better
to increase gc_grace_seconds before the repair.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After the repair, try running "nodetool cleanup".

Check whether the number of SSTables goes down after that. Pending
compactions should decrease as well.
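
The sequence above could be scripted roughly as follows. This is only a
sketch: it assumes nodetool is on the PATH and is run on one node at a
time, and the helper names sstable_total and repair_and_check are made up
for illustration (they are not part of nodetool). The helper just sums the
"SSTable count" lines that "nodetool cfstats" prints.

```shell
#!/bin/sh
# Hypothetical helper: sum every "SSTable count: N" line read from stdin.
sstable_total() {
    awk -F': *' '/SSTable count:/ { total += $2 } END { print total + 0 }'
}

# Hypothetical wrapper for the steps described above; run it on each node.
repair_and_check() {
    echo "SSTables before: $(nodetool cfstats | sstable_total)"
    nodetool repair -pr || return 1   # repair only this node's primary ranges
    nodetool cleanup || return 1      # drop data the node no longer owns
    echo "SSTables after:  $(nodetool cfstats | sstable_total)"
    nodetool compactionstats          # lists pending compaction tasks
}
```

Compactions run in the background, so the "after" number may keep falling
for a while; calling "nodetool cfstats | sstable_total" again later gives
a better picture than a single reading.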

Cheers,

Roni Balthazar




On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam <ptrstpppp@gmail.com> wrote:
> 1) We tried to run repairs, but they usually do not succeed. But we had
> Leveled compaction before. Last week we ALTERed the tables to STCS, because
> guys from DataStax suggested that we should not use Leveled and should
> switch to STCS, since we don't have SSDs. After this change we did not run
> any repair. Anyway, I don't think it will change anything in the SSTable
> count - if I am wrong, please let me know.
>
> 2) I did this. My tables are 99% write only. It is an audit system.
>
> 3) Yes, I am using the default values.
>
> 4) In both operations I am using LOCAL_QUORUM.
>
> I am almost sure that the READ timeouts happen because of too many SSTables.
> Anyway, first I would like to fix the too many pending compactions. I still
> don't know how to speed them up.
>
>
> On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar <ronibalthazar@gmail.com>
> wrote:
>>
>> Are you running repairs within gc_grace_seconds? (default is 10 days)
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>
>> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>> that you do not read often.
>>
>> Are you using default values for the properties
>> min_compaction_threshold(4) and max_compaction_threshold(32)?
>>
>> Which Consistency Level are you using for reading operations? Check if
>> you are not reading from DC_B due to your Replication Factor and CL.
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>>
>>
>> Cheers,
>>
>> Roni Balthazar
>>
>> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam <ptrstpppp@gmail.com> wrote:
>> > I don't have problems with DC_B (replica); only in DC_A (my system writes
>> > only to it) do I have read timeouts.
>> >
>> > I checked the SSTable count in OpsCenter and I have:
>> > 1) in DC_A: the same +-10% for the last week, a small increase for the
>> > last 24h (it is more than 15000-20000 SSTables, depending on the node)
>> > 2) in DC_B: the last 24h show up to a 50% decrease, which gives a nice
>> > prognosis. Now I have fewer than 1000 SSTables.
>> >
>> > What did you measure during system optimizations? Or do you have an idea
>> > what more I should check?
>> > 1) I look at CPU idle (one node is 50% idle, the rest 70% idle)
>> > 2) Disk queue -> mostly it is near zero: avg 0.09. Sometimes there are
>> > spikes
>> > 3) system RAM usage is almost full
>> > 4) In Total Bytes Compacted most lines are below 3MB/s. For the total
>> > DC_A it is less than 10MB/s; in DC_B it looks much better (avg is like
>> > 17MB/s)
>> >
>> > something else?
>> >
>> >
>> >
>> > On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar
>> > <ronibalthazar@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> You can check if the number of SSTables is decreasing. Look for the
>> >> "SSTable count" information of your tables using "nodetool cfstats".
>> >> The compaction history can be viewed using "nodetool
>> >> compactionhistory".
>> >>
>> >> About the timeouts, check this out:
>> >>
>> >> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
>> >> Also try running "nodetool tpstats" to see the thread pool statistics.
>> >> It can tell you whether you are having performance problems. If you
>> >> have too many pending tasks or dropped messages, you may need to tune
>> >> your system (e.g. the driver's timeout, concurrent reads and so on).
>> >>
>> >> Regards,
>> >>
>> >> Roni Balthazar
>> >>
>> >> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam <ptrstpppp@gmail.com> wrote:
>> >> > Hi,
>> >> > Thanks for your "tip". It looks like something changed - I still don't
>> >> > know if it is OK.
>> >> >
>> >> > My nodes started to do more compactions, but it looks like some
>> >> > compactions are really slow.
>> >> > IO is idle and CPU is quite OK (30%-40%). We set compactionthroughput
>> >> > to 999, but I do not see a difference.
>> >> >
>> >> > Can we check something more? Or do you have any method to monitor
>> >> > progress with small files?
>> >> >
>> >> > Regards
>> >> >
>> >> > On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar
>> >> > <ronibalthazar@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> Yes... I had the same issue and setting cold_reads_to_omit to 0.0
>> >> >> was the solution...
>> >> >> The number of SSTables decreased from many thousands to fewer than
>> >> >> a hundred, and the SSTables are now much bigger, several gigabytes
>> >> >> each (most of them).
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Roni Balthazar
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam <ptrstpppp@gmail.com>
>> >> >> wrote:
>> >> >> > After some diagnostics (we didn't set cold_reads_to_omit yet):
>> >> >> > compactions are running, but VERY slowly, with "idle" IO.
>> >> >> >
>> >> >> > We had a lot of "Data files" in Cassandra. In DC_A there are about
>> >> >> > ~120000 (only xxx-Data.db); DC_B has only ~4000.
>> >> >> >
>> >> >> > I don't know if this changes anything, but:
>> >> >> > 1) in DC_A the avg size of a Data.db file is ~13 mb. I have a few
>> >> >> > really big ones, but most are really small (almost 10000 files are
>> >> >> > less than 100mb).
>> >> >> > 2) in DC_B the avg size of Data.db is much bigger, ~260mb.
>> >> >> >
>> >> >> > Do you think that the above flag will help us?
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam <ptrstpppp@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> I set setcompactionthroughput 999 permanently and it doesn't
>> >> >> >> change anything. IO is still the same. CPU is idle.
>> >> >> >>
>> >> >> >> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar
>> >> >> >> <ronibalthazar@gmail.com>
>> >> >> >> wrote:
>> >> >> >>>
>> >> >> >>> Hi,
>> >> >> >>>
>> >> >> >>> You can run "nodetool compactionstats" to view statistics on
>> >> >> >>> compactions.
>> >> >> >>> Setting cold_reads_to_omit to 0.0 can help to reduce the number
>> >> >> >>> of SSTables when you use Size-Tiered compaction.
>> >> >> >>> You can also create a cron job to increase the value of
>> >> >> >>> setcompactionthroughput during the night or when your IO is not
>> >> >> >>> busy.
>> >> >> >>>
>> >> >> >>> From http://wiki.apache.org/cassandra/NodeTool:
>> >> >> >>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>> >> >> >>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>> >> >> >>>
>> >> >> >>> Cheers,
>> >> >> >>>
>> >> >> >>> Roni Balthazar
>> >> >> >>>
>> >> >> >>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam <ptrstpppp@gmail.com>
>> >> >> >>> wrote:
>> >> >> >>> > One thing I do not understand: in my case compaction is running
>> >> >> >>> > permanently.
>> >> >> >>> > Is there a way to check which compaction is pending? The only
>> >> >> >>> > information is about the total count.
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> > On Monday, February 16, 2015, Ja Sam <ptrstpppp@gmail.com>
>> >> >> >>> > wrote:
>> >> >> >>> >>
>> >> >> >>> >> Of course I made a mistake. I am using 2.1.2. Anyway, a nightly
>> >> >> >>> >> build is available from
>> >> >> >>> >> http://cassci.datastax.com/job/cassandra-2.1/
>> >> >> >>> >>
>> >> >> >>> >> I read about cold_reads_to_omit. It looks promising. Should I
>> >> >> >>> >> also set compaction throughput?
>> >> >> >>> >>
>> >> >> >>> >> P.S. I am really sad that I didn't read this before:
>> >> >> >>> >>
>> >> >> >>> >> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> On Monday, February 16, 2015, Carlos Rolo <rolo@pythian.com>
>> >> >> >>> >> wrote:
>> >> >> >>> >>>
>> >> >> >>> >>> Hi, 100% in agreement with Roland,
>> >> >> >>> >>>
>> >> >> >>> >>> The 2.1.x series is a pain! I would never recommend the current
>> >> >> >>> >>> 2.1.x series for production.
>> >> >> >>> >>>
>> >> >> >>> >>> Clocks are a pain, and check your connectivity! Also check
>> >> >> >>> >>> tpstats to see if your thread pools are being overrun.
>> >> >> >>> >>>
>> >> >> >>> >>> Regards,
>> >> >> >>> >>>
>> >> >> >>> >>> Carlos Juzarte Rolo
>> >> >> >>> >>> Cassandra Consultant
>> >> >> >>> >>>
>> >> >> >>> >>> Pythian - Love your data
>> >> >> >>> >>>
>> >> >> >>> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
>> >> >> >>> >>> linkedin.com/in/carlosjuzarterolo
>> >> >> >>> >>> Tel: 1649
>> >> >> >>> >>> www.pythian.com
>> >> >> >>> >>>
>> >> >> >>> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
>> >> >> >>> >>> <r.etzenhammer@t-online.de> wrote:
>> >> >> >>> >>>>
>> >> >> >>> >>>> Hi,
>> >> >> >>> >>>>
>> >> >> >>> >>>> 1) Actual Cassandra 2.1.3, it was upgraded from 2.1.0
>> >> >> >>> >>>> (suggested by Al Tobey from DataStax)
>> >> >> >>> >>>> 7) minimal reads (usually none, sometimes few)
>> >> >> >>> >>>>
>> >> >> >>> >>>> Those two points keep me repeating an answer I got. First, where
>> >> >> >>> >>>> did you get 2.1.3 from? Maybe I missed it, I will have a look. But
>> >> >> >>> >>>> if it is 2.1.2, which is the latest released version, that version
>> >> >> >>> >>>> has many bugs - most of them I got kicked by while testing 2.1.2.
>> >> >> >>> >>>> I got many problems with compactions not being triggered on
>> >> >> >>> >>>> column families not being read, and compactions and repairs not
>> >> >> >>> >>>> being completed. See
>> >> >> >>> >>>>
>> >> >> >>> >>>> https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
>> >> >> >>> >>>>
>> >> >> >>> >>>> https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
>> >> >> >>> >>>>
>> >> >> >>> >>>> Apart from that, how are those two datacenters connected?
>> >> >> >>> >>>> Maybe there is a bottleneck.
>> >> >> >>> >>>>
>> >> >> >>> >>>> Also, do you have ntp up and running on all nodes to keep
>> >> >> >>> >>>> all clocks in tight sync?
>> >> >> >>> >>>>
>> >> >> >>> >>>> Note: I'm no expert (yet) - just sharing my 2 cents.
>> >> >> >>> >>>>
>> >> >> >>> >>>> Cheers,
>> >> >> >>> >>>> Roland
