druid-dev mailing list archives

From Samarth Jain <samarth.j...@gmail.com>
Subject Re: [druid-user] Druid 12.1 Datasource load fails in Coordinator due change in implementation from Set to Map
Date Thu, 28 Jun 2018 17:35:25 GMT
Adding the dev email group.

We are currently hitting this problem in our environment too: loading
200K segments takes forever, whereas on 0.10.1 the load finished in less
than 5 minutes. A pull request that was merged into master
(https://github.com/druid-io/druid/pull/5878) potentially fixes this
issue. I believe the fix will be part of the 0.12.2 release whenever it
comes out.

On Thu, Jun 28, 2018 at 1:50 AM, Venu Reddy <venugopalreddyn@gmail.com> wrote:

> Hi Team,
> We have close to 500,000 active data segments in the metadata store
> (Postgres). The Coordinator is running on a 4-CPU CentOS server.
> We upgraded from Druid 0.10.0 to 0.12.1. Since then, when we bring up
> the Coordinator we see the following behaviour: datasource loading keeps
> running and then hangs inside the poll() function in
> *SQLMetadataSegmentManager.java*.
> On further debugging, we found that the following portion is what takes
> the time:
> // In SQLMetadataSegmentManager.poll(), run for each segment the loop collects
> if (!dataSource.getSegments().contains(segment)) {
>   dataSource.addSegment(segment);
> }
> The main reason it takes so long seems to be the change in
> *DruidDataSource.java* from a *ConcurrentSkipListSet* (plus a
> HashMap) to a *ConcurrentSkipListMap*.
> We added logging statements to time the above section in
> *SQLMetadataSegmentManager.java*. As the loop collects segments, each
> iteration initially takes less than a millisecond, but as more records are
> inserted into the *ConcurrentSkipListMap*, the insertions take ~8 ms by
> ~50K records and increase all the way to ~300 ms by 300K records.
> We added the same timers to the older version of Druid, and with the
> *ConcurrentSkipListSet* implementation the loop finishes processing
> the 500K records in 5 minutes.
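A minimal sketch of this kind of per-insertion timing, for anyone who wants to reproduce the trend outside the Coordinator. It assumes the segment collection is the `values()` view of a `ConcurrentSkipListMap` (an assumption; the 0.12.1 code is not shown in this thread) and uses `System.nanoTime()` in place of the log statements described above. `PollTimingSketch` and `timeInsertions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical harness: times each check-then-add and averages it per window.
public class PollTimingSketch {

    // Returns the average nanoseconds per check-then-add for each full window
    // of `window` records; a trailing partial window is not reported.
    static List<Long> timeInsertions(int total, int window) {
        ConcurrentSkipListMap<String, String> idToSegment = new ConcurrentSkipListMap<>();
        // Assumed stand-in for dataSource.getSegments(): a live values() view.
        Collection<String> segments = idToSegment.values();
        List<Long> averages = new ArrayList<>();
        long windowNanos = 0;
        for (int i = 0; i < total; i++) {
            String segment = "segment-" + i;
            long start = System.nanoTime();
            if (!segments.contains(segment)) {
                idToSegment.put(segment, segment);
            }
            windowNanos += System.nanoTime() - start;
            if ((i + 1) % window == 0) {
                averages.add(windowNanos / window);
                windowNanos = 0;
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        int window = 1_000;
        List<Long> averages = timeInsertions(10_000, window);
        for (int w = 0; w < averages.size(); w++) {
            System.out.printf("after %,d records: %,d ns per insertion%n",
                    (long) (w + 1) * window, averages.get(w));
        }
    }
}
```

Under that assumption, the per-window average should climb roughly in proportion to the number of records already stored. That would match the growth reported above, but absolute numbers will depend on the machine and JVM.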
> We see the same behaviour on a larger 32-CPU machine.
> In summary, *ConcurrentSkipListMap* seems to be slower than
> *ConcurrentSkipListSet* here, resulting in some sort of timeout in
> version 0.12.1, whereas the same number of segments loads without
> issues in under 10 minutes in version 0.10.0. Looking at the code,
> 0.11.0 seems identical to 0.10.0 in this area, but 0.12.1 has this change.
> Regards,
> Venu
