kafka-jira mailing list archives

From "Neil Avery (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5515) Consider removing date formatting from Segments class
Date Thu, 29 Jun 2017 16:16:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068546#comment-16068546 ]

Neil Avery commented on KAFKA-5515:
-----------------------------------

I've looked at dropping SimpleDateFormat and replacing it with commons-lang3 FastDateFormat
(available in the project, but not a dependency of this module).
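For comparison, a minimal sketch of the two formatting styles in play. It contrasts building a new SimpleDateFormat per call with reusing one shared formatter. Because FastDateFormat is not a dependency of this module, the sketch swaps in the JDK's immutable, thread-safe java.time.format.DateTimeFormatter; this is not what the benchmark measured. The "yyyyMMddHHmm" pattern, the UTC zone and the class/method names are assumptions for illustration.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SegmentNameFormatting {
    // Hypothetical minute-resolution pattern for a segment name suffix.
    static final String PATTERN = "yyyyMMddHHmm";

    // Per-call style: a new SimpleDateFormat (with its own Calendar) is built for every id.
    static String formatWithNewSdf(long timestampMs) {
        SimpleDateFormat sdf = new SimpleDateFormat(PATTERN, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(timestampMs));
    }

    // Swapped-in alternative: DateTimeFormatter is immutable and thread-safe,
    // so a single instance can be shared across calls and threads.
    static final DateTimeFormatter SHARED =
            DateTimeFormatter.ofPattern(PATTERN, Locale.ROOT).withZone(ZoneOffset.UTC);

    static String formatWithSharedFormatter(long timestampMs) {
        return SHARED.format(Instant.ofEpochMilli(timestampMs));
    }

    public static void main(String[] args) {
        long ts = 1498752960000L; // 2017-06-29T16:16:00Z
        System.out.println(formatWithNewSdf(ts));          // 201706291616
        System.out.println(formatWithSharedFormatter(ts)); // 201706291616
    }
}
```

Both methods produce the same string; the difference is only how much object construction happens per call. Locale.ROOT is used so that neither formatter picks up a non-Gregorian calendar or non-ASCII digits from the default locale.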

Microbenchmarks show SimpleDateFormat (SDF) starting at 800 ms per million calls and dropping
to 250 ms once HotSpot has warmed up. Interestingly, FastDateFormat (FDF) starts at 400 ms per
million and only gets down to 350 ms (not very convincing). Calendar usage hurts performance,
and both implementations do some caching internally. Looking at this another way, "Segments"
is a time-series slicing/bucketing function used to group, allocate and look up segments.
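Viewed as a bucketing function, the core of segment allocation and lookup needs only integer arithmetic on timestamps. A minimal sketch, assuming a fixed one-minute segment width; the class, constant and method names are hypothetical and do not reflect the actual Segments API.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentBucketing {
    // Hypothetical segment width: one minute, in milliseconds.
    static final long SEGMENT_INTERVAL_MS = 60_000L;

    // Map a record timestamp to the id of the time slice it falls into.
    // floorDiv keeps ids consistent even for negative timestamps.
    static long segmentId(long timestampMs) {
        return Math.floorDiv(timestampMs, SEGMENT_INTERVAL_MS);
    }

    // Lower bound (inclusive) of the timestamps held by a segment.
    static long segmentStart(long segmentId) {
        return segmentId * SEGMENT_INTERVAL_MS;
    }

    // All segment ids overlapping the inclusive time range [fromMs, toMs].
    static List<Long> segmentsInRange(long fromMs, long toMs) {
        List<Long> ids = new ArrayList<>();
        for (long id = segmentId(fromMs); id <= segmentId(toMs); id++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(segmentId(59_999L));                   // 0
        System.out.println(segmentStart(2L));                     // 120000
        System.out.println(segmentsInRange(30_000L, 150_000L));   // [0, 1, 2]
    }
}
```

Nothing here depends on a calendar; a date string only enters the picture if the segment name must carry one.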

Does a real-world calendar matter? I've knocked together a simple arithmetic alternative that
breaks time into slices in which all months and all years are the same size. The output format
is identical, but the day and month values will be wrong because no calendar is involved. This
gets down to 150 ms per million pretty much straight away. (SDF is still used for parsing.)
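The calendar-free idea can be illustrated as follows. This is a hypothetical sketch, not the code described above: it assumes every month has 30 days and every year has 12 such months, counted from 1970 in UTC, and emits the same 12-digit "yyyyMMddHHmm" shape that a SimpleDateFormat pattern would. The class and constant names are assumptions, and timestamps are assumed non-negative.

```java
public class CalendarFreeSegmentName {
    // Assumption: fixed-size months and years, so no Calendar is needed.
    static final long DAYS_PER_MONTH = 30;
    static final long MONTHS_PER_YEAR = 12;
    static final long EPOCH_YEAR = 1970;

    // Decompose epoch millis (assumed >= 0) into yyyyMMddHHmm using only division and remainder.
    // Minutes and hours match real UTC time; day, month and year drift from the real calendar.
    static String format(long timestampMs) {
        long minutes = timestampMs / 60_000L;
        long minute = minutes % 60;
        long hours = minutes / 60;
        long hour = hours % 24;
        long days = hours / 24;
        long day = days % DAYS_PER_MONTH + 1;       // 1-based day of "month"
        long months = days / DAYS_PER_MONTH;
        long month = months % MONTHS_PER_YEAR + 1;  // 1-based "month"
        long year = EPOCH_YEAR + months / MONTHS_PER_YEAR;

        StringBuilder sb = new StringBuilder(12);
        appendPadded(sb, year, 4);
        appendPadded(sb, month, 2);
        appendPadded(sb, day, 2);
        appendPadded(sb, hour, 2);
        appendPadded(sb, minute, 2);
        return sb.toString();
    }

    // Left-pad a non-negative value with zeros to the given width.
    private static void appendPadded(StringBuilder sb, long value, int width) {
        String digits = Long.toString(value);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        sb.append(digits);
    }

    public static void main(String[] args) {
        System.out.println(format(0L));              // 197001010000
        System.out.println(format(1498752960000L));  // 201803071616
    }
}
```

For 2017-06-29T16:16Z this yields 201803071616 instead of the calendar value 201706291616: the layout and the HHmm part agree, while the date part does not, which is the trade-off described above.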

All tests pass and the system runs fine, but I'm not sure how significant this change would be.
Could it break anything? Any advice or feedback is welcome.

> Consider removing date formatting from Segments class
> -----------------------------------------------------
>
>                 Key: KAFKA-5515
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5515
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Bill Bejeck
>            Assignee: Neil Avery
>              Labels: performance
>
> Currently the {{Segments}} class uses a date when calculating the segment id and uses
> {{SimpleDateFormat}} for formatting the segment id. However, this is a high-volume code path,
> and creating a new {{SimpleDateFormat}} and formatting each segment id is expensive. We should
> look into removing the date from the segment id or, at a minimum, use a faster alternative to
> {{SimpleDateFormat}}. We should also consider keeping a lookup of existing segments to avoid
> as many string operations as possible.
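The last suggestion in the issue, keeping a lookup of existing segments, can be sketched as a map from segment id to its already-formatted name, so each segment's name is built once rather than on every access. This is a minimal sketch under stated assumptions: the class, the "-" separator, the "yyyyMMddHHmm" pattern and the UTC zone are hypothetical, and the HashMap is not thread-safe.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SegmentNameCache {
    // Shared, immutable formatter (hypothetical pattern), created once.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm", Locale.ROOT).withZone(ZoneOffset.UTC);

    private final String storeName;
    private final long segmentIntervalMs;
    // Not thread-safe: assumes access from a single stream thread.
    private final Map<Long, String> names = new HashMap<>();
    int formatCalls = 0; // how many times a name was actually formatted

    SegmentNameCache(String storeName, long segmentIntervalMs) {
        this.storeName = storeName;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    long segmentId(long timestampMs) {
        return Math.floorDiv(timestampMs, segmentIntervalMs);
    }

    // Format the name only the first time a segment id is seen; later calls hit the map.
    String segmentName(long segmentId) {
        return names.computeIfAbsent(segmentId, id -> {
            formatCalls++;
            return storeName + "-" + FORMATTER.format(Instant.ofEpochMilli(id * segmentIntervalMs));
        });
    }

    public static void main(String[] args) {
        SegmentNameCache cache = new SegmentNameCache("store", 60_000L);
        long id = cache.segmentId(1498752960000L);
        System.out.println(cache.segmentName(id)); // store-201706291616
        System.out.println(cache.segmentName(id)); // served from the map
        System.out.println(cache.formatCalls);     // 1
    }
}
```

If the date is dropped from the name altogether, as the issue also proposes, the lambda reduces to concatenating the id, and the map mainly saves that concatenation.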



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
