kafka-jira mailing list archives

From Maciej Bryński (JIRA) <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6632) Very slow hashCode methods in Kafka Connect types
Date Fri, 09 Mar 2018 12:20:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Bryński updated KAFKA-6632:
----------------------------------
    Description: 
The hashCode method of ConnectSchema (and Field) is called heavily in SMTs and in fromConnect.

Example:

[https://github.com/apache/kafka/blob/e5d6c9a79a4ca9b82502b8e7f503d86ddaddb7fb/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L164]

Unfortunately, it uses Objects.hash, which is very slow.

I rewrote it as a hand-written implementation and got a roughly 6x speedup.

Microbenchmark results:
 * Original ConnectSchema hashCode: 2995 ms
 * My implementation: 517 ms

(100,000,000 iterations of calling hashCode on new ConnectSchema(Schema.Type.STRING))
{code:java}
@Override
public int hashCode() {
    int result = 5;
    result = 31 * result + type.hashCode();
    result = 31 * result + (optional ? 1 : 0);
    result = 31 * result + (defaultValue == null ? 0 : defaultValue.hashCode());
    if (fields != null) {
        for (Field f : fields) {
            result = 31 * result + f.hashCode();
        }
    }
    result = 31 * result + (keySchema == null ? 0 : keySchema.hashCode());
    result = 31 * result + (valueSchema == null ? 0 : valueSchema.hashCode());
    result = 31 * result + (name == null ? 0 : name.hashCode());
    result = 31 * result + (version == null ? 0 : version);
    result = 31 * result + (doc == null ? 0 : doc.hashCode());
    if (parameters != null) {
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            result = 31 * result + e.getKey().hashCode() + e.getValue().hashCode();
        }
    }
    return result;
}{code}
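
For anyone who wants to reproduce the comparison, here is a minimal sketch of such a microbenchmark. It is not the harness used for the numbers above: MiniSchema and HashBench are hypothetical stand-ins with only a few of ConnectSchema's fields, the timing is plain wall-clock without JMH or warm-up, and the default iteration count is reduced (pass a number as the first argument to change it). The absolute figures will differ by JVM and machine.

```java
import java.util.Objects;
import java.util.function.ToIntFunction;

// Hypothetical, simplified stand-in for ConnectSchema; not the real Kafka class.
final class MiniSchema {
    final String type;
    final boolean optional;
    final Object defaultValue;
    final String name;
    final Integer version;

    MiniSchema(String type, boolean optional, Object defaultValue, String name, Integer version) {
        this.type = type;
        this.optional = optional;
        this.defaultValue = defaultValue;
        this.name = name;
        this.version = version;
    }

    // Varargs version: builds an Object[] and autoboxes `optional` on every call.
    int objectsHash() {
        return Objects.hash(type, optional, defaultValue, name, version);
    }

    // Hand-rolled version in the style of the proposal: no varargs array.
    int manualHash() {
        int result = 5;
        result = 31 * result + type.hashCode();
        result = 31 * result + (optional ? 1 : 0);
        result = 31 * result + (defaultValue == null ? 0 : defaultValue.hashCode());
        result = 31 * result + (name == null ? 0 : name.hashCode());
        result = 31 * result + (version == null ? 0 : version);
        return result;
    }
}

final class HashBench {
    // Written after each run so the JIT cannot discard the loop as dead code.
    static volatile int sink;

    // Naive wall-clock timing in milliseconds for `iterations` hash calls.
    static long timeMillis(ToIntFunction<MiniSchema> hash, MiniSchema schema, int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += hash.applyAsInt(schema);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        sink = acc;
        return elapsedMillis;
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        MiniSchema schema = new MiniSchema("STRING", false, null, null, null);
        System.out.println("Objects.hash: " + timeMillis(MiniSchema::objectsHash, schema, iterations) + " ms");
        System.out.println("manual:       " + timeMillis(MiniSchema::manualHash, schema, iterations) + " ms");
    }
}
```

The two methods return different int values for the same object, which is fine as long as a class uses only one of them and keeps it consistent with equals.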

  was:
The hashCode method of ConnectSchema (and Field) is called heavily in SMTs.

Example:

[https://github.com/apache/kafka/blob/e5d6c9a79a4ca9b82502b8e7f503d86ddaddb7fb/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L164]

Unfortunately, it uses Objects.hash, which is very slow.

I rewrote it as a hand-written implementation and got a roughly 6x speedup.

Microbenchmark results:
 * Original ConnectSchema hashCode: 2995 ms
 * My implementation: 517 ms

(100,000,000 iterations of calling hashCode on new ConnectSchema(Schema.Type.STRING))
{code:java}
@Override
public int hashCode() {
    int result = 5;
    result = 31 * result + type.hashCode();
    result = 31 * result + (optional ? 1 : 0);
    result = 31 * result + (defaultValue == null ? 0 : defaultValue.hashCode());
    if (fields != null) {
        for (Field f : fields) {
            result = 31 * result + f.hashCode();
        }
    }
    result = 31 * result + (keySchema == null ? 0 : keySchema.hashCode());
    result = 31 * result + (valueSchema == null ? 0 : valueSchema.hashCode());
    result = 31 * result + (name == null ? 0 : name.hashCode());
    result = 31 * result + (version == null ? 0 : version);
    result = 31 * result + (doc == null ? 0 : doc.hashCode());
    if (parameters != null) {
        for (String s : parameters.keySet()) {
            result = 31 * result + s.hashCode() + parameters.get(s).hashCode();
        }
    }
    return result;
}{code}
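
The only code difference between the two revisions of the description is how the parameters map is walked: keySet() plus get() before, entrySet() after. As a small sketch (ParamHash and the seed argument are hypothetical, introduced only to isolate that loop), both forms compute the same value; the entrySet() form just avoids a second lookup per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper isolating the parameters loop from the two revisions.
final class ParamHash {
    // Earlier revision: iterate the keys, then look each value up again.
    static int viaKeySet(int seed, Map<String, String> parameters) {
        int result = seed;
        for (String s : parameters.keySet()) {
            result = 31 * result + s.hashCode() + parameters.get(s).hashCode();
        }
        return result;
    }

    // Updated revision: a single pass over the entries, no extra lookup.
    static int viaEntrySet(int seed, Map<String, String> parameters) {
        int result = seed;
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            result = 31 * result + e.getKey().hashCode() + e.getValue().hashCode();
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("scale", "2");
        parameters.put("precision", "10");
        // Both loops visit the same entries in the same order, so the results match.
        System.out.println(viaKeySet(5, parameters) == viaEntrySet(5, parameters));
    }
}
```

Both loops depend on the map's iteration order, whereas Map.equals does not. Whether that matters for ConnectSchema depends on how parameters maps are built, which this sketch does not address.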


> Very slow hashCode methods in Kafka Connect types
> -------------------------------------------------
>
>                 Key: KAFKA-6632
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6632
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Maciej Bryński
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
