storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Niskodé-Dossett <doss...@gmail.com>
Subject Re: Increasing worker parallelism decreases throughput and increases tuple timeout
Date Wed, 07 Sep 2016 16:11:43 GMT
Let us know what you find, especially if the serializer needs to be more
defensive to ensure proper caching.

On Tue, Sep 6, 2016 at 8:45 AM Kristopher Kane <kkane.list@gmail.com> wrote:

> Come to think of it, I did see RestUtils rank some what higher in the
> visualvm CPU profiler but did not give it the attention it deserved.
>
> On Tue, Sep 6, 2016 at 9:39 AM, Aaron Niskodé-Dossett <dossett@gmail.com>
> wrote:
>
>> Hi Kris,
>>
>> One possibility is that the Serializer isn't actually caching the schema
>> <-> id mappings and is hitting the schema registry every time.  The call to
>> register() in getFingerprint() [1] in particular can be a finicky since the
>> cache is ultimately in an IDENTITY hash map, not a regular old hashmap[2].
>> I'm familiar with the Avro deserializer you're using and though it
>> accounted for this, but perhaps not.
>>
>> You could add timing information to the getFingerprint() and getSchema()
>> calls in ConfluentAvroSerializer.  If the results indicate cache misses,
>> that's probably your culprit.
>>
>> Best, Aaron
>>
>>
>> [1]
>> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/avro/ConfluentAvroSerializer.java#L66
>> [2]
>> https://github.com/confluentinc/schema-registry/blob/v1.0/client/src/main/java/io/confluent/kafka/schemaregistry/client/CachedSchemaRegistryClient.java#L79
>>
>> On Tue, Sep 6, 2016 at 7:40 AM Kristopher Kane <kkane.list@gmail.com>
>> wrote:
>>
>>> Hi everyone.
>>>
>>> I have a simple topology that uses the Avro serializer (
>>> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/avro/ConfluentAvroSerializer.java)
>>> and writes to Elasticsearch.
>>>
>>> The topology is like this:
>>>
>>> Kafka (raw scheme) -> Avro deserializer -> Elasticsearch
>>>
>>> This topology runs well with one worker, however, once I add one more
>>> worker (total of two) and change nothing else, the topology throughput
>>> drops and tuples start timing out.
>>>
>>> I've attached visualvm/jstatd to the workers when in multi worker mode -
>>> and added some jmx configs to the worker opts - but I am unable to see
>>> anything glaring.
>>>
>>> I've never seen Storm act this way but have also never worked with a
>>> custom serializer so assume that it is the culprit but I cannot explain
>>> why.
>>>
>>> Any pointers would be appreciated.
>>>
>>> Kris
>>>
>>>
>>>
>>>
>>>
>

Mime
View raw message