kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jun Rao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-732) MirrorMaker with shallow.iterator.enable=true produces unreadble messages
Date Tue, 05 Mar 2013 06:49:14 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jun Rao updated KAFKA-732:

    Affects Version/s: 0.8
> MirrorMaker with shallow.iterator.enable=true produces unreadble messages
> -------------------------------------------------------------------------
>                 Key: KAFKA-732
>                 URL: https://issues.apache.org/jira/browse/KAFKA-732
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer 
>    Affects Versions: 0.8, 0.8.1
>            Reporter: Maxime Brugidou
>            Assignee: Neha Narkhede
>            Priority: Blocker
> Trying to use MirrorMaker between two 0.8 clusters
> When using shallow.iterator.enable=true on the consumer side, the performance gain is
big (when incoming messages are compressed) and the producer does not complain but write the
messages uncompressed without the compression flag.
> If you try:
> - enable compression on the producer, it obviously makes things worse since the data
get double-compressed (the wiki warns about this)
> - disable compression and the compressed messages are written in bulk in an uncompressed
message, thus making it unreadable.
> If I follow correctly the current state of code from MirrorMaker to the produce request,
there is no way for the producer to know whether the message is deep or not. So I wonder how
it worked on 0.7?
> Here is the code as i read it (correct me if i'm wrong):
> 1. MirrorMakerThread.run(): create KeyedMessage[Array[Byte],Array[Byte]](topic, message)
> 2. Producer.send() -> DefaultEventHandler.handle()
> 3. DefaultEventHandler.serialize(): use DefaultEncoder for the message (does nothing)
> 4. DefaultEventHandler.dispatchSerializedData():
> 4.1 DefaultEventHandler.partitionAndCollate(): group messages by broker/partition/topic
> 4.2 DefaultEventHandler.dispatchSerializeData(): cycle through each broker
> 4.3 DefaultEventHandler.groupMessagesToSet(): Create a ByteBufferMessageSet for each
partition/topic grouping all the messages together, and compressing them if needed
> 4.4 DefaultEventHandler.send(): send the ByteBufferMessageSets for this broker in one
> The gist is that in DEH.groupMessagesToSet(), you don't know wether the raw message in
KeyedMessage.message is shallow or not. So I think I missed something... Also it doesn't seem
possible to send batch of deep messages in one ProduceRequest.
> I would love to provide a patch (or if you tell me that i'm doing it wrong, it's even
better), since I can easily test it on my test clusters but I will need guidance here.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message