flink-user mailing list archives

From "Foster, Craig" <foscr...@amazon.com>
Subject Re: Wikiedit QuickStart with Kinesis
Date Thu, 01 Sep 2016 17:48:55 GMT
Oh, in that case, maybe I should look into using the KCL. I'm just using boto and boto3, which
hit different problems, but both related to the encoding.
 
boto3 prints *something*:
 
(.96.129.59,-20)'(01:541:4305:C70:10B4:FA8C:3CF9:B9B0,0(Patrick Barlane,0(Nedrutland,12(GreenC
bot,15(Bamyers99,-170(Mean as custard,661)ж?U????¨p?&"1w?
??
(RaphaelQS,-3(.44.211.32,50(JMichael22,1298)
                                           (Wbm1058,-9)Y?Z????/r(???
 
But boto just gives an exception:
 
<type 'exceptions.UnicodeDecodeError'>: 'utf8' codec can't decode bytes in position
0-2: invalid continuation byte
 
It does this even when getting a response object.
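For what it's worth, the exact error shape fits KPL record aggregation (an assumption on my part, not verified against the stream): a KPL-aggregated payload starts with the 4-byte magic number 0xF3 0x89 0x9A 0xC2, and a strict UTF-8 decoder rejects exactly those first bytes, matching boto's "can't decode bytes in position 0-2: invalid continuation byte". A minimal plain-JDK sketch of that failure mode:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MagicBytesDemo {
    // Assumed for illustration: the magic prefix the KPL puts in front of
    // aggregated records. 0xF3 opens a 4-byte UTF-8 sequence, 0x89 and 0x9A
    // are valid continuation bytes, but 0xC2 is a lead byte where a third
    // continuation byte is required, so strict decoding fails at offset 0-2.
    static final byte[] KPL_MAGIC = {(byte) 0xF3, (byte) 0x89, (byte) 0x9A, (byte) 0xC2};

    // Returns true only if the bytes form valid UTF-8; a strict decoder
    // mirrors Python's UnicodeDecodeError instead of silently replacing bytes.
    public static boolean decodesAsUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodesAsUtf8(KPL_MAGIC));                                     // false
        System.out.println(decodesAsUtf8("plain text".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```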
 
Thanks for your help! I'll try with the KCL before changing my SerializationSchema just yet.


On 9/1/16, 10:37 AM, "Tzu-Li Tai" <tzulitai@gmail.com> wrote:

    I’m afraid the “AUTO” option on the Kinesis producer is actually bugged, so
    the internally used KPL library does not correctly pick up credentials with
    the default credential provider chain. I’ve just filed a JIRA for this:
    https://issues.apache.org/jira/browse/FLINK-4559
    
    Regarding the data encoding:
    So, you’re using a SimpleStringSchema for serialization of the tuple
    strings. The SimpleStringSchema simply calls String.getBytes() to serialize
    the string into a byte array, so the data is encoded with the platform’s
    default charset.
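    As a plain-JDK aside (nothing Flink-specific assumed here): relying on the
    platform default charset is what makes the bytes machine-dependent; passing
    an explicit charset on both sides keeps the round trip stable:

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Serialize the way SimpleStringSchema-style code does, but with an
    // explicit charset instead of the platform default from getBytes().
    public static byte[] serialize(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tuple = "(Wbm1058,-9)";
        // Encodes and decodes with the same charset, so the tuple survives
        // regardless of the JVM's file.encoding setting.
        System.out.println(deserialize(serialize(tuple)).equals(tuple)); // true
    }
}
```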
    Internally, the SimpleStringSchema is wrapped within a
    KinesisSerializationSchema, which ultimately wraps the bytes within
    a ByteBuffer that is added to the internal KPL for writing to the streams.
    You can actually also choose to directly use a KinesisSerializationSchema to
    invoke the FlinkKinesisProducer.
    
    Also note that since FlinkKinesisProducer uses KPL, there might be some
    "magic bytes" in the encoded data added by KPL to aggregate multiple records
    when writing to streams. If you’re using the AWS KCL (Kinesis Client
    Library) or the FlinkKinesisConsumer to read the data, there shouldn’t be a
    problem decoding them (the FlinkKinesisConsumer uses a class from KCL to
    help decode aggregated records sent by KPL).
    
    Let me know if you bump into any other problems ;)
    
    Regards,
    Gordon
    
    
    
    --
    View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Wikiedit-QuickStart-with-Kinesis-tp8819p8840.html
    Sent from the Apache Flink User Mailing List archive at Nabble.com.
    
