kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Kudu CLI tool JSON format
Date Tue, 11 Jun 2019 16:43:46 GMT
I guess the issue is that we use rapidjson's 'String' support to write out
C++ strings, which can hold arbitrary binary data and are not necessarily
valid UTF-8. That's somewhat incorrect of us, and we should be
base64-encoding such binary data.

Fixing this is a slightly incompatible change, but for something like
partition keys I think we probably should do it anyway and release-note it,
considering partition keys are quite likely to be invalid UTF-8.
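
A minimal sketch of the base64 idea, written in Python for illustration
rather than in the tool's C++. The field name "partition_key", the sample
bytes, and the helper names are all made up here; this is not the actual
ksck output schema or the eventual fix.

```python
import base64
import json


def encode_binary_field(raw: bytes) -> str:
    """Return an ASCII-only base64 string that is safe to embed in JSON."""
    return base64.b64encode(raw).decode("ascii")


def decode_binary_field(text: str) -> bytes:
    """Recover the original bytes from a base64-encoded JSON string."""
    return base64.b64decode(text)


# Hypothetical partition key containing bytes that are not valid UTF-8.
partition_key = b"\x80\x00\x00\x01\xff"

# The serialized document is plain ASCII, so any UTF-8 reader accepts it.
doc = json.dumps({"partition_key": encode_binary_field(partition_key)})

# A consumer reverses the encoding after parsing.
restored = decode_binary_field(json.loads(doc)["partition_key"])
```

The cost is that consumers have to know which fields are base64-encoded,
which is why the change is incompatible for existing readers.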


On Tue, Jun 11, 2019 at 6:08 AM Pavel Martynov <mr.xkurt@gmail.com> wrote:

> Hi, guys!
> We are trying to use the output of "kudu cluster ksck master -ksck_format
> json_compact" for integration with our monitoring system and hit something
> a little strange. Part of the output can't be decoded as UTF-8 with Python 3:
> $ kudu cluster ksck master -ksck_format json_compact > kudu.json
> $ python
> with open('kudu.json', mode='rb') as file:
>   bs = file.read()
>   bs.decode('utf-8')
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position
> 705196: invalid start byte
> Here is how Sublime Text shows this block of text:
> https://yadi.sk/i/4zpWKZ37iP8OEA
> As you can see, the kudu tool encodes zero bytes as \u0000 but doesn't
> encode some other non-text bytes.
> What do you think about it?
> --
> with best regards, Pavel Martynov
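
Until the output format changes, a reader-side workaround along these lines
might help. It is only a sketch: it assumes the tool writes bytes >= 0x80
verbatim inside JSON strings (which matches the error above), and the
sample document and field name are invented stand-ins, not real ksck output.

```python
import json


def load_lenient(raw: bytes):
    """Parse JSON whose string values may contain bytes that are not UTF-8.

    Undecodable bytes are mapped to lone surrogates instead of raising.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return json.loads(text)


def original_bytes(value: str) -> bytes:
    """Turn a leniently decoded string back into the bytes that were written."""
    return value.encode("utf-8", errors="surrogateescape")


# Invented fragment: raw 0x80 and 0xff bytes plus an escaped zero byte.
sample = b'{"partition_key": "\x80\\u0000\xffab"}'

doc = load_lenient(sample)
key = original_bytes(doc["partition_key"])
```

Strings decoded this way can contain lone surrogates, so they should be
converted back with original_bytes() before being printed or re-encoded.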

Todd Lipcon
Software Engineer, Cloudera
