kudu-user mailing list archives

From Adar Lieber-Dembo <a...@cloudera.com>
Subject Re: Kudu CLI tool JSON format
Date Tue, 11 Jun 2019 18:46:55 GMT
Thanks for the report. I filed KUDU-2845 to track the issue.

On Tue, Jun 11, 2019 at 9:44 AM Todd Lipcon <todd@cloudera.com> wrote:
> I guess the issue is that we use rapidjson's 'String' support to write
> out C++ strings, which are binary data, not valid UTF8. That's somewhat
> incorrect of us, and we should be base64-encoding such binary data.
> Fixing this is a little bit incompatible, but for something like
> partition keys I think we probably should do it anyway and release note
> it, considering partition keys are quite likely to be invalid UTF8.
> -Todd
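A minimal sketch of the base64 approach Todd describes, written in Python rather than the tool's C++. The key bytes, the "partition_key" field name, and both helper functions are assumptions made up for illustration; this is not the actual Kudu output schema or the KUDU-2845 fix.

```python
import base64
import json

# Hypothetical partition key bytes. 0x80 is not a valid UTF-8 start byte,
# so writing these bytes into a JSON string verbatim yields invalid UTF-8.
raw_key = b"\x80\x00\x00\x01"


def encode_binary_field(data: bytes) -> str:
    """Return an ASCII-only base64 string that is safe to embed in JSON."""
    return base64.b64encode(data).decode("ascii")


def decode_binary_field(text: str) -> bytes:
    """Recover the original bytes from a base64-encoded JSON string value."""
    return base64.b64decode(text)


# "partition_key" is an illustrative field name, not Kudu's real schema.
doc = json.dumps({"partition_key": encode_binary_field(raw_key)})

# A consumer parses the JSON as ordinary UTF-8 text and decodes the field.
round_tripped = decode_binary_field(json.loads(doc)["partition_key"])
```

The document stays plain ASCII, so any UTF-8 JSON parser can read it. The cost is the incompatibility Todd mentions: existing consumers that expect the raw string would need to base64-decode the field.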
> On Tue, Jun 11, 2019 at 6:08 AM Pavel Martynov <mr.xkurt@gmail.com> wrote:
>> Hi, guys!
>> We are trying to use the output of "kudu cluster ksck master
>> -ksck_format json_compact" for integration with our monitoring system
>> and hit something a little strange. Part of the output can't be read
>> as UTF-8 with Python 3:
>> $ kudu cluster ksck master -ksck_format json_compact > kudu.json
>> $ python
>> with open('kudu.json', mode='rb') as file:
>>   bs = file.read()
>>   bs.decode('utf-8')
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 705196: invalid start byte
>> Here is how Sublime Text shows this block of text: https://yadi.sk/i/4zpWKZ37iP8OEA
>> As you can see, the kudu tool encodes zeros as \u0000 but doesn't encode some other non-text bytes.
>> What do you think about it?
>> --
>> with best regards, Pavel Martynov
> --
> Todd Lipcon
> Software Engineer, Cloudera
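Until the output is fixed, a consumer that hits the UnicodeDecodeError reported above could decode the file with Python's "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate instead of raising. The sketch below is a possible workaround, not something proposed in the thread; the sample bytes, the "partition_key" field name, and both helper functions are assumptions for illustration.

```python
import json


def load_ksck_json(raw: bytes) -> dict:
    """Parse JSON bytes that may contain invalid UTF-8 inside strings."""
    # Undecodable bytes such as 0x80 become lone surrogates (U+DC80)
    # instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="surrogateescape")
    return json.loads(text)


def recover_bytes(value: str) -> bytes:
    """Turn a parsed string back into the bytes the tool wrote."""
    return value.encode("utf-8", errors="surrogateescape")


# Made-up sample that mimics the report: zeros are escaped as \u0000,
# while the 0x80 byte is written raw. The field name is illustrative.
sample = b'{"partition_key": "\x80\\u0000\\u0000\\u0001"}'

parsed = load_ksck_json(sample)
key_bytes = recover_bytes(parsed["partition_key"])
```

In the real file the same call would be load_ksck_json(open('kudu.json', 'rb').read()). Strings recovered this way contain lone surrogates, so they should be converted back with recover_bytes before being printed or re-encoded as strict UTF-8.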
