doris-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-doris] vagetablechicken commented on issue #5036: Chinese characters garbled when Spark SQL on YARN reads Doris data
Date Thu, 07 Jan 2021 02:49:43 GMT

vagetablechicken commented on issue #5036:
URL: https://github.com/apache/incubator-doris/issues/5036#issuecomment-755848621


   I've hit a similar error. The cause is a charset mismatch:
   1. BE side: strings are encoded as UTF-8.
   https://github.com/apache/incubator-doris/blob/65d33cf43c837e56a2a36e78b358bfc0a9d1916b/be/src/util/arrow/row_batch.cpp#L80
   2. spark-doris-connector side: the bytes are decoded with the JVM default charset.
   https://github.com/apache/incubator-doris/blob/65d33cf43c837e56a2a36e78b358bfc0a9d1916b/extension/spark-doris-connector/src/main/java/org/apache/doris/spark/serialization/RowBatch.java#L271
   
   In my environment the default charset is US-ASCII, so Chinese characters come out garbled.
   It would be better to specify the `UTF_8` charset explicitly in `serialization/RowBatch`.
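
   To illustrate the mismatch, here is a minimal standalone sketch, not the actual connector code. The class `CharsetDecodeSketch` and its methods are hypothetical names; the only assumption is that the connector turns a UTF-8 encoded `byte[]` into a `String`. Decoding with an explicit `StandardCharsets.UTF_8` gives the same result on every JVM, while decoding with US-ASCII (my default) loses the Chinese text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not the real RowBatch code from spark-doris-connector.
public class CharsetDecodeSketch {

    // Decode with an explicit charset, independent of the JVM default.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode with a caller-supplied charset, to mimic a JVM whose
    // default charset is something other than UTF-8.
    static String decodeWith(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string for "Chinese",
        // written with escapes so the source file stays pure ASCII.
        String original = "\u4e2d\u6587";

        // Stand-in for what the BE sends: UTF-8 encoded bytes.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String good = decodeUtf8(raw);
        String bad = decodeWith(raw, StandardCharsets.US_ASCII);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 decode round-trips: " + original.equals(good));
        System.out.println("US-ASCII decode round-trips: " + original.equals(bad));
    }
}
```

   Passing `-Dfile.encoding=UTF-8` to the driver and executor JVMs may also work around the problem, but an explicit charset in the connector does not depend on how each YARN node is configured.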

