mxnet-dev mailing list archives

From Anirudh <anirudh2...@gmail.com>
Subject Re: UTF-8 Support for TextParser
Date Tue, 27 Feb 2018 01:18:25 GMT
Hi Marco,

I understand that there needs to be a separate discussion on the strong
dependency between mxnet and dmlc-core and how to fix it.

Having said that, I think the goals of dmlc-core and mxnet are somewhat
aligned. Posting to the MXNet dev list in this case is a good way to gather
feedback from both communities, since I consider the MXNet community to be
mostly a superset of the dmlc-core community.

Anirudh

On Mon, Feb 26, 2018 at 5:00 PM, Subramanian, Anirudh <anisub@amazon.com>
wrote:

> Hi Tianqi,
>
> The UTF-8 support would make other formats like CSV more usable.
> Otherwise, users have to normalize their data in some way before
> using mxnet.
> I understand that there is a tradeoff here because of the efficiency gains
> from the parser, but the expectation of having to normalize their UTF-8
> files may turn users away.
>
> Anirudh
>
> On 2/26/18, 3:54 PM, "workcrow@gmail.com on behalf of Tianqi Chen" <
> workcrow@gmail.com on behalf of tqchen@cs.washington.edu> wrote:
>
>     Since LibSVM format is only going to involve numbers and possibly ASCII
>     characters, is there any reason to add UTF-8 support? Note that
>     generalization always comes with a cost in efficiency, and some
>     effort has been spent on making the parser fast.
>
>     Tianqi
>
>     On Mon, Feb 26, 2018 at 3:38 PM, Anirudh <anirudh2290@gmail.com> wrote:
>
>     > Hi all,
>     >
>     > Currently there is no UTF-8 support for the LibSVM, LibFM or CSV text
>     > parsers. I am working on adding UTF-8 support for the text parsers.
>     > Since C++ doesn't have great built-in support for UTF-8, I am looking
>     > at third-party libraries which provide Unicode support. I am
>     > currently considering ICU. Any comments, suggestions, past experience,
>     > or gotchas about Unicode third-party libraries or adding Unicode
>     > support in general are highly appreciated.
>     >
>     > I have created an issue about the same:
>     > https://github.com/dmlc/dmlc-core/issues/372
>     > Please feel free to reply to this email or comment on the GitHub
>     > issue if you have any input.
>     >
>     > Anirudh
>     >
>
>
>
