mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Schwartzmeyer <and...@schwartzmeyer.com>
Subject Re: Review Request 54877: Windows: Stout: Removed dependency on Shell API.
Date Tue, 03 Jan 2017 18:51:28 GMT


> On Dec. 20, 2016, 12:29 a.m., Daniel Pravat wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, line 748
> > <https://reviews.apache.org/r/54877/diff/1/?file=1589098#file1589098line748>
> >
> >     I don't think the conversion to UTF-8 is appropiate here.
> 
> Andrew Schwartzmeyer wrote:
>     What would you convert it to? It's currently in UTF-16, and Windows paths are allowed
to have (almost) any Unicode character.
> 
> Alex Clemmer wrote:
>     I think he's saying that bad things will happen if you use strings that have Unicode
in them, so it's probably better to just error out.
> 
> Andrew Schwartzmeyer wrote:
>     I'm not following... That is a case our codebase must deal with, NTFS paths are stored
in Unicode:
>     
>     https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx
>     
>     and on top of that, Windows file paths can:
>     
>     > Use any character in the current code page for a name, including Unicode characters
and characters in the extended character set (128–255)
>     
>     https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
>     
>     So what I'm saying is that bad things _must not happen_ if we have paths with non-ASCII
characters in them; we need to handle that at any point we deal with file paths on Windows.
>     
>     I'm not saying this function is likely to return a path like `C:\ÂÑÐ¥`, but that
would be a valid file path, and so we must be treating all our paths on Windows as Unicode.
>     
>     If this is not the case, are purposefully constraining ourselves to ASCII, and if
so, why?
> 
> Alex Clemmer wrote:
>     Sorry, I meant bad things will happen to Mesos if you give it strings with unicode
in them. Mesos itself does not deal with Unicode gracefully.
> 
> Andrew Schwartzmeyer wrote:
>     Oooh. We should probably fix that.
> 
> Andrew Schwartzmeyer wrote:
>     Alex, Daniel, do we have a resolution on this? I spoke with Alex and we feel that
we should be doing two main things: avoid exacberating the problem in MESOS-6817 by being
explicit in our use of `W` APIs, rather than relying on the mercurial value of `TCHAR`; and
not attempt to guard against paths that Mesos does not handle, as it is a bug that Mesos does
not handle paths with valid extended characters (since both Linux and Windows may have these
paths), so failing here would be inappropriate as it hides the actual bug. (Not to speak for
you Alex, I think this summarizes our discussion yesterday.)
> 
> Alex Clemmer wrote:
>     If I'm understanding our discussion correctly, failing here on unicode conversion
would be preferable, since it makes a subtle bug more obvious. If that's nontrivial I think
we can just ship it as is, since Mesos already has these problems.
> 
> Andrew Schwartzmeyer wrote:
>     > failing here on unicode conversion
>     
>     What would fail here?
>     
>     I *think* you're implying to add a detection algorithm to see if there are any characters
in the path that Mesos does not believe are valid; but that is not "failing on Unicode", we'd
need to write our own validator for Mesos.
> 
> Alex Clemmer wrote:
>     I was just saying, if there's an easy way to try to interpret the string as normal
ASCII and return `Error` if that fails, we should do that. If not, it's not that big of a
deal, because Mesos will probably fail anyway.

Ah I see what you want: 
> interpret the string as normal ASCII and return Error if that fails

but I don't believe this is possible. To be clear, the non-Unicode versions of these APIs
do not return ASCII; they return the same data encoded with the current Windows code page
(which is a superset of ASCII, in much the same way as UTF-8). If we used this version, we
would not only hit the same problem (possible non-ASCII characters), but also introduce a
new problem of having to decode from an arbitrary Windows code page (rather than UTF-16 which
is at least standard). With either the short or wide version of the API, detecting if the
string is not ASCII would mean implmenenting a validator, which I don't think we want to do
here (it's the wrong solution to the problem).

That said, what would you suggest?

P.S. For what it's worth, I don't expect this particular API to give us non-ASCII characters,
but I think we should choose a pattern for dealing with Windows APIs and stick to it. This
pattern gives us guaranteed UTF-8 strings with whatever data we're expected to handle.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54877/#review159672
-----------------------------------------------------------


On Dec. 19, 2016, 11:20 p.m., Andrew Schwartzmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54877/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2016, 11:20 p.m.)
> 
> 
> Review request for mesos, Daniel Pravat, Alex Clemmer, and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The API `SHGetKnownFolderPath` requires `Shell32.dll`,
> which is not available on Nano server.
> The equivalent API `GetAllUsersProfileDirectory`
> only requires `Userenv.dll`, which is available on Nano.
> 
> This API is also friendlier, as we own the allocation.
> 
> The Unicode version `GetAllUsersProfileDirectoryW` is
> explicitly used so that we are guaranteed a Unicode path,
> which we then convert from UTF-16 to UTF-8,
> instead of using the ASCII version which depends on a
> varying Windows code-page, and is not recommended.
> 
> A `vector<wchar_t>` is used over a `wstring` to avoid dealing
> with the placement of the null-terminating character.
> 
> 
> Diffs
> -----
> 
>   3rdparty/stout/include/stout/windows.hpp e641c46d033372e1b6c9f9c066b1ad4957d55088 
>   3rdparty/stout/include/stout/windows/os.hpp 5cd92545a49648e39e8eb7cf131895e9cfc97902

> 
> Diff: https://reviews.apache.org/r/54877/diff/
> 
> 
> Testing
> -------
> 
> cmake && msbuild, attach agent to master and check default `runtime_dir` value.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message