incubator-bloodhound-dev mailing list archives

From Branko Čibej <br...@wandisco.com>
Subject Re: svn commit: r1455576 - in /incubator/bloodhound/branches/bep_0003_multiproduct/bloodhound_multiproduct/multiproduct: api.py hooks.py web_ui.py
Date Wed, 13 Mar 2013 17:21:49 GMT
On 13.03.2013 17:41, Olemis Lang wrote:
> On 3/13/13, Branko Čibej <brane@wandisco.com> wrote:
>> On 13.03.2013 17:17, Olemis Lang wrote:
>>> On 3/13/13, Branko Čibej <brane@wandisco.com> wrote:
> [...]
>>>> Not to mention that those 4 variants of i have 6 different Unicode
>>>> representations. You do *not* want to deal with Unicode normalization
>>>> issues in primary keys.
>>>>
>>> Like I just said in my previous message ... we already deal with
>>> unicode values in primary keys , so that belongs in the past ...
>> Just make triply sure that Trac core actually does normalize the keys
>> before writing them to the database.
> Ok . In advance I could say that I've seen before test cases all over
> for unicode values and primary keys . Nevertheless , my initial
> suggestion will only have an impact on product prefixes , so anything
> else in Trac will not be the target , it's BH
> ;)
>
>> Otherwise I'd consider that a
>> serious bug, because it leaves room for having two identical-looking
>> primary keys with different bit values.
> A few test cases , and invoking `to_unicode` in the right places and
> afaict everything will be fine .

No. I'm not worried about Unicode *validation*; I'm worried about
Unicode *normalization*.
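A minimal Python 3 sketch of why validation is not the same as normalization. Plain `bytes.decode` stands in here for a `to_unicode`-style decoding step; it is an assumption, not Trac's actual helper. Both inputs are valid UTF-8 and decode cleanly, yet they produce different strings until both are normalized to one form:

```python
import unicodedata

# Two byte strings a client might submit for the same visible word:
# "ü" as one precomposed code point (U+00FC) versus "u" followed by
# U+0308 COMBINING DIAERESIS. Both are perfectly valid UTF-8.
nfc_bytes = "gl\u00fcck".encode("utf-8")
nfd_bytes = "glu\u0308ck".encode("utf-8")

# Decoding (the "validation" step) succeeds for both ...
nfc_text = nfc_bytes.decode("utf-8")
nfd_text = nfd_bytes.decode("utf-8")

# ... but yields two different strings.
decoded_equal = nfc_text == nfd_text

# Normalizing both to the same form makes them compare equal.
normalized_equal = (unicodedata.normalize("NFC", nfc_text)
                    == unicodedata.normalize("NFC", nfd_text))

print(decoded_equal, normalized_equal)  # False True
```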

>> Database collations typically
>> are not normalization-agnostic.
>>
> Yes u'r right ... Trac core takes care of that .

Where?

>> As far as I can see, Trac core only normalizes the names of attachments.
>> That's not enough.
>>
> there are test cases for others e.g. wiki pages ...

Test cases do not cut it. I see a single instance of normalization in
the Trac code:

$ find trac -name '*.py' | xargs fgrep unicodedata
trac/trac/attachment.py:import unicodedata
trac/trac/attachment.py:        filename = unicodedata.normalize('NFC', unicode(upload.filename,
trac/trac/util/text.py:from unicodedata import east_asian_width

Only the second hit actually normalizes anything. Browsers, JavaScript,
etc. etc. do not care about normalization, so two people, e.g., one
using Mac and another using Windows, can create a product with the tag

    unglücklich

and would get two different products with what appears to be the same
prefix -- but with one in NFD form and the other in NFC form.
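To illustrate the collation point, here is a sketch using an in-memory SQLite database. The `product` table and its `prefix` column are a hypothetical, simplified schema, not Bloodhound's actual one. With SQLite's default BINARY collation, both spellings of `unglücklich` are accepted as distinct primary keys:

```python
import sqlite3
import unicodedata

word = "ungl\u00fccklich"
nfc = unicodedata.normalize("NFC", word)  # precomposed "ü" (U+00FC)
nfd = unicodedata.normalize("NFD", word)  # "u" + U+0308 COMBINING DIAERESIS

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not Bloodhound's actual schema.
conn.execute("CREATE TABLE product (prefix TEXT PRIMARY KEY, name TEXT)")

# Neither insert violates the primary key: the default BINARY collation
# compares the stored bytes, not canonically equivalent text.
conn.execute("INSERT INTO product VALUES (?, ?)", (nfc, "client A"))
conn.execute("INSERT INTO product VALUES (?, ?)", (nfd, "client B"))

rows = conn.execute("SELECT prefix, name FROM product").fetchall()
for prefix, name in rows:
    # Both prefixes render identically but differ in code-point length.
    print(name, prefix, len(prefix))
```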

If you insist this is up to trac-core, then this needs to be fixed
there; however, I'd expect such a fix to also require a database upgrade
-- i.e., explicitly normalizing all such keys in existing databases.
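One possible shape for such a fix, sketched under assumptions: `normalize_key` is a hypothetical helper, not an existing Trac function, and NFC is chosen only because it is the form `attachment.py` already uses. The idea is to normalize every key on every write and lookup path, so that equivalent spellings collide in the database instead of silently coexisting:

```python
import sqlite3
import unicodedata


def normalize_key(value, encoding="utf-8"):
    """Hypothetical helper: decode (if needed) and NFC-normalize a key.

    Meant to be applied on every code path that writes *or* looks up a
    key, so that canonically equivalent spellings map to one stored value.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return unicodedata.normalize("NFC", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prefix TEXT PRIMARY KEY)")

conn.execute("INSERT INTO product VALUES (?)",
             (normalize_key("ungl\u00fccklich"),))
try:
    # The NFD spelling now normalizes to the existing NFC key and collides.
    conn.execute("INSERT INTO product VALUES (?)",
                 (normalize_key("unglu\u0308cklich".encode("utf-8")),))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```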

Of course that could lead to conflicts, but it's better to resolve them
by renaming the keys than by not doing the homework WRT Unicode
normalization in the first place.
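A rough sketch of what the upgrade step could look like, again on the hypothetical `product(prefix)` table. `find_normalization_conflicts` and `upgrade_keys` are illustrative names, not Trac or Bloodhound APIs. Keys that normalize cleanly are rewritten in place; groups of existing keys that collapse onto the same normalized form are reported rather than merged, so they can be renamed by hand. A real upgrade would presumably also need to update any other tables that refer to the prefix:

```python
import sqlite3
import unicodedata
from collections import defaultdict


def find_normalization_conflicts(keys, form="NFC"):
    """Map each normalized key to the distinct stored keys that collapse
    onto it, keeping only groups with two or more members."""
    groups = defaultdict(list)
    for key in keys:
        groups[unicodedata.normalize(form, key)].append(key)
    return {norm: originals for norm, originals in groups.items()
            if len(originals) > 1}


def upgrade_keys(conn, form="NFC"):
    """Hypothetical upgrade: normalize product.prefix in place.

    Only non-conflicting keys are rewritten; the conflicting groups are
    returned so an administrator can rename them explicitly.
    """
    keys = [row[0] for row in conn.execute("SELECT prefix FROM product")]
    conflicts = find_normalization_conflicts(keys, form)
    clashing = {key for group in conflicts.values() for key in group}
    with conn:  # commit all updates together, or roll back on error
        for key in keys:
            normalized = unicodedata.normalize(form, key)
            if key != normalized and key not in clashing:
                conn.execute("UPDATE product SET prefix = ? WHERE prefix = ?",
                             (normalized, key))
    return conflicts


# Usage: one conflicting pair and one key that can be fixed automatically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prefix TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO product VALUES (?)",
                 [("ungl\u00fccklich",), ("unglu\u0308cklich",),
                  ("cafe\u0301",)])
conflicts = upgrade_keys(conn)
print(conflicts)
```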


FWIW, Subversion suffers from the same problem, and it's even harder to
fix there, which is why I'm kind of sensitive to it.

-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com

