openoffice-dev mailing list archives

From Andre Fischer <awf....@gmail.com>
Subject Re: Improvements of OUString
Date Tue, 03 Dec 2013 12:02:28 GMT
On 03.12.2013 10:35, Herbert Duerr wrote:
> On 03.12.2013 09:13, Andre Fischer wrote:
>> A developer who apparently wants to remain anonymous has added the
>> function isEmpty() to the rtl::OUString class.  See
>> main/sal/inc/rtl/ustring.hxx for not much more information.
>
> Sorry for being too short. The full semantics of isEmpty() are:
>
> "The method isEmpty() returns true if the string is empty. If the 
> length of the string is one or two or three or any number bigger than 
> zero then isEmpty() returns false."

In addition to this almost correct statement, one could mention that 
isEmpty() is preferred over getLength()==0, and why.

Can you tell me what happens when an OUString is created for "\0"? Is 
that handled as end-of-string or as just one additional character?

>
> I added isEmpty() to make it possible to cleanly express the check for 
> an empty string. In our codebase there were quite a few constructs 
> such as
>     if( aString) {}
> which were intended to mean
>     if( aString.isEmpty()) {}
> What's funny is that the old construct compiled but it did the wrong 
> thing: The string was implicitly converted to a pointer to its 
> elements and that pointer was then compared against NULL. For our 
> OUString that pointer was always non-NULL though.
>
> Please see issue 123068 for further problems caused by the implicit 
> conversion of the OUString to a pointer to its elements. This 
> dangerous conversion is now disabled. By making the method private all 
> such problems will be found and prevented by the compiler. When we're 
> confident that all has been found the operator can be removed completely.
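
The pitfall described above can be reproduced outside of OUString. The following is a minimal sketch, not the actual rtl::OUString code: LeakyString, StrictString and truthiness() are hypothetical names, and std::u16string stands in for the real string buffer. It shows that "if( aString)" tests the buffer pointer rather than the length, and that a private conversion operator turns such uses into compile errors.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the old behaviour: the string converts
// implicitly to a pointer to its elements.
class LeakyString {
public:
    explicit LeakyString(std::u16string s) : m_data(std::move(s)) {}
    bool isEmpty() const { return m_data.empty(); }
    // Implicit conversion; the returned pointer is never null.
    operator const char16_t*() const { return m_data.c_str(); }
private:
    std::u16string m_data;
};

// Same class with the conversion operator made private, so every
// outside use of the conversion is rejected by the compiler.
class StrictString {
public:
    explicit StrictString(std::u16string s) : m_data(std::move(s)) {}
    bool isEmpty() const { return m_data.empty(); }
private:
    operator const char16_t*() const { return m_data.c_str(); }
    std::u16string m_data;
};

// Compiles, but checks the pointer against null, not the length.
inline bool truthiness(const LeakyString& s) {
    if (s) {
        return true;
    }
    return false;
}

static_assert(std::is_convertible<LeakyString, const char16_t*>::value,
              "public operator: the implicit conversion is available");
static_assert(!std::is_convertible<StrictString, const char16_t*>::value,
              "private operator: the implicit conversion is rejected");
```

With these definitions truthiness() returns true even for an empty LeakyString, which is why the explicit isEmpty() check is the safer spelling.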
>
>> This in itself may not yet be very exciting but I hope that it is the
>> first of several improvements to one of our most frequently used
>> classes.  Sadly, we missed the opportunity to make some more substantial
>> but incompatible changes for the 4.0 release. However, some changes that
>> make OUString more accessible to new (and old) developers might include:
>>
>> - Make construction from string literal more straightforward. At the
>> moment you have to write
>>      ::rtl::OUString("text", sizeof("text") - 1, RTL_TEXTENCODING_ASCII_US)
>>    or slightly shorter and safer
>>      ::rtl::OUString::createFromAscii("text")
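
One way to make construction from a literal less error-prone is to let the compiler deduce the length. Note that sizeof on a string literal counts the terminating NUL, so the character count is one less. The sketch below is only an illustration under that assumption: fromAsciiLiteral() is a hypothetical helper, not part of rtl::OUString, and it returns a std::u16string as a stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof on a literal includes the terminating NUL character.
static_assert(sizeof("text") == 5, "four characters plus the NUL");

// Hypothetical helper: N is deduced from the literal's array type,
// and N - 1 drops the terminating NUL.
template <std::size_t N>
std::u16string fromAsciiLiteral(const char (&literal)[N]) {
    std::u16string result;
    result.reserve(N - 1);
    for (std::size_t i = 0; i + 1 < N; ++i) {
        // ASCII code points map one-to-one to UTF-16 code units.
        result.push_back(static_cast<char16_t>(
            static_cast<unsigned char>(literal[i])));
    }
    return result;
}
```

A call such as fromAsciiLiteral("text") then needs neither an explicit length nor an encoding argument, at the cost of accepting only ASCII input.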
>
> Allocating heap space, transcoding a literal string to this memory and 
> deallocating it later when the string is deleted are quite wasteful 
> operations. Especially when considering that the literal string is 
> already there. It would be great if constructs such as
>     OUString( L"hello")
> used the pointer to the UTF-16 literal directly instead of copying its 
> contents around. The same applies for the OString(). The 'L' prefix is 
> a Windows convention but C++11 has even more possibilities with its 
> support for unicode string literals.
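
The idea of using the literal's storage directly can be sketched with a C++11 u"..." literal and a non-owning view. This is an assumption-laden illustration, not a proposal for the real OUString layout: Utf16LiteralView and kHello are hypothetical names, and the view is only safe for data with static storage duration, such as string literals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view: stores the pointer to the literal and
// its length, without allocating or transcoding anything.
class Utf16LiteralView {
public:
    template <std::size_t N>
    constexpr Utf16LiteralView(const char16_t (&literal)[N])
        : m_data(literal), m_length(N - 1) {}  // N - 1 drops the NUL

    constexpr const char16_t* data() const { return m_data; }
    constexpr std::size_t length() const { return m_length; }
    constexpr bool isEmpty() const { return m_length == 0; }

private:
    const char16_t* m_data;
    std::size_t m_length;
};

// A UTF-16 literal with static storage duration.
static const char16_t kHello[] = u"hello";
```

Constructing Utf16LiteralView from kHello leaves data() pointing at the literal itself, so no heap allocation or copy takes place.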
>
> Also we shouldn't bother our main string classes with non-unicode 
> support. Having external tooling for converting from/to other 
> encodings is still needed though.

Should we drop our support for ASCII?

>
> Looking over our string processing, I'm confident that we could get 
> along fine with UTF-8 strings. A conversion to UTF-16 would be needed 
> only when interfacing with other APIs.
>
> And if we were using UTF-8 byte strings we could base them directly on 
> the standard std::string.
>
>> - Conversion back to char* is not much better
>>      ::rtl::OUStringToOString(sOUStringVariable,
>>          RTL_TEXTENCODING_ASCII_US).getStr()
>
> This awful construct could be made much simpler if our strings were 
> always unicode (UTF-8/UTF-16/UTF-32).

I thought that OUString is UTF-16 and that that was the cause, not the 
solution, of the conversion problems.

-Andre

>
>> Do you have more ideas?
>
> Using ideas from languages such as Python/Perl/Java for convenient and 
> powerful string processing to replace the awkward string handling that 
> is too often seen in our code base. E.g. having regexp enabled match() 
> or search() methods would be a great start.
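
For strings based on std::string, regexp-enabled helpers could be sketched on top of the standard <regex> library. The free functions search() and match() below are hypothetical names chosen to mirror the suggestion, not existing OUString methods. One caveat: std::regex works on bytes, so patterns see UTF-8 code units rather than code points.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the pattern matches anywhere in the text.
inline bool search(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: true only if the pattern matches the whole text.
inline bool match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

For example, search("main/sal/inc/rtl/ustring.hxx", "\\.hxx$") finds the file extension, while match("ustring.hxx", "ustring") fails because the pattern does not cover the whole string.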
>
> Herbert


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org

