harmony-dev mailing list archives

From "Dmitry Yershov" <dmitry.yers...@gmail.com>
Subject Re: [drlvm] proposals for VM internationalization
Date Thu, 20 Jul 2006 09:42:25 GMT
Hello all.

Salikh Zakirov wrote:
> far below are results of my experiments with Log4cxx's ResourceBundle.
> (I've managed to find it in Log4cxx documentation after carefully
> rereading your original post).
>
> The good news is that it does localization (severely limited).
> The prototype has following good properties
> * The unlocalized message is used as the message key

The message key should be a "message pattern" rather than a finished
message, because the message may contain parameters. E.g.:

"Message pattern with integer parameter: %d"
or
"Message pattern with one parameter: {0}"
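For illustration, here is a minimal sketch of how a {N}-style pattern could be expanded. The function format_pattern is hypothetical (it is not part of DRLVM or log4cxx), and passing the arguments as pre-formatted strings is an assumption made to keep the example short.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of DRLVM or log4cxx: replaces each {N} in
// `pattern` with args[N]. Anything that is not a well-formed placeholder
// with an index in range is copied through unchanged.
std::string format_pattern(const std::string& pattern,
                           const std::vector<std::string>& args)
{
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] == '{') {
            std::size_t j = i + 1;
            std::size_t index = 0;
            bool has_digits = false;
            // Read the decimal index that follows the opening brace.
            while (j < pattern.size() &&
                   std::isdigit(static_cast<unsigned char>(pattern[j]))) {
                index = index * 10 + static_cast<std::size_t>(pattern[j] - '0');
                has_digits = true;
                ++j;
            }
            if (has_digits && j < pattern.size() && pattern[j] == '}' &&
                index < args.size()) {
                out += args[index];
                i = j + 1;
                continue;
            }
        }
        out += pattern[i];
        ++i;
    }
    return out;
}
```

One difference between the two notations: with {N} a translation can reorder the parameters (e.g. "{1} before {0}"), while plain %d consumes its arguments strictly in order.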

> * No extra entities were introduced (like non-printable message keys)

What about a very long "message pattern" (e.g. the VM's help message)?
For such cases a "messageId key" should be used.

> * The localizable messages are marked by _() notation and can
> be extracted from the source code automatically

To my mind this is a graceful solution for extracting localizable
messages. But should we care about this? The messages have to be
gathered and put into a properties file only once.

I propose the following solution:

Modify the VM's LoggerString class. The first parameter of a composite
message should be the message key. If it is an empty string, the
message should not be localized. E.g.:
WARN("" << "Not localizable message with two parameters: " << 1 << " and "
<< 10)
WARN("localizable message with two parameters: %d and %d" << 1 << 10)
or
WARN("localizable message with two parameters: {0} and {1}" << 1 << 10)
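A rough sketch of how this "first parameter is the key" rule could behave. MessageBuilder is a hypothetical stand-in for LoggerString, the std::map catalog stands in for the resource bundle, and the {N} notation is picked only for the example; none of this is the existing VM code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the VM's LoggerString (illustration only).
class MessageBuilder {
public:
    explicit MessageBuilder(const std::map<std::string, std::string>& catalog)
        : catalog_(catalog), have_key_(false) {}

    // The first string streamed in is the message key (possibly empty);
    // every later string is an ordinary argument.
    MessageBuilder& operator<<(const char* text) {
        if (!have_key_) { key_ = text; have_key_ = true; }
        else            { args_.push_back(text); }
        return *this;
    }

    // A number is always an argument; if it comes first, the key stays empty.
    MessageBuilder& operator<<(int value) {
        have_key_ = true;
        args_.push_back(std::to_string(value));
        return *this;
    }

    std::string str() const {
        if (key_.empty()) {
            // Empty key: not localizable, just concatenate the pieces.
            std::string out;
            for (std::size_t n = 0; n < args_.size(); ++n) out += args_[n];
            return out;
        }
        // Look up the translated pattern; fall back to the key itself.
        std::map<std::string, std::string>::const_iterator it = catalog_.find(key_);
        std::string text = (it != catalog_.end()) ? it->second : key_;
        // Simplification: arguments are substituted one index at a time, so
        // an argument that itself contains "{1}" would be expanded again.
        for (std::size_t n = 0; n < args_.size(); ++n) {
            const std::string placeholder = "{" + std::to_string(n) + "}";
            std::size_t pos = 0;
            while ((pos = text.find(placeholder, pos)) != std::string::npos) {
                text.replace(pos, placeholder.size(), args_[n]);
                pos += args_[n].size();
            }
        }
        return text;
    }

private:
    const std::map<std::string, std::string>& catalog_;
    std::string key_;
    bool have_key_;
    std::vector<std::string> args_;
};
```

With this shape the call sites stay as short as in the WARN examples, and a missing translation degrades to the English pattern instead of failing.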

>
> The things that I have not implemented yet (to save time and make at least
> something available):
> * loading the system locale value
> * reading the locale-specific localization file
> * converting the localized messages to locale-specific encoding

What do you mean here?

> * converting the unlocalized messages from source encoding (US-ASCII) to
> UTF-16 (wchar_t[])

This is a big question. We can use char[] strings here, since Log4cxx
automatically converts char* to wchar_t*.
We can also use UTF-8 encoding for wide characters.

>
> The issues that I have encountered but haven't yet worked out a solution:
>
> * PropertyResourceBundle.getString().c_str() returns the pointer to the stack
> location. To make it work, I had to use wcsdup(), thus introducing
> an unacceptable memory leak.
> I think there must be some way to get the pointer to original bundle contents,
> but haven't figured out how to achieve it.
>

Maybe this is the way:

LOG4CXX_DECODE_WCHAR(chstr, wchrstr);
LOG4CXX_ENCODE_CHAR(charstr, chstr);
charstr.c_str()

> * PropertyResourceBundle expects the good property format, so the unlocalized
> messages needs to be mangled to property-compatible form
> (in the patch below, the only transformation replaced spaces ' ' with underscores '_',
>   but it needs to be generalized).

I agree with you.
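One possible generalization, as a sketch only. In the Java properties format an unescaped '=', ':' or whitespace ends the key, so, assuming the log4cxx parser follows similar rules, the safest mangling is one that leaves only letters, digits and underscores. The function mangle_key and the "_xx" hex scheme are hypothetical, and narrow strings are used here although the patch works on wchar_t.

```cpp
#include <cctype>
#include <string>

// Hypothetical generalization of the space-to-underscore mangling in the
// patch: letters and digits are kept, every other byte becomes '_' followed
// by its two-digit hex code. Unlike the plain replacement, this is reversible.
std::string mangle_key(const std::string& message)
{
    static const char hex[] = "0123456789abcdef";
    std::string key;
    for (std::string::size_type i = 0; i < message.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(message[i]);
        if (std::isalnum(c)) {
            key += static_cast<char>(c);
        } else {
            key += '_';
            key += hex[c >> 4];
            key += hex[c & 0x0f];
        }
    }
    return key;
}
```

The price is that the keys in the properties file become hard to read, which is an argument for allowing unmangled keys in the file format instead.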

>
> Given the number of issues PropertyResourceBundle introduces, and the number of
> services it provides (parsing property-format and constructing in-memory hashmap),
> I think that it would be easier to reimplement the functionality without
> using PropertyResourceBundle,
> and change the storage on-disk file format to allow unmangled messages be the keys.
>

In conclusion, here are my suggestions for the VM's internationalization:

1. Extend the log4cxx::helpers::PropertyResourceBundle class to allow
lazy (on-demand) loading of properties.
2. Extend the log4cxx::helpers::Properties class to allow strings with
spaces as keys.
3. Choose a model:
    a. _("<message key>") – localizable ; "<message>" – not localizable
    b. "<message key>" – localizable ; "" – not localizable.
4. Decide which of two variants should be used for parameters inside a
message pattern: printf format specifications or {<number>}.
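To make points 1 and 2 more concrete, here is a small standalone sketch written without log4cxx. LazyCatalog and parse_catalog are hypothetical names, and the tab-separated "key<TAB>value" line format is an assumption chosen only because it lets keys contain spaces.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical on-disk format: one "key<TAB>value" entry per line, so keys
// may contain spaces. Lines without a tab are skipped.
std::map<std::string, std::string> parse_catalog(std::istream& in)
{
    std::map<std::string, std::string> entries;
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos) continue;
        entries[line.substr(0, tab)] = line.substr(tab + 1);
    }
    return entries;
}

// Hypothetical catalog that calls its loader on the first lookup only.
// Not thread-safe; a real VM would need to guard the first load.
class LazyCatalog {
public:
    typedef std::function<std::map<std::string, std::string>()> Loader;

    explicit LazyCatalog(Loader loader)
        : loader_(loader), loaded_(false), load_count_(0) {}

    // Returns the translation by value, or the key itself when there is no
    // entry, so the caller never holds a pointer into the catalog.
    std::string get(const std::string& key) {
        if (!loaded_) {
            entries_ = loader_();
            loaded_ = true;
            ++load_count_;
        }
        std::map<std::string, std::string>::const_iterator it = entries_.find(key);
        return it != entries_.end() ? it->second : key;
    }

    int load_count() const { return load_count_; }

private:
    Loader loader_;
    std::map<std::string, std::string> entries_;
    bool loaded_;
    int load_count_;
};
```

Returning std::string by value also sidesteps the wcsdup() leak mentioned above, at the cost of one copy per lookup.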

Thanks, Dmitry.

> ===============================================
> From: Salikh Zakirov <Salikh.Zakirov@Intel.com >
> Date: Thu, 13 Jul 2006 12:06:05 +0400
> Subject: [PATCH] Dummy l10n implementation based on Log4cxx
> ---
> vm/include/l10n.h              |   31 +++++++++++++++++++
> vm/port/include/loggerstring.h |    9 +++++
> vm/vmcore/src/init/l10n.cpp    |   66 ++++++++++++++++++++++++++++++++++++++++
> vm/vmcore/src/init/vm_main.cpp |    2 +
> 4 files changed, 108 insertions(+), 0 deletions(-)
>
> diff --git a/vm/include/l10n.h b/vm/include/l10n.h
> new file mode 100755
> index 0000000..bb3edfe
> --- /dev/null
> +++ b/vm/include/l10n.h
> @@ -0,0 +1,31 @@
> +#ifndef _L10N_H
> +#define _L10N_H
> +
> +#include <string>
> +#include <log4cxx/helpers/propertyresourcebundle.h>
> +#include <log4cxx/helpers/exception.h>
> +#include <wchar.h>
> +#include "cxxlog.h"
> +
> +extern log4cxx::helpers::ResourceBundlePtr l10n_resource_bundle;
> +
> +inline const wchar_t* _(const wchar_t* message)
> +{
> +    if (!l10n_resource_bundle) return message;
> +    try {
> +        wchar_t* mangled = wcsdup(message);
> +        wchar_t* c = mangled;
> +        while (*c) {
> +            if (*c == L' ') *c = L'_';
> +            c++;
> +        }
> +        std::wstring & localized = l10n_resource_bundle->getString(mangled);
> +        free(mangled);
> +        return wcsdup(localized.c_str()); // FIXME: leak
> +    } catch (log4cxx::helpers::MissingResourceException &) {}
> +    return message;
> +}
> +
> +void init_l10n();
> +
> +#endif // _L10N_H
> diff --git a/vm/port/include/loggerstring.h b/vm/port/include/loggerstring.h
> old mode 100644
> new mode 100755
> index 1efe5d2..1eae5c1
> --- a/vm/port/include/loggerstring.h
> +++ b/vm/port/include/loggerstring.h
> @@ -41,6 +41,15 @@ public:
>         return (const char*)logger_string.c_str();
>     }
>
> +    LoggerString& operator<<(const wchar_t* message) {
> +        const wchar_t* c = message;
> +        while (*c) {
> +            logger_string += (char)*c;
> +            c++;
> +        }
> +        return *this;
> +    }
> +
>     LoggerString& operator<<(const char* message) {
>         logger_string += message;
>         return *this;
> diff --git a/vm/vmcore/src/init/l10n.cpp b/vm/vmcore/src/init/l10n.cpp
> new file mode 100755
> index 0000000..c8fd746
> --- /dev/null
> +++ b/vm/vmcore/src/init/l10n.cpp
> @@ -0,0 +1,66 @@
> +#include <apr_env.h>
> +#include <assert.h>
> +#include <fstream>
> +#include <string.h>
> +
> +#include "cxxlog.h"
> +#include "l10n.h"
> +#include "platform_lowlevel.h"
> +
> +#include <log4cxx/helpers/locale.h>
> +
> +using namespace log4cxx;
> +using namespace log4cxx::helpers;
> +
> +ResourceBundlePtr l10n_resource_bundle;
> +
> +void init_l10n()
> +{
> +    INFO2("info", "starting l10n initialization");
> +
> +    /*
> +    apr_pool_t *pool;
> +    apr_pool_create(&pool, 0); assert(pool);
> +    char *lang = NULL;
> +
> +    apr_env_get(&lang, "LANG", pool);
> +    if (!lang) lang = "C";
> +
> +    char *encoding = strchr(lang,'.');
> +    if (encoding != NULL) {
> +        *encoding = '\0';
> +        encoding += 1;
> +    }
> +    char *region = strchr(lang,'_');
> +    if (region != NULL) {
> +        *region = '\0';
> +        region += 1;
> +    }
> +    INFO2("info", "lang = " << lang << ", " << "region = " << region
> +            << ", encoding = " << encoding);
> +    string filename = "drlvm_";
> +    assert(lang);
> +    filename += lang;
> +    if (region) {
> +        filename += "_";
> +        filename += region;
> +    }
> +
> +    INFO2("info", "filename = " << filename.c_str());
> +    //FIXME: read the localization file
> +    */
> +
> +    std::wstring properties = L"message_1=SOOBSCHENIE 1\nmessage=SOOBSCHENIE";
> +    PropertyResourceBundle* bundle =
> +        new PropertyResourceBundle(properties);
> +    INFO2("info", "bundle loaded (" << bundle << ")");
> +    assert(bundle);
> +
> +    // _() can only be used after this initialization is done
> +    l10n_resource_bundle = bundle;
> +
> +    INFO2("info", _(L"message"));
> +    INFO2("info", _(L"message 1"));
> +    INFO2("info", _(L"message 2"));
> +    //apr_pool_destroy(pool);
> +}
> diff --git a/vm/vmcore/src/init/vm_main.cpp b/vm/vmcore/src/init/vm_main.cpp
> old mode 100644
> new mode 100755
> index e03e674..7378403
> --- a/vm/vmcore/src/init/vm_main.cpp
> +++ b/vm/vmcore/src/init/vm_main.cpp
> @@ -42,6 +42,7 @@ #include "dll_jit_intf.h"
> #include "dll_gc.h"
> #include "em_intf.h"
> #include "port_filepath.h"
> +#include "l10n.h"
>
> union Scalar_Arg {
>     int i;
> @@ -559,6 +560,7 @@ static void destroy_vm(Global_Env *p_env
> VMEXPORT int vm_main(int argc, char *argv[])
> {
>     init_log_system();
> +    init_l10n();
>
>     char** java_args;
>     int java_args_num;
> --
> 1.4.1.g4b86
>
>
>
>
>
> ---------------------------------------------------------------------
> Terms of use :
> http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail:
> harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail:
> harmony-dev-help@incubator.apache.org
>
>
