harmony-dev mailing list archives

From Tim Ellison <t.p.elli...@gmail.com>
Subject Re: Platform dependent code placement (was: Re: repo layout again)
Date Fri, 17 Feb 2006 13:47:05 GMT
Andrey Chernyshev wrote:
> Hi All,
> 
> Sorry for my late attempt to resurrect this thread, but I'm not sure
> if we've already come to a well-defined picture here:
> 
> On 1/4/06, bootjvm@earthlink.net <bootjvm@earthlink.net> wrote:
>>>> Some more platform tree names:
>>>>
>>>>     solaris32.sparc solaris64.sparc
>>>>     linux32.sparc linux64.sparc
>>>>     darwin32.ppc (Is this correct for the new MAC boxes?)
>>> Wouldn't the wordsize be better associated with the processor family?
>>>
>>> So  solaris.sparc32, windows.x86, linux.ppc32, and so on.
>> Better yet, keep the three variables independent because the
>> _same_ chip can operate in both 32-bit and 64-bit modes.
>> It is the operating system that works in one mode or the other.
>> Furthermore, a 64-bit OS also allows 32-bit applications to run
>> on it in a compatibility mode.  A good example of this is the
>> Sun JDK that runs on Solaris.  There is a 32-bit version, which
>> is the default, and there is an additional 64-bit module that
>> may be added on for running in 64-bit mode.  The same SPARC
>> processor is used for both.
>>
> 
> I think another example showing the benefit of having independent OS /
> CPU attributes could be a scenario where we have, for instance, a JIT
> compiler which produces nearly identical code for different OSes on
> IA32 (or whatever CPU), and, for example, a file I/O module which
> doesn't care much about the specific CPU, but rather cares about
> the specific OS.  In other words, there will likely be code which
> can easily be shared between different CPUs, and code which
> can be shared between different OSes.

I agree (though I have no problem if the classlib / VM / JIT / GC / ...
subtrees come up with different ideas about how to solve the problem).

> On the other hand, having separate source trees like linux32.sparc,
> solaris64.sparc, win.IA32 for each specific platform combination may
> lead to huge code duplication. We may need to be able to share
> code across certain, but not all, platform combinations.

Agreed.  The existing code layout for the classlib natives is certainly
not a viable way to scale across multiple platforms.

(The 'in-house' mechanism for managing multi-platform code is particular
to IBM so not of great interest here, suffice to say that the win.IA32
and linux.IA32 trees in classlib/trunk/native-code are the product of
that mechanism with some manual tidy-up).

> To address that issue, I can suggest a pretty straightforward scheme
> for platform-dependent code placement which looks as follows:
> 
> 1. There is a fixed set of attributes which denotes a specific target
> configuration. As a starter set, we may have OS (for operating system)
> and, say ARCH (for architecture) attributes. This set can be extended
> later, but, as was suggested, let's not cross that bridge until we
> come to it.

Yes, the principal distinction is probably on OS & ARCH.

> 2. Files in the source tree are selected for compilation based on the
> OS or ARCH attribute values which may (or may not) appear in a file or
> directory name.
> Some examples are:
> 
> src\main\native\solaris\foo.cpp
>     - file is applicable to any system running Solaris;

yep (that was foo.c, right ;-) -- only teasing)

> src\main\native\win\foo_ia32.cpp
>     - file is applicable only to Windows / IA32;

why has the ARCH flipped onto the file name?  why not win_ia32 ?

> src\main\native\foo_ia32_em64t.cpp
>     - file can be compiled for any OS on either IA32 or EM64T
> architecture, but nothing else.

I agree with the approach, but am left wondering why it is not something like:
   src\main\native\
                   common\
                   unix\
                   windows\
                   zos\
                   solaris\
                   solaris_x86\
                   solaris_sparc\
                   windows_ifp\

i.e. a taxonomy covering families of code (common, unix-like,
windows-like) and increasingly specific discriminators.

> The formal file selection rule may look like:
> 
> (1) File is applicable for the given OS value if its pathname contains regexp
> [\W_]${OS}[\W_], or pathname doesn't contain any OS value;
> 
> (2) File is applicable for the given ARCH value if its pathname contains regexp
> [\W_]${ARCH}[\W_], or pathname doesn't contain any ARCH value;
> 
> (3) File is selected for a compilation if it satisfies both (1) and
> (2) criteria.
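
Rules (1) to (3) can be sketched in C roughly as follows. This is only an
illustration, not code from the Harmony tree: the function names and the
KNOWN_OS / KNOWN_ARCH value lists are assumptions, matching is
case-sensitive, and the [\W_] boundary is checked by hand (any
non-alphanumeric character) rather than with a regex library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute values; the thread does not fix these lists. */
static const char *const KNOWN_OS[]   = { "win", "linux", "solaris" };
static const char *const KNOWN_ARCH[] = { "ia32", "em64t", "sparc" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* True if path contains value with a non-alphanumeric character on both
 * sides, i.e. the pathname matches [\W_]value[\W_]. */
static bool has_value(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    size_t n = strlen(value);
    if (n == 0)
        return false;
    for (const char *p = strstr(path, value); p != NULL;
         p = strstr(p + 1, value)) {
        if (p > path && !isalnum((unsigned char)p[-1]) &&
            p[n] != '\0' && !isalnum((unsigned char)p[n]))
            return true;
    }
    return false;
}

/* Rules (1) and (2): the path names the wanted value, or names no
 * known value of that attribute at all. */
static bool applies(const char *path, const char *wanted,
                    const char *const known[], size_t count)
{
    if (has_value(path, wanted))
        return true;
    for (size_t i = 0; i < count; i++)
        if (has_value(path, known[i]))
            return false;
    return true;
}

/* Rule (3): the file is compiled only if both attributes accept it. */
bool is_selected(const char *path, const char *os, const char *arch)
{
    return applies(path, os, KNOWN_OS, COUNT(KNOWN_OS)) &&
           applies(path, arch, KNOWN_ARCH, COUNT(KNOWN_ARCH));
}
```

With these assumed lists, is_selected("src/main/native/win/foo_ia32.cpp",
"win", "ia32") holds, while the same path is rejected for a "linux" build.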

If we restrict the OS and ARCH identifiers to directories then it will
allow us to use the gmake VPATH functionality to select the right file,
so compiling on solaris x86 will have a
VPATH='solaris_x86:solaris:unix:common' and so on.
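
To make the lookup order concrete, here is a small C sketch that mimics
the way an ordered, colon-separated search path picks the most specific
copy of a file. It is not gmake itself and only approximates what VPATH
does; the function name resolve_source and the in-memory file list are
assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Walks the colon-separated directories in vpath from left to right and
 * writes the first "dir/name" that appears in files[] into out.
 * files[] stands in for the source tree so the sketch needs no disk access.
 * Returns false (and an empty out) if no directory provides the file. */
bool resolve_source(const char *vpath, const char *name,
                    const char *const files[], size_t nfiles,
                    char *out, size_t out_size)
{
    assert(vpath != NULL && name != NULL && out != NULL && out_size > 0);
    const char *dir = vpath;
    while (*dir != '\0') {
        size_t len = strcspn(dir, ":");   /* length of this directory name */
        int written = snprintf(out, out_size, "%.*s/%s", (int)len, dir, name);
        if (written > 0 && (size_t)written < out_size) {
            for (size_t i = 0; i < nfiles; i++)
                if (strcmp(files[i], out) == 0)
                    return true;          /* most specific match wins */
        }
        dir += len;
        if (*dir == ':')
            dir++;                        /* skip the separator */
    }
    out[0] = '\0';
    return false;
}
```

Given a tree holding common/foo.c and unix/foo.c, a search path of
"solaris_x86:solaris:unix:common" resolves foo.c to unix/foo.c, because
the more specific directories are consulted first.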

> One can see that this naming convention gives developers enough
> freedom to lay out their code in the most convenient way (actually,
> experience shows that the meaning of "convenient" may differ
> significantly depending on the component type :). On the other hand, it
> gives well defined (and hopefully intuitive enough) rule showing
> whether the particular file is picked up by the compiler or not,
> depending on a configuration.

I like the idea -- if we agree to use gmake throughout then I think we
get this functionality 'for free'.

> In addition to the above, the code could also be selected for
> compilation by means of #define directives in C/C++ files (this is
> convenient when most of a file is platform-independent, with the
> exception of just a few lines of code). The build system could set
> up the OS and ARCH attributes as appropriate defines for the C/C++
> code.
> For example, for Windows/IA32 config, the following defines could be set:
> 
>      #define OS WIN
>      #define WIN 1
>      #define ARCH IA32
>      #define IA32 1
> 
> Then the platform-dependent code sections may look like:
> 
> #ifdef WIN
> ….
> #endif
> 
> which is essentially the same as:
> 
> #if OS == WIN
> ….
> #endif
> 
> It is important that OS/ARCH (or whatever additional) attribute names
> and values are used consistently in the file names and define
> directives.

Using the names consistently will definitely help, but choosing whether
to create a separate copy of the file in a platform-specific
sub-directory, or to use #define's within a file in a shared-family
sub-directory will likely come down to a case by case decision.  For
example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c
files, but a .h file that defines pointer types etc. may need different
versions of the entire file to keep things readable.
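
As a sketch of the #ifdef'ed case, the fragment below keeps one shared
.c file and varies a few lines by OS and ARCH. For the `#if OS == WIN`
form to work, each attribute value has to expand to a distinct number; an
empty macro cannot be compared in an #if expression. The numeric values,
the LINUX / SOLARIS / EM64T names, the defaults and the function names
are all assumptions for illustration, not an agreed Harmony convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Attribute values: distinct numbers so that #if can compare them. */
#define WIN     1
#define LINUX   2
#define SOLARIS 3

#define IA32    1
#define EM64T   2

/* Normally supplied by the build, e.g. -DOS=WIN -DARCH=IA32.
 * The defaults here exist only so the sketch compiles on its own. */
#ifndef OS
#define OS LINUX
#endif
#ifndef ARCH
#define ARCH IA32
#endif

/* A platform-dependent section selected by the OS attribute. */
char path_separator(void)
{
#if OS == WIN
    return '\\';
#else
    return '/';
#endif
}

/* A platform-dependent section selected by the ARCH attribute:
 * the word size implied by the configured target. */
int pointer_bits(void)
{
#if ARCH == EM64T
    return 64;
#else
    return 32;
#endif
}

/* Formats the configured target into buf; returns the snprintf result. */
int describe_target(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "sep=%c bits=%d",
                    path_separator(), pointer_bits());
}
```

Once more than a handful of such sections accumulate in one file, a
separate copy in a platform-specific directory is probably the more
readable choice.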

> Finally, I'd suggest that the platform dependent code can be organized
> in 3 different ways:
> 
> (1) Explicitly, by defining the appropriate file list. For example, an
> Ant XML file may choose one fileset or another, depending on
> the current OS and ARCH property values. This approach is most
> convenient, for example, whenever third-party code is compiled or
> the file names cannot be changed for some reason.

Ant ?!  ;-)  or platform-specific makefile #includes?

> (2) Via the file path naming convention. This is the preferred
> approach and works well whenever distinctive files for different
> platforms can be identified.

yep (modulo discussion of filenames vs. dir names to enable vpath)

> (3) By means of the preprocessor directives. This could be convenient
> if only a few lines of code need to vary across the platforms. However,
> preprocessor directives would make the code less readable, hence this
> should be used with care.
> 
> In terms of the build process, it means that the code has to pass all 3
> stages of filtering before it is selected for compilation.

I like it.  Let's just discuss what tools do the selection -- but I
agree with the approach.

> The point is that components in Harmony could be very different,
> especially if we take into account that they may belong to both the Class
> Libraries and the VM world.

There will be files that it makes sense to share for sure (like vmi.h
and jni.h etc.) but they should be stable-API types that can be
refreshed across the boundary as required.

> Hence, the most efficient (in terms of code
> sharing and readability) code placement would require maximum
> flexibility while preserving some well-defined rules. The scheme
> based on file dir/name matching seems to be flexible enough.
> 
> How does the above proposal sound?

Cool, perhaps we can discuss if it should be gmake + vpath or ant.

Thanks for resurrecting this thread.

Regards,
Tim


>>> Maybe in some components we would want to include a window manager
>>> family too, though let's cross that bridge...
>>>
>>> I had a quick hunt round for a recognized standard or convention for OS
>>> and CPU family names, but it seems there are enough subtle differences
>>> around that we should just define them for ourselves.
>>>
>> My VM's config script maintains CPU type, OS name, and word size as three
>> independent values.  These are combined in various ways in the source code
>> and support scripts depending on the particular need.  The distribution script
>> names the 'tar' files for the binaries with all three as a part of the file name
>> as, "...-CPU-OS-WORD.tar" as the tail end of the file name.  (NB:  I am going
>> to simplify the distribution scripts shortly into a single script that creates the
>> various pieces, binaries, source, and documentation.  This will be out soon.)
>>
>> Does this help?
>>
>> Dan Lydick
>>
>>> Regards,
>>> Tim
>>>
>>>
>>> --
>>>
>>> Tim Ellison (t.p.ellison@gmail.com)
>>> IBM Java technology centre, UK.
>>
>>
>>
>> Dan Lydick
>>

-- 

Tim Ellison (t.p.ellison@gmail.com)
IBM Java technology centre, UK.
