db-derby-dev mailing list archives

From Oystein.Grov...@Sun.COM (Øystein Grøvlen)
Subject Re: Optimal tuple memory representation
Date Wed, 04 May 2005 10:01:00 GMT
>>>>> "JM" == Jean Morissette <jean.morissette666@videotron.ca> writes:

    JM> If you could recreate Derby, what would be the most globally
    JM> performant tuple memory representation (byte[], ByteBuffer,
    JM> offset in a byte[]/ByteBuffer, Java object, ...) that you would
    JM> choose?

    JM> I'm wondering if creating a Java object for each tuple and
    JM> letting the GC do its work would be more performant than having
    JM> a reusable ByteBuffer that contains many raw tuples?  What do
    JM> you think?

The main point here is that both the Java representation and the byte
representation are needed.  You need the byte representation to write
the data to disk and to transfer it over the network, and you need
Java types whenever you want to access the data from a Java program.
You also need the byte representation for the disk representation of
the log.
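To make the two representations concrete, here is a minimal sketch (the
format and names are hypothetical, not Derby's actual on-disk layout): a
two-column tuple (int id, String name) encoded into the byte form needed
for pages and log records, and decoded back into Java values on access.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Derby's real row format: one tuple kept both
// as Java values and as the byte form needed for disk and the log.
public class RowCodec {

    // Encode the Java values into a length-prefixed byte representation.
    static byte[] encode(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + nameBytes.length);
        buf.putInt(id);
        buf.putInt(nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }

    // Convert back to Java types when the application accesses the data.
    static String decodeName(byte[] row) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        buf.getInt();                        // skip the id column
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return new String(nameBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = encode(42, "derby");
        System.out.println(decodeName(row)); // prints "derby"
    }
}
```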

The question then becomes when to convert between the two formats:
when the data is accessed by the application, or when it is written to
disk or transferred over the network.  Also note that it is possible
to move tuples and data values between disk pages, the log, and
network buffers without converting them into Java types.
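That kind of conversion-free movement is just a raw byte copy; a small
sketch under the assumption that pages and log records live in
ByteBuffers (the names here are illustrative, not Derby's buffer API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: a tuple's bytes move from a (hypothetical) page buffer into a
// log or network buffer as a raw copy, with no Java-object detour.
public class BufferMove {

    // Copy all remaining bytes of src into a fresh buffer, ready to read.
    static ByteBuffer copyRaw(ByteBuffer src) {
        ByteBuffer dst = ByteBuffer.allocate(src.remaining());
        dst.put(src);   // raw byte transfer, no conversion to Java types
        dst.flip();
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer page =
            ByteBuffer.wrap("tuple-bytes".getBytes(StandardCharsets.UTF_8));
        ByteBuffer log = copyRaw(page);
        byte[] out = new byte[log.remaining()];
        log.get(out);
        System.out.println(new String(out, StandardCharsets.UTF_8)); // prints "tuple-bytes"
    }
}
```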

For hot-spot data, which always resides in memory buffers, it may be
better to store the data in Java objects, especially if the database
is embedded in an application or most data access happens through Java
stored procedures.  However, for updates you would still have to
generate the byte representation for the log record.  Maybe it would
make sense to cache the Java objects for hot-spot data in addition to
storing them in disk pages?
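One way to sketch that caching idea (all names here are hypothetical):
hot rows keep their deserialized Java object alongside the page bytes,
so repeated reads skip the byte-to-object conversion; an update would
still have to produce the byte form for the log record.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of caching Java objects for hot-spot rows.
public class HotRowCache {
    private int conversions = 0;

    // Bounded, access-ordered (LRU) cache of deserialized rows.
    private final Map<Long, String> cache =
        new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> e) {
                return size() > 100;
            }
        };

    // Return the Java form of the row, converting from bytes only on a miss.
    String get(long rowId, byte[] rowBytes) {
        return cache.computeIfAbsent(rowId, id -> {
            conversions++;
            return new String(rowBytes, StandardCharsets.UTF_8);
        });
    }

    int conversions() { return conversions; }

    public static void main(String[] args) {
        HotRowCache c = new HotRowCache();
        byte[] bytes = "hot row".getBytes(StandardCharsets.UTF_8);
        c.get(7L, bytes);
        c.get(7L, bytes);                    // second read hits the cache
        System.out.println(c.conversions()); // prints 1
    }
}
```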

For infrequently accessed data, each tuple would have to be converted
on every access since it would normally not reside in memory.  As
Mike's example illustrates, this could cause a lot of object creation
for queries on large data volumes.  You would typically also convert
column values that the query does not need.
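One way to avoid converting unneeded columns is to decode only the
column a query asks for, straight from the raw bytes.  A sketch under
a hypothetical fixed-width format (three 4-byte int columns per row):

```java
import java.nio.ByteBuffer;

// Sketch: read one column by offset from the raw row bytes, creating no
// Java objects for the columns the query does not touch.  The row format
// here is hypothetical: (int a, int b, int c) packed as three ints.
public class LazyColumn {

    // Read just column index `col` (0-based) from the raw row bytes.
    static int readColumn(byte[] row, int col) {
        return ByteBuffer.wrap(row).getInt(col * 4);
    }

    public static void main(String[] args) {
        ByteBuffer row = ByteBuffer.allocate(12);
        row.putInt(1).putInt(2).putInt(3);
        System.out.println(readColumn(row.array(), 1)); // prints 2
    }
}
```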
