db-jdo-dev mailing list archives

From Craig Russell <Craig.Russ...@Sun.COM>
Subject Re: Embedded collections of non-PC objects
Date Mon, 18 Jul 2005 17:57:48 GMT
Hi Andy,

I find this discussion very useful and appreciate your insights as well.

On Jul 17, 2005, at 1:23 AM, Andy Jefferson wrote:

> Hi Craig,
>
> thanks for your reply and your insights.
>
>
>>> Example 1 : Collection of BigDecimal
>>> 1. Basic collection
>>> <field name="myfield">
>>>     <collection element-type="java.math.BigDecimal"/>
>>>     <join/>
>>> </field>
>>> This creates 2 tables - 1 for the class owning "myfield", and 1 join table
>>> to contain the elements. If <join> is omitted then an error should be thrown
>>> (though i'm not sure if JPOX currently flags this up)
>>>
>>
>> The join element has no defaults, so this is not sufficient to
>> describe the mapping. You need at least a column attribute naming the
>> join column. And you need to name the column in the join table to map
>> the BigDecimal values to. So,
>>
>> <field name="myfield" column="VALUES" table="MYFIELD_TABLE">
>>      <collection element-type="java.math.BigDecimal"/>
>>      <join column="JOIN_COLUMN"/>
>> </field>
>>
>
> I don't necessarily agree here. We have to qualify the statement with the
> following
> New schema : The JDO impl is perfectly capable of providing default namings
> for columns and perfectly capable of choosing the join columns ... since it
> has the (PK) columns in the main table. It provides default namings for
> columns in other situations.
> Existing schema : The user should specify the columns and table as you stated.
>
We discussed this in the expert group in the past, and came to the  
conclusion that default mappings are too much work, considering the  
range of implementations and data stores out there. We could spend  
lots of time coming up with rules for defaults but felt that it was  
not worth our time.

That said, we don't require an implementation to throw an exception  
if it encounters an incomplete mapping. But we do require that an  
implementation support a completely specified mapping, and that's  
what the TCK will do.
>
>
>>> embedded-element has no effect with this example because the element
>>> (BigDecimal) is already embedded (in the join table), and has no
>>> way of not being embedded.
>>>
>>
>> I'd say that serialized implies embedded-element (and vice-versa,
>> which is why I'm now questioning the value of embedded-element as an
>> attribute).
>>
>
> OK. That wasn't my interpretation. I see 2 levels of embedded. We have
> "embedded" at field level, and "embedded-element" (or -key, -value) at
> collection/map level. I see embedded-element/key/value as saying that we want
> to embed the elements/keys/values in a join table (like in the example 9 in
> the spec), and embedded (at field level) saying that we want to embed the
> whole collection/map into the main table.
>
Well, we don't need the "embedded" term to describe whether we have a  
join table or not. That is accomplished simply by the existence of  
the join element. To restate, if there is a join element in the field  
metadata, then the field's data is contained in a join table. This is  
independent of the issue of whether the data is embedded or not.

It seems that the "embedded" term is almost universally  
misunderstood, and I need to do a better job of explaining it.

Embedded as applied to PC types means that the persistent fields in  
the PC are stored as columns in the row individually. Which row  
depends on whether there is a join in the field metadata. This is  
where Example 9 is instructive, and I believe the example is correct.  
If serialized is specified as true, then the entire PC instance is  
serialized and stored in one column.
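
As a rough sketch of the two alternatives just described for a single PC-typed field with no join: the field name homeAddress, the Address fields street and city, and all column names are hypothetical and used only for illustration.

```xml
<!-- Alternative A (embedded, no join): each persistent field of the
     hypothetical Address class becomes a column of the owner's row. -->
<field name="homeAddress">
    <embedded>
        <field name="street" column="HOME_STREET"/>
        <field name="city" column="HOME_CITY"/>
    </embedded>
</field>

<!-- Alternative B (serialized): the entire Address instance is serialized
     and stored in one column. Use instead of A, not together with it. -->
<field name="homeAddress" serialized="true" column="HOME_ADDRESS_BLOB"/>
```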

Embedded as applied to Collection/Array/Map types is always true for
relational data stores. If serialized is specified as true, then the
entire instance is serialized and stored in one column. Otherwise, a join
table is commonly used to store each element (or entry) in one row of the
join table. The embedded-element, embedded-key, and embedded-value
attributes can then be used to determine how the elements or entries
themselves are stored.
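
For illustration only, a collection of PC elements stored one per join table row, with each element's fields embedded as columns of that row, might be sketched as follows. The table name, the column names, and the MyElement fields name and amount are assumptions, not taken from the spec.

```xml
<!-- One row of MYFIELD_TABLE per collection element. -->
<field name="myfield" table="MYFIELD_TABLE">
    <collection element-type="MyElement" embedded-element="true"/>
    <!-- Column in the join table referring back to the owning row -->
    <join column="OWNER_ID"/>
    <!-- Fields of the hypothetical MyElement class mapped to join table columns -->
    <element>
        <embedded>
            <field name="name" column="ELEMENT_NAME"/>
            <field name="amount" column="ELEMENT_AMOUNT"/>
        </embedded>
    </element>
</field>
```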

Somewhat of a digression: I don't know why we need embedded-element,  
embedded-key, and embedded-value, since we have the <embedded>  
element that is nested inside the <element>, <key>, and <value>  
elements.

I'll add some examples of this to the spec, probably in Chapter 15.
>
>
>>> 3. Embedded element
>>> <field name="myfield">
>>>     <collection element-type="MyElement" embedded-element="true"/>
>>>     <join/>
>>> </field>
>>> This creates 3 tables - 1 for the class owning "myfield", and 1 join table
>>> containing the elements (columns aligned with the fields of the PC element).
>>>
>>
>> Since it's embedded-element, I think there is only one table that contains
>> all the fields in the class, including the Collection of MyElement.
>>
>> You can't map the columns of an embedded Collection of PC elements
>> because you would need one column for each field in each PC, which is
>> a variable number of columns. And tables have a fixed number of
>> columns. So the mapping has to either serialize the Collection and
>> store it into a BLOB column or use another table. For embedded
>> collection,
>>
>
> As my comment above, I didn't interpret (embedded-element/key/value) like
> this. I'll try to justify this, with a map this time :-) ...
> 1. We have a map with embedded-key=false, embedded-value=false. This ends up
> with a main table, and a join table, and optionally (if the keys/values are
> PC), tables for key and value. No disagreement there.

Almost. Just to emphasize, there is one row in the join table for  
each Map.Entry consisting of a key and a value and a reference back  
to the primary table for the class. If the key is an Integer, Long,  
Short, etc., it is actually an embedded-key in the join table row.  
And this is the default for Integer.
>
> 2. We have a map with embedded-key=true, embedded-value=false. How would you
> store these ?

> Would you have a BLOB column for the map keys, and have the map
> values stored off in their own table (since they aren't embedded) ?

Depends on the type of the key. If it is a PC type, embedded-key true  
means that the key is mapped to possibly multiple columns in the join  
table row. If it is an Integer, then embedded-key is the default, and  
means the "normal" mapping of using a column in the join table to  
store the Integer value.

Embedded-value false means that the values are stored elsewhere (like  
in the table containing the extent of the PC instances). The only  
thing stored for the value in the join table row is a foreign key to  
the primary table of the PC type.
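
A sketch of this case, assuming a hypothetical Map<Integer, Address> field named addresses where Address is a PC class with its own table; the table and column names are made up for the example.

```xml
<!-- Each join table row: owner reference, Integer key, FK to the Address table -->
<field name="addresses" table="PERSON_ADDRESSES">
    <map key-type="java.lang.Integer" value-type="Address"
         embedded-key="true" embedded-value="false"/>
    <join column="PERSON_ID"/>
    <!-- Key stored directly in a join table column -->
    <key column="ADDRESS_KEY"/>
    <!-- Value not embedded: only a foreign key to the Address primary table -->
    <value column="ADDRESS_ID"/>
</field>
```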

> This would make managing the map a bit tricky for the JDO impl (to say the
> least!). I would store the keys as embedded into the join table (as per
> example 9 in the spec - multiple columns in the join table lining up with the
> fields in the key), and have the values in their own table (if PC) with a FK
> from the join table. This makes it simple for the JDO impl to manage the map
> since the keys and values are stored in the join table.


>
> 3. We have a map with embedded-key=true, embedded-value=true. How would you
> store this ?

If the map were a Map<Integer, Address>, embedded-key true means to  
store the key in the join table row, and embedded-value true means to  
store the value in multiple columns of the same join table row.
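
Roughly, and assuming the same hypothetical addresses field with an Address class that has street and city fields (all table and column names are illustrative), that mapping might look like:

```xml
<!-- Each join table row holds the owner reference, the key, and the
     value's fields, all as columns of the same row. -->
<field name="addresses" table="PERSON_ADDRESSES">
    <map key-type="java.lang.Integer" value-type="Address"
         embedded-key="true" embedded-value="true"/>
    <join column="PERSON_ID"/>
    <key column="ADDRESS_KEY"/>
    <value>
        <embedded>
            <field name="street" column="STREET"/>
            <field name="city" column="CITY"/>
        </embedded>
    </value>
</field>
```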

> Would you store them as a single BLOB column in the main table.
> I would store the key AND value in the join table (as per example 9 in the
> spec - so we gain columns for key, and columns for value in the join table).

To store the whole map in a single BLOB column, use serialized true for  
the field itself, and do not use embedded-key or embedded-value at all.
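
A minimal sketch of the serialized form, again with a hypothetical field name and column name:

```xml
<!-- The entire map is serialized into one column of the primary table;
     there is no join element and no embedded-key or embedded-value. -->
<field name="addresses" serialized="true" column="ADDRESSES_BLOB">
    <map key-type="java.lang.Integer" value-type="Address"/>
</field>
```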
>
> As for your point above about variable number of columns, well example 9 in
> the spec is just this case. It embeds the element of a collection into a join
> table. This is "embedded" because the elements are not stored as FCO's - they
> are embedded into the join table (which is effectively a secondary table
> owned by the main table - and which represents the collection).

This is actually embedded-element true as well as embedded true. The  
join element indicates that the values are stored outside the primary  
table.
>
> Now if the user specifies <field embedded="true"> then I would expect to have
> to store the whole collection/map as a single BLOB column (like serialized,
> so why do we have a serialized attribute too ?)
>
> I'd welcome any clarification from the people in the know who spent a lot of
> time designing these levels of specification, and from the JDO vendors that
> have supported such embedding for some time on what we interpret these
> attributes as.
>
>
>> It's been on my to-do list for the specification for a while to add
>> mapping for arrays, lists, sets, and maps to Chapter 15. This might
>> be the time to actually do it.
>>
>
> That would be great. The spec covers many many situations and the metadata is
> largely intuitive as to what people specify, but I feel we're missing
> clarification on this one part.

Agree.

Craig
>
>
> Thanks!
> -- 
> Andy
> Java Persistent Objects - JPOX
>

Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!

