polygene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Niclas Hedhman <nic...@hedhman.org>
Subject Re: JPA?
Date Wed, 21 Sep 2016 13:01:48 GMT
Thanks a lot Stan for the overview picture, which is sometimes hard to
acquire by looking at the code.

On this topic, there are a few "constraints" that perhaps should be
considered as Core Zest features somehow, that could make life easier when
insisting on RDBMS backing an Zest application. After all, most of my
Entities end up having a static data structure, and perhaps this fact can
aid implementation of both storage and indexing.

1. There is a big difference between;
   a. SQL Schema is completely foreign, and Zest developer wants to read
that into Entities.
   b. Zest owning the schema, optimized for performance
   c. Zest owning the schema, optimized for interoperability over SQL data.

2. For 1a. above, I think it is reasonable to have some restrictions on
what is possible. For instance, we could create an interactive tool that
queries the schema from a database, allows people mark what are Entities
(and their friendlier name) and to tie together Association fields, and
many-to-many tables as ManyAssocations. Once such a tool generates the
EntityTypes they are set in stone, and one can't become fancy and introduce
supertypes that crosses tables and such advanced features.

3. All state that is not exposed by an EntityType is still expected to be
preserved if an Entity of that EntityType is updated. The trick is that
some EntityTypes overlap and many don't, and currently there is no way to
know when EntityTypes overlap and need to share a table for
Property/Association. IF that were to be introduced somehow, then one table
per over-arching entity typo could be created with a column per
Property/Association, and possibly even manage a separate table for each of
ManyAssoc/NamedAssoc dynamically. I think that in reality that is always
the case, both for myself and what I remember from StreamFlow. The problem
is that some small subtypes will cross tables, and it is this feature that
we could decide to prevent for RDBMS used as 1c above.

4. For 1b above, the existing EntityStore might be the fastest option and
for some people that is reasonable choice. However, data lives for long,
and we should in that case make sure that all the metadata that is in the
Zest type system, is also stored alongside this compact, serialized format,
and that we have a generic tool that can read 1b storage type and create a
1c output, in case people change their mind, throw away Zest or other
unforeseen circumstances.

I hope you all get a feel for my thinking here... The default Zest type
system is way too "cool" for RDBMS, but we could introduce some limitations
to better support RDBMS in an enterprise setting.

Niclas

On Wed, Sep 21, 2016 at 5:59 PM, Stanislav Muhametsin <
stanislav.muhametsin@zest.mail.kapsi.fi> wrote:

> On 21.9.2016 12:00, Stanislav Muhametsin wrote:
>
>> On 21.9.2016 0:08, Jiri Jetmar wrote:
>>
>>> Independently of that that, things starts to be complicated in the SQL
>>> world with large data when you are submitting e.g. a inner JOIN statement
>>> in a transactional INSERT expression, where the tables are located  (the
>>> data) on different nodes, simply because of a single node limit.
>>>
>>
>> Well... there are ways to circumvent this, too. :)
>> Unfortunately, can't tell much more about that, as it is key concept of
>> the company where I work right now.
>> But very large and globally distributed RDBMS nodes, which don't lag
>> under high stress, are very much so possible!
>>
>> I would like to work on this task. Therefore smart ideas are highly
>>> welcome
>>> ! :-)
>>>
>>
>> I am responsible for current SQL indexing code, and Paul is responsible
>> for current SQL entitystore code.
>> The indexing code is, unfortunately, quite spaghettified, and I would do
>> lots of things probably very differently now, than how I did them several
>> years ago.
>>
>> I think one strategy to approach would be to re-write whole thing from
>> scratch, but keeping the main principles:
>> 1. Each property and association should be stored in separate table. This
>> is required for the very dynamic nature of Zest queries.
>>     This implicates that the whole DB schema will be owned by this
>> indexing service - it is not compatible with some external tool generating
>> tables (unless tables are, of course, in this exact required format).
>>
>>     Example: if you have interface MyEntity { Property<String> myProp();
>> Association<SomeOtherEntity> myAsso(); },
>>     That would end up with 3 tables:
>>     1.1. The table for implicit 'identity' property. Current indexing
>> code has separate DB-specific entity primary key which is of type 'long'
>> (64bit integer), since that makes foreign key references much faster and
>> compacter. Not sure if that is good idea anymore.
>>     1.2. The table for 'myProp' property with 2 columns: foreign key
>> reference to identity table and actual column of type 'text' containing the
>> contents of 'myProp' property.
>>     1.3. The table for 'myAsso' association with 2 columns: foreign key
>> reference to identity table of 'MyEntity' and foreign key reference to
>> identity table of 'SomeOtherEntity'. Obviously this covers the
>> 'ManyAssociation's as well, if they are still present in Zest.
>>
>
> I forgot one thing to mention: the collectionized properties were cause of
> major headache in SQL indexing.
> IIRC I made separate tables and/or columns to model collections properly,
> in order to properly support queries which had conditions like "property X
> is list containing entity Y with property Z at value Ö".
>
> However, with the JSON datatype, it might be better (?) idea to make
> collectionized properties as single column with JSON datatype, and just
> JSON-ize collections.
>
>
>
>> 2. The SQL query will then be created by translating Zest query objects
>> into SQL query objects. You could also just build SQL strings, but it would
>> be quite nasty and most likely very unmaintaineable task - current code
>> uses java-sql-generator, which is my library on github, to create SQL
>> statements using DSL, and not by concatenating strings.
>>     IIRC Zest AND-conditions are SQL INTERSECTIONs of single
>> property/asso queries, Zest OR-conditions are SQL UNIONs of single
>> property/asso queries, and Zest NOT-conditions are SQL NOTs of single
>> property/asso queries.
>>
>> 3. Keeping DB schema in sync with Zest application schema.
>>     This is very tough one - if someone adds/removes
>> properties/associations, the database schema should be kept up-to-date.
>>     I remember having problems with this one - it also implies that you
>> have to walk through the *whole* application structure in order to generate
>> tables for all the properties/assos of all the entities visible to this
>> indexing service.
>>
>>
>>
>


-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message