ofbiz-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pierre Smits (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
Date Tue, 17 Mar 2020 08:48:00 GMT

    [ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032320#comment-17032320
] 

Pierre Smits edited comment on OFBIZ-10953 at 3/17/20, 8:47 AM:
----------------------------------------------------------------

The reasoning we applied to having a different primary key for the Uom table in the transaction
db (an exception to the primary key definition for majority of the tables) applies also to
the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the starting point
for the project's approach to Bi through the same named component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across businesses, e.g.
those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and all those other
groups of records which are relating to records in the Uom table in the transaction db are
such generic tables. These tables don't need a 'self defined' primary keys added, because
the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use the ISO 8601
(specifically YYMMDD) for the date as the definition for the primary key. And the same is
suggested for the Time dimension table (HHMM in that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29
10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001" extDiscountAmount="0.000"
extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000"
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" *invoiceDateDimId="20151008"*
invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29
10:03:24.0" orderId="_NA_" organisationDimId="10010" *origCurrencyDimId="EUR"* productDimId="10087"
quantity="1.000000" quantityUomDimId="10"/>
{code}
vs
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29
10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001_" extDiscountAmount="0.000"
extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000"
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" *invoiceDateDimId="10000"*
invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29
10:03:24.0" orderId="_NA_" organisationDimId="10010" *origCurrencyDimId="10000"* productDimId="10087"
quantity="1.000000" quantityUomDimId="LEN_m"/>
{code}
The latter means that in a production infrastructure the using company is penalised (performance
and cost-wise) with an additional query (for the *origCurrencyDimId*) to the currency dimension
to retrieve the underlying explanation/meaning (EUR). And similarly for the *quantityUomDimId*
and other generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in fact tables.
But when there a multitude of such fact tables with each having millions/billions of records
in those tables using such keys of generic tables (these measurement tables), then it adds
up significantly. 


was (Author: pfm.smits):
The reasoning we applied to having a different primary key for the Uom table in the transaction
db (an exception to the primary key definition for majority of the tables) applies also to
the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the starting point
for the project's approach to Bi through the same named component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across businesses, e.g.
those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and all those other
groups of records which are relating to records in the Uom table in the transaction db are
such generic tables. These tables don't need a 'self defined' primary keys added, because
the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use the ISO 8601
(specifically YYMMDD) for the date as the definition for the primary key. And the same is
suggested for the Time dimension table (HHMM in that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29
10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001" extDiscountAmount="0.000"
extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000"
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" invoiceDateDimId="20151008"
invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29
10:03:24.0" orderId="_NA_" organisationDimId="10010" origCurrencyDimId="EUR" productDimId="10087"
quantity="1.000000" quantityUomDimId="10"/>
{code}
vs
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29
10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001_" extDiscountAmount="0.000"
extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000"
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" invoiceDateDimId="20151008"
invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29
10:03:24.0" orderId="_NA_" organisationDimId="10010" origCurrencyDimId="10000" productDimId="10087"
quantity="1.000000" quantityUomDimId="LEN_m"/>
{code}
The latter means that in a production infrastructure the using company is penalised (performance
and cost-wise) with an additional query (for the *origCurrencyDimId*) to the currency dimension
to retrieve the underlying explanation/meaning (EUR). And similarly for the *quantityUomDimId*
and other generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in fact tables.
But when there a multitude of such fact tables with each having millions/billions of records
in those tables using such keys of generic tables (these measurement tables), then it adds
up significantly. 

> have CurrencyDimension have a dimensionId that is based on the natural key
> --------------------------------------------------------------------------
>
>                 Key: OFBIZ-10953
>                 URL: https://issues.apache.org/jira/browse/OFBIZ-10953
>             Project: OFBiz
>          Issue Type: Sub-task
>          Components: bi
>    Affects Versions: Release Branch 18.12, Release Branch 17.12, Trunk
>            Reporter: Pierre Smits
>            Assignee: Pierre Smits
>            Priority: Major
>              Labels: CurrencyDimension, birt, currency, dimension, dwh
>         Attachments: 20200208_094840.jpg
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine the dimensionId
for the CurrencyDimension. This is unnecessary as the uomId from the UOM table can be used
for currency.
> It also makes it easier to set the foreign-key in fact tables by generating it based
on the date provided, than by retrieving the dimensionId based on a retrieval through the
getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message