db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-1482) Update triggers on tables with blob columns stream blobs into memory even when the blobs are not referenced/accessed.
Date Fri, 26 Mar 2010 22:06:27 GMT

     [ https://issues.apache.org/jira/browse/DERBY-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Mamta A. Satoor updated DERBY-1482:

    Attachment: derby1482_patch1_stat.txt

I am attaching a patch, derby1482_patch1_diff.txt,  which is NOT READY FOR commit yet because
the required upgrade work for system table change has not been done yet. If this patch is
committed in it's current state, the older releases databases will fail with the trunk even
with hard upgrade. I plan to work on the upgrade work soon.

I ran derbyall and junit All suite with my patch. During the junite All suite run, I am getting
lots of errors in the upgrade suite. The failures are attached to this jira as junitUpgradeTestFailureWithPatch1.out
I think this probably is because of the new column that I have added to SYSTRIGGERS. I have
not investigated this failure enough to know for sure but I will be tackling the upgrade changes

Next, I am planning on focusing on following tasks
1)Implement Rick's suggestion of including the extra information we need about trigger in
existing column SYSTRIGGERS.REFERENCEDCOLUMNS This columns is of type ReferencedColumnsDescriptorImpl
and I can hopefully enhance that type to include the additional information. Hopefully this
approach will make it the next task easier which is to make upgrade changes.
2)Work on the upgrade code so this change in system table does not break older release databases
when they are run in soft/hard upgrade mode with the suggested work done through this jira.
3)Add more test cases in addition to what was added through TriggerTests.java

One thing to keep in mind is the improvement proposed by this patch will work only for UPDATE
triggers defined at row level AND they identify specific columns on which the trigger is defined.
	    FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
The trigger above is a row level UPDATE trigger and it identifies column "c12" as the triggering
column. For all other kinds of triggers(INSERT, DELETE, UPDATE trigger defined at table leve,
UPDATE trigger defined at row level but does not identify trigger columns), the improvement
proposed by this patch will not kick in. That is because all the other kinds of triggers do
not identify specific trigger columns. In the absence of specific trigger columns, Derby decides
to fetch in all the columns from the triggering table since it doesn't know what column it
might need to fetch at trigger execution time.

Following is the description of the existing behavior and changes made through my patch. It
is rather long but I wanted to provide as much information as I can since some part of the
code is little tricky.

I am going to work with following example to show what is being done by Derby today and what
changes I am suggesting.
	CREATE TABLE table1 (c11 int, c12 int, c13 int, c14 int, c15 int);
	INSERT INTO table1 VALUES(1,2,3,4,5);
	CREATE TABLE table2 (c21 int, c22 int, c23 int, c24 int, c25 int);
	INSERT INTO table2 VALUES(2,2,3,-1,5);
	    FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
In the update trigger tr1 above, the trigger column is c12 and trigger action references old/new
transition variable oldt.c14 Note that the trigger table, table1, has 5 columns and of those
only 2 columns are really needed by the trigger tr1. The trunk code converts the trigger action
UPDATE table2 SET c24=oldt.c14 to UPDATE table2 SET c24=org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getOldRow().getInt(4)
Note that oldt.c14 gets referenced through it's column position in triggering table table1.
Now let's consider the triggering statement
	UPDATE table1 SET c12 = -9 WHERE c11=1;
The trunk code is written to fetch all the 5 columns in trigger table table1 even though the
trigger is only in need of columns c12 and c14. If in our example, if table1 had a LOB column
with large amount of data, we would fetch that LOB column in memory too even though trigger
does not need that column.

With my patch, I am keeping track of references to old/new transition variables in trigger
action. Trunk code already keeps track of trigger columns in SYSTRIGGERS.REFERENCEDCOLUMNS.
The combination of the two will tell us exactly what columns are needed for trigger execution
from the triggering table and we fetch only those columns. For the trigger tr1, that would
mean column c12(which is the trigger column) and column c14(which is in the trigger action).
Because of this, columns c11, c13 and c15 will not be read into memory. If one or more of
those columns were LOB columns, that can result in lot of performance and memory consumption
improvement. My code will convert the trigger action UPDATE table2 SET c24=oldt.c14 
UPDATE table2 SET c24=org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getOldRow().getInt(2)
Note that oldt.c14 get referenced through it's column position is what columns are fetched
and not by it's column position in the trigger table table1. In our specific eg, we are fetching
column c12 and c14 and column c14 is in the 2nd position in that set. But when we make a note
of the trigger action columns in SYSTRIGGERS, we want to record it by it's column position
in the trigger table to know exactly what columns we are talking about. So, for our trigger
eg below, SYSTRIGGERS will look as follows
	    FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
from sys.systriggers;
TR1        |U    |R   |(2)              |(4)                 |update table2 set c24=oldt.c14
So, the column numbers to recognize the trigger columns and trigger action columns, we use
their position in triggering table. But to convert the trigger action to org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getxxxRow().getXXX(?),
for ?, we 
need to use the relative column position in the columns actually fetched from triggering table.
My apologies if this sounds confusing. In one of intermediate work on my machine, my code
didn't account for this difference in column position in table vs column position in the columns

actually fetched, and I started getting NPE because there is no column at position 4 since
we are in a need to fetch only 2 columns. This 2 different column numbering scheme makes the
code little tricky to follow but if we one understands the eg above, I think then it becomes
easier to understand the code. I will be happy to hear if someone has ideas on may be code
this differently. 

> Update triggers on tables with blob columns stream blobs into memory even when the blobs
are not referenced/accessed.
> ---------------------------------------------------------------------------------------------------------------------
>                 Key: DERBY-1482
>                 URL: https://issues.apache.org/jira/browse/DERBY-1482
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>            Reporter: Daniel John Debrunner
>            Assignee: Mamta A. Satoor
>            Priority: Minor
>         Attachments: derby1482_patch1_diff.txt, derby1482_patch1_stat.txt, derby1482DeepCopyAfterTriggerOnLobColumn.java,
derby1482Repro.java, derby1482ReproVersion2.java, junitUpgradeTestFailureWithPatch1.out, TriggerTests_ver1_diff.txt,
> Suppose I have 1) a table "t1" with blob data in it, and 2) an UPDATE trigger "tr1" defined
on that table, where the triggered-SQL-action for "tr1" does NOT reference any of the blob
columns in the table. [ Note that this is different from DERBY-438 because DERBY-438 deals
with triggers that _do_ reference the blob column(s), whereas this issue deals with triggers
that do _not_ reference the blob columns--but I think they're related, so I'm creating this
as subtask to 438 ]. In such a case, if the trigger is fired, the blob data will be streamed
into memory and thus consume JVM heap, even though it (the blob data) is never actually referenced/accessed
by the trigger statement.
> For example, suppose we have the following DDL:
>     create table t1 (id int, status smallint, bl blob(2G));
>     create table t2 (id int, updated int default 0);
>     create trigger tr1 after update of status on t1 referencing new as n_row for each
row mode db2sql update t2 set updated = updated + 1 where t2.id = n_row.id;
> Then if t1 and t2 both have data and we make a call to:
>     update t1 set status = 3;
> the trigger tr1 will fire, which will cause the blob column in t1 to be streamed into
memory for each row affected by the trigger. The result is that, if the blob data is large,
we end up using a lot of JVM memory when we really shouldn't have to (at least, in _theory_
we shouldn't have to...).
> Ideally, Derby could figure out whether or not the blob column is referenced, and avoid
streaming the lob into memory whenever possible (hence this is probably more of an "enhancement"
request than a bug)... 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message