db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-1482) Update triggers on tables with blob columns stream blobs into memory even when the blobs are not referenced/accessed.
Date Tue, 23 Feb 2010 17:50:28 GMT

     [ https://issues.apache.org/jira/browse/DERBY-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Mike Matrigali updated DERBY-1482:

> @Mike (16/Feb/10):
> I don't have a write-up, and I'm not sure if the code will handle the scenario
s you mention. I don't fully understand the details of the store and the trigger
 code in the case of inserts/updates of long columns.
My comments apply on to the store side of the streaming blobs.

I am not exactly sure where the blobs are coming from in the sql layer.  I
think there may be some defered update tables, so the question is if the
streams are from the original table original table or a deferred update
> Can you help me understand your comment about the changing BLOB value:
> "But the stored
> value of the blob could be changed before we access it, so a a stream pointer
> the original blob is not going to be valid. "
> Who / what may change it;
>  - the current transaction's update query?
>   - the current transaction's trigger code?
>    - other transactions?

I believe both the first 2 are possibilities.  Locking should prevent any
changes to the affected row by other transactions.  So there are 2 cases:

1) the current tranaction statement that causes the trigger may change the

   insert - value in store table could be used to stream
   delete - value in store table won't be avaliable after the delete
   update - value in store table may be changed by the update itself

2) the trigger action code might change the blob.

   This case is more obscure and needs to happen in a procedure.  And the
   only way to use the old and new values is to pass them into the procedure,
   and currently our procedures don't support blobs as arguments.
>    That said, I do see that we are having trouble dealing with "inbound stream
s" when they have to be used several times - our only tool there is to materiali
ze the stream into memory. One extension would be to write data to disk temporar
ily to avoid OOME, but I'm not sure if we can manage this without introducing a
performance degradation.
>    The better option is probably to just write the value to the log and then r
ead it back from the log as required. Do you have any idea about how much effort
 that would take? I'm not suggesting that it should be done now, but it would be
 nice to have an idea about how difficult it is.
That sounds "hard" to me.  I would definitely not want to see the sql layer
accessing the log.  The clean separation of sql layer from the store layer
has greatly helped the reliablity of the product.  Also timing of it might
be a problem as the current trigger code may need a second access to a stream
before the actual value has been logged.  Again I think the key here may be
in understanding the deferred update tables and how they apply to all
the matrix of before/after insert/delete/update triggers.

You also raise an interesting point about performance.  There are definitely
a set of applications out there that are just fine with us temporarily
instantiating copies of blobs into memory. Where our stream threshold is
a max of 32k, 2 gig of memory is very likely available.  So there should be
some discussion before any change goes in that would automatically stream
something to disk that currently uses in memory.

> Update triggers on tables with blob columns stream blobs into memory even when the blobs
are not referenced/accessed.
> ---------------------------------------------------------------------------------------------------------------------
>                 Key: DERBY-1482
>                 URL: https://issues.apache.org/jira/browse/DERBY-1482
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>            Reporter: Daniel John Debrunner
>            Assignee: Mamta A. Satoor
>            Priority: Minor
>         Attachments: derby1482DeepCopyAfterTriggerOnLobColumn.java, derby1482Repro.java,
derby1482ReproVersion2.java, TriggerTests_ver1_diff.txt, TriggerTests_ver1_stat.txt
> Suppose I have 1) a table "t1" with blob data in it, and 2) an UPDATE trigger "tr1" defined
on that table, where the triggered-SQL-action for "tr1" does NOT reference any of the blob
columns in the table. [ Note that this is different from DERBY-438 because DERBY-438 deals
with triggers that _do_ reference the blob column(s), whereas this issue deals with triggers
that do _not_ reference the blob columns--but I think they're related, so I'm creating this
as subtask to 438 ]. In such a case, if the trigger is fired, the blob data will be streamed
into memory and thus consume JVM heap, even though it (the blob data) is never actually referenced/accessed
by the trigger statement.
> For example, suppose we have the following DDL:
>     create table t1 (id int, status smallint, bl blob(2G));
>     create table t2 (id int, updated int default 0);
>     create trigger tr1 after update of status on t1 referencing new as n_row for each
row mode db2sql update t2 set updated = updated + 1 where t2.id = n_row.id;
> Then if t1 and t2 both have data and we make a call to:
>     update t1 set status = 3;
> the trigger tr1 will fire, which will cause the blob column in t1 to be streamed into
memory for each row affected by the trigger. The result is that, if the blob data is large,
we end up using a lot of JVM memory when we really shouldn't have to (at least, in _theory_
we shouldn't have to...).
> Ideally, Derby could figure out whether or not the blob column is referenced, and avoid
streaming the lob into memory whenever possible (hence this is probably more of an "enhancement"
request than a bug)... 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message