db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <rick.hille...@oracle.com>
Subject Re: bulk insert
Date Tue, 31 Aug 2010 13:09:01 GMT
Mike Matrigali wrote:
> Rick Hillegas wrote:
>> Mike Matrigali wrote:
>>> Rick Hillegas wrote:
>>>> Mike Matrigali wrote:
>>>>> Rick Hillegas wrote:
>>>>>> Hi Mike,
>>>>>>
>>>>>> Thanks for the quick response. Some comments inline...
>>>>>>
>>>>>> Mike Matrigali wrote:
>>>>>>> I would vote -1 on this, if the proposal is to allow unlogged

>>>>>>> inserts
>>>>>>> into non-empty tables.  I do not want to add syntax that basically
>>>>>>> allows users to silent corrupt their db.
>>>>>> I poorly described what I meant. I am  not suggesting that we 
>>>>>> invent a new mechanism for bulk insert. I'm merely suggesting 
>>>>>> that we re-use the existing bulk insert mechanism used by the 
>>>>>> import procedures. When I said that logging would be turned off,

>>>>>> I only meant that it would be turned off in the way that it is 
>>>>>> for the import procedures. This is my understanding of how that 
>>>>>> existing logic works:
>>>>>>
>>>>>> o The import procedure cooks up an INSERT statement which 
>>>>>> specifies"insertMode-bulkInsert". At execution time, if that 
>>>>>> INSERT statement specifies "insertMode=bulkInsert" AND the table

>>>>>> is empty, then a new conglomerate is created on the side and the

>>>>>> inserts go into that new conglomerate. As you note, the 
>>>>>> conglomerate creation is logged and the old conglomerate is 
>>>>>> replaced with the new conglomerate only if the insert succeeds.
>>>>>
>>>>> Ok, I don't have a problem if the same mechanism with same existing
>>>>> behavior is being used, the existing behaviors are safe and should
>>>>> not lead to corruptions or any need by customer to recover 
>>>>> themselves.
>>>> Great!
>>>>>
>>>>> If the feature is provided we should note the extra overhead that 
>>>>> this
>>>>> may cause an insert, just so someone doesn't put this on all of their
>>>>> inserts.  There is overhead for the system to check if the table is
>>>>> empty before doing the insert.  For import it seems obvious that 
>>>>> user has gone to trouble to use a different command so likely a 
>>>>> empty table
>>>>> check is not much.  But on a insert statement it is not as obvious.
>>>>> The worst case for the check is a table that had a large amount of 
>>>>> data
>>>>> that has all been deleted.  The empty check might require reading 
>>>>> a large number of empty pages - depending on what kind of space 
>>>>> reclamation has gone on.  I think that is also a benefit of the 
>>>>> replace
>>>>> option, we don't need to do the check.
>>>> Thanks. I'll add this to the documentation for this capability.
>>>>>
>>>>> I am not sure the internal syntax is the best way to go.  I 
>>>>> believe it was originally eliminated because it was non standard.  
>>>>> Maybe some new
>>>>> syntax would be more appropriate.
>>>> Thanks for that piece of history. I wondered why the feature was 
>>>> disabled. Right now the syntax is the properties extension which we 
>>>> also use for optimizer overrides. I'm not aware of any standard 
>>>> language for this. We could introduce two non-reserved keywords for 
>>>> this: BULK and REPLACE. How does the following sound:
>>>>
>>>> insert [ bulk | replace ] into  ...
>>> I prefer the suggested syntax over the properties syntax, but don't 
>>> like
>>> adding nonstandard syntax to derby.   Especially to basic sql like 
>>> insert.  The replace option is a huge diversion from the behavior
>>> standard also.
>> The standard does allow vendors to extend SQL provided that the 
>> extensions do not conflict with language approved by the SQL 
>> committee. There are many examples of Derby extensions, including 
>> many which were present when Derby was open-sourced originally. I 
>> agree that we should be cautious about introducing extensions. I also 
>> agree that this proposed syntax troubles me less than the optimizer 
>> overrides do.
>>>
>>> When we were packaging up Derby as a standards based db for apache
>>> we pulled out whatever nonstandard syntax/behavior we could.  When 
>>> there
>>> was a feature we wanted to provide where we could find no standard, 
>>> system procedures were used.
>>> Putting the non-standard behavior into procedures makes it very 
>>> obvious to users that the functionality is non-standard.  If someone 
>>> wants to port their application from derby to some other db, it is a 
>>> simple rule of thumb to not include system procedure calls.  As you 
>>> pointed out this was not done for optimizer hints
>>> which didn't lend itself well to a procedure based approach.  There we
>>> hid the syntax in comments so we weren't really touching the "real"
>>> standard syntax.  It is kind of a hack, so really would not use it as
>>> a model for adding more features to existing standard syntax.
>>>
>>> Going back to why you said users wanted this feature.  Would users get
>>> the functionality they are looking for if we added import procedures 
>>> which took an SQL table as input rather than a file.  Where
>>> the SQL table could also be a table function now that they are 
>>> supported.  I think we would have added this import procedure 
>>> originally if table functions had existed at the time.
>> Table functions can take arguments, including ? parameters and 
>> arbitrarily complicated expressions. I don't think these would be 
>> easy to model with system procedures.
> Without ? parameters it seems straight forward.  You create the table
> function outside of the procedure and give it a name.  Then you pass
> that name into the import procedure.
>
> I was hoping that we could just pass in either the table type or the
> a result set into the sytem procedure and then it would solve all the
> issues.  But I see you can only pass data types into system procedures.
>
> Are there any new standards that would allow ResultSets to be passed
> to table functions?  
A table function returns a row multiset; a procedure could take a row 
multiset as the type of one its arguments. We could build support for 
declaring row multisets. However, there is no generic supertype of all 
row multisets. That means that we would have to define an infinite 
number of system procedures, one for each possible row multiset 
signature which a table function could return.
> Do user defined types help at all?
I don't see a solution involving UDTs. What do you have in mind?

Thanks,
-Rick
>
>
>>
>> Thanks,
>> -Rick
>>>>
>>>> Thanks,
>>>> -Rick
>>>>
>>>>
>>>
>>>
>>
>>
>
>


Mime
View raw message