db-torque-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Fischer <tfisc...@apache.org>
Subject Re: Torque features I've added in a forked tree... Merge into main?
Date Mon, 29 Aug 2005 05:31:13 GMT


On Mon, 22 Aug 2005, Kevin Burton wrote:

>>> assume other databases) this has a significant performance advantage.
>>> Every SQL call costs about 1ms so for a list of 1000 items this can
>>> save 999ms.
>>
>> I'd like to know more about this. What exactly does it do ? How is it
>> implemented ?
>
> If you have a list of 1000x items, and you call getFoo() on each of
> them, and FOO_ID is a FK then Torque will do a SELECT * FROM FOO WHERE
> ID = N and for each item.
>
> We simply do a SELECT * FROM FOO WHERE ID IN (...) and then load this
> into a HashMap and then do a join on the client.
>
> Very fast...

This is definitely very useful and should be included into Torque in  my 
opinion.

I admit I was confused at first at the name you gave it. Is it already 
used elswhere ? (I could not get to the meaning of "pre" in this context, 
but it means "before accessing the getters"  in the context, in the 
line of "prefetch"), but still to me a more speaking name of the method 
would have been "readJoined". But this may be a matter of taste.

I wondered what the API was, but now I have just seen the example that you 
give at the bottom of your mail:

>        FooPeer.prejoin( FooPeer.RESOURCE_ID, BarPeer.VALUE )
>                .p()
>            .prejoin( FooPeer.FEED_ID )
>                .prejoin( CatPeer.RESOURCE_ID, BarPeer.VALUE )

and I do not understand it. I thought you would have to pass a list of Foo 
objects in somewhere ? So I would have guessed that instead of having
prejoin(String, String), you would need to have prejoin(List, String, 
String) somewhere, where the List contains a List of Foo objects ?
Also, what kind of Objects do you return with the resulting call so that 
you can do prejoin() on it again ? And what does the p() method and the 
prejoin(String) method do ?

>
>>>  support for prepared statements (which are MUCH faster in MySQL)
>>
>> Torque is using prepared statements for updates and deletes, and there is
>> some support for prepared statements on selects also. However, the support
>> of prepared statements for selects is quite limited at the moment, so if
>> you can do better, why not ? But merging this will be a real mess.
>
> Yeah.. true.  And we developed a new query API too which I'm not too
> sure about.
>
> The API just generates SQL and supports prepared statemetns.  The
> Criteria API couldn't do what I wanted and I needed something I could
> extend easily.

If you ask for my opinion, Torque will stick with Criteria at the moment, 
just because people are used to it. But it would interest me what problems 
you had with Criteria and how you tackled them. Perhaps some of this can 
be used in Criteria ?

>
>>>  when performing an UPDATE/INSERT only send columns that have been
>>> modified
>>
>> ...
>>
>> Maybe if some people are interested, one could make it
>> configurable via a generator option.
>
> Yes.  That's what we did actually.  It will depend on the implementor I think.
>
>> This would also solve the problem of
>> backward compatibility.
>
> Yeah.. totally.  This is very important.
>

Ok, if its configurable, my personal opinion is we should take it in. 
Maybe some other developers could let us know what they think ?

>>>  client-side join for IN clauses by specifying a List of BaseObjects
>>> which have their getId() method called.
>>
>> This sounds like a rather specialized thing to me. It sounds as if it
>> would only work on primary keys. Is that correct ?
>
> I have an API that I can use to as an anonymous fuction basically
> called a KeyMapper you can use for non--PKs.  The default just uses
> the PK.
>
> It's very handy actually.  I've used it for significant performance updates.
>

Hm, still not convinced about that. How standalone is it ? Meaning, do you 
have to change much in Torque, or is it rather an add-on ? I feel I would 
object if it means many changes in main Torque, but why not give the user 
the possibility to use it or not if its quite standalone ?

>>>  lazy result set processing.  For example if you have a table with
>>> 10000 columns and you only call getColumnA() then we don't have to
>>> fetch the remaining columns.  This has a HUGE performance advantage at
>>> runtime.
>>
>> I'd like to know more about this. I would guess that the object would have
>> to keep the database connection open either until all of its getters are
>> called or until its destructor is called. Is that correct ?
>
> Ah!  yeah.. ha.. So it turns out that MySQL only returns the result as
> a bytearray so you have all the data you need.  you can parse out the
> results ANYTIME you want.  The trick was that the JDBC spec doesn't
> allow this because once close() is called you're goin to get an
> Exception.
>
> I wrote a patch to the MySQL JDBC driver which is standard so that you
> can use holdOpenResults=true and then use lazy parsing of the results.
>
> That said though this may apply to other DBs as well.
>
> Regardless it's a BIG performance update.  8-10x from my benchmarks.
>

I am sorry to say that, but in my opinion it sounds like a big hack, which 
will not be transferable to other databases. It may be worth while if you 
want extreme performance, but I think Torque should not go into the 
business of patching db drivers.

>>>  New database connection pool called BDCP which is simple to
>>> debug/understand, FAST, and reliable.  We had countless problems with
>>> DBCP.
>>
>> I am against having a db connection pool inside Torque. Reason is that a
>> db connection pool is a tool which is too general to be embedded in a
>> project like Torque. It should be a standalone project.
>
> Sure.. maybe. I could put BDCP somewhere else I guess.  I'd like to
> include the adapter though.

If you find a place to host it so that it is publicly accessible, we'll do 
that. Maybe you could even ask the dbcp people if they are interested ?

>>>  save() should update the identity of the object with ID that was
>>> generated on the server.
>>
>> Not sure if I grasp what you mean by this. You mean if an id is
>> autoincrement, and you save a new object, then the id should be updated in
>> memory ? This is already the case. But maybe I misunderstood you. Please
>> correct me if I'm wrong.
>
> Yeah.. that way you can update FK references for that objects ID.  Is
> that already in the current code?
>

Yes, the id of the objects change in memory when they have been assigned 
on a doInsert. Also the ForeignKeys in the collections stored by this 
object are assigned.

>>>  BETA support for memcached. This means you can call retrieveById and
>>> the prejoin API and have the results come from memcache and NOT from a
>>> SQL box.  This should theoretically be much faster but I'm still working
>>> on it.  The results were actually 4x slower but I'm optimistic that I
>>> can have this fixed.
>>
>> There is an experimental interface for using JCS as memory cache in Torque
>> right now. You have to set the generator option useManagers=true for it. I
>> have no big experience in this part of Torque, maybe someone else who is
>> using the managers should comment on it.
>
> OK. Maybe I can look at hte implementation when I need memcached
> again.  I don't need it right now.
>
>>>  REPLACE support for MySQL
>>
>> As a non-mysql user, what does REPLACE do ? Is this standard SQL ? Does it
>> also work on other databases ?
>
> Hm.. I don't know if it's standard but honestly I need an API to use
> non-standard SQL. MySQL has a bunch of extensions that are just
> required for building a decent DB app.
>
> That's another reason I did the custom query API I was talking about
> because with the Criteria API I couldn't do anything nonstandard.
>

On the query side, you can always use criteria.custom for nonstandard 
queries. On the update side, never thought about it. The only problem that 
I see at the moment is that it is problematic with prepared statements 
because there is no setFunction() or the like in prepared statements (at 
least I did not find it).
It would be nice to have a hook on the update side to hook in custom SQL. 
Maybe something exists in the depths of Torque, I never digged for it.
In my opinion, this would be much preferable over suppurting custom stuff 
for mysql.

>>>  Support for working in MASTER/SLAVE environments when working
>>> with MySQL.  save() and doUpdate, doInsert, etc work on a dedicated
>>> connection to the master while SELECTs perform on another dedicated
>>> connection. The SLAVE connection could also be used with a load balancer
>>> if necessary.
>>
>> Again, is this mysql-specific ?
>
> It would be for any DB that's MASTER/SLAVE oriented.  I just altered
> the generator to have another DB pool and then I used that on save().
> It's not too much work to add this and keep it non-mysql specific.
>

If it is made configurable with default=off (so the standard user 
does not see it), I think it should be added.

>>>  doSelectFirstRecord which returns the object type instead of just
>>> Object when you know there will only ever be one result.  This is just
>>> syntactic sugar and prevents casting.
>>
>> Personally, I am undecided on that. I do not mind casting too much, and it
>> adds still another method to the Peer class, making them even more
>> stuffed. On the other hand, it might come in handy for some users.
>> Then again, it is not difficult to implement this in the templates, so if
>> someone has a real need for it, no problem to implement it. Anyone else
>> cares to comment ?
>
> Hm.  Wonder if there' s a way to extend the generators to support
> plugins.  I was thinking about this all along as altering the
> generator to add app-specific code in the Base* peers in an easy
> manner would be pretty sweet.
>

The same thought lingered in the back of my mind. It should not be 
problematic for appending stuff, only a convention is needed how add-on 
templates should be named and where to put them. This would be a great 
feature.

>>>  Support for a HashList mechanism where I can take a left join and make
>>> it into a Map where the keys are the left most column and the values are
>>> a list of all the items in that key.  This really helps for tree
>>> structures since SQL can't return hierarchical structures. This
>>> mechansim only supports on level of nesting right now.
>>
>> I do not understand this right now. Can you give an example ?
>
> Sure.. Let's say you want to return a folder structure in a result set:
>
>
> foldera  |  foo
> foldera  |  bar
> folderb  |  cat
> folderb  | dog
>
> Then you'd get a HashList like:
>
> foldera: (foo, bar)
> folderb: (cat,dog)
>
> Then you get an API on top of a list.
>
>>>  doSelectAsHashMap which can take a column,
>>> and a query, and return the result as a HashMap instead of just a
>>> regular List.
>>>
>>
>> Same comment as on doSelectFirstRecord. Might come in handy some time, but
>> adds yet another method. What happens if the column is not the promary
>> key, so that you might have multiple objects for one key ?
>
> You can specific a column and I've done that.  I acutally needed this
> code for the prejoin support which provided a great deal of extra
> flexibility.
>

Why do you use a HashMap to store this ? Why don't you use a list of 
Folder objects and store the retrieved objects in the appropriate collections ?



Some organizing stuff at the end: 
As you might have noticed from the news and status list, we are currently 
close to a 3.2 release of Torque. Your Stuff will definitely not be in 
3.2. Also, we have moved from CVS to SVN a few weeks ago and the structure 
of the project was changed during that project, too. Make sure that you 
use the new structure, but do not rely on anything you read about the 
project structure in the source repository except what you find in
http://db.apache.org/torque/subversion.html
We are in the process of updating the web site, but this may take some 
days before it is completed.

Then, as your contributions are very significant contributions, we would 
need to have a CLA from you and, if this is work done for your company, 
from your company as well. See

http://www.apache.org/licenses/#clas

for information.

    Thomas

Mime
View raw message