db-torque-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Burton <burtona...@gmail.com>
Subject Re: Torque features I've added in a forked tree... Merge into main?
Date Tue, 23 Aug 2005 00:42:02 GMT
> In principle, I think this is a very good idea. However, a warning at the
> beginning: As two years is quite a long time, you might have noticed that
> the structure of Torque was changed quite a bit in the meantime. I guess
> you branched before the generator was decoupled. Merging will not be easy.

Alright.. well all I could do is dive in and take a look.  Point taken
though.  I mean anytime you do a fork and stay out of sync for 2+
years you will have problems merging again.

> > assume other databases) this has a significant performance advantage.
> > Every SQL call costs about 1ms so for a list of 1000 items this can
> > save 999ms.
> 
> I'd like to know more about this. What exactly does it do ? How is it
> implemented ?

If you have a list of 1000x items, and you call getFoo() on each of
them, and FOO_ID is a FK then Torque will do a SELECT * FROM FOO WHERE
ID = N and for each item.

We simply do a SELECT * FROM FOO WHERE ID IN (...) and then load this
into a HashMap and then do a join on the client.

Very fast...

> > • support for prepared statements (which are MUCH faster in MySQL)
> 
> Torque is using prepared statements for updates and deletes, and there is
> some support for prepared statements on selects also. However, the support
> of prepared statements for selects is quite limited at the moment, so if
> you can do better, why not ? But merging this will be a real mess.

Yeah.. true.  And we developed a new query API too which I'm not too
sure about.

The API just generates SQL and supports prepared statemetns.  The
Criteria API couldn't do what I wanted and I needed something I could
extend easily.

> > • when performing an UPDATE/INSERT only send columns that have been
> > modified
> 
> Hm. Not sure of that. Your data could get a mess if someone else had
> modified the same object in the meantime, and I do not see that the
> performance gain is large enough to justify that. It could even decrease
> performance, 

Not in most DBs I would think.  The major burden is IO and writing
unnecessary bytes to disk and sending them over the network can be a
real pain.

> if you tell dbcp to cache the prepared statements, and you
> have to use another prepared statement each time because you have modified
> other columns.

Yes.  In our situation we were using MySQL 4.1 which only uses prep
stmts for a network IO advantage.  So this would be negated.  We also
used prepared statement caching in our JDBC driver so again this
didn't matter.

That said we've also written our own DB pool which is much faster and
much more reliable than DBCP called BDCP which I couldn't live
without.  It's essentially 5 lines of code (with wrapper) and is sort
of a fork off DBCP.  Luckly I think I could commit this and you could
use either API from the way Torque is designed.

> Maybe if some people are interested, one could make it
> configurable via a generator option.

Yes.  That's what we did actually.  It will depend on the implementor I think.

> This would also solve the problem of
> backward compatibility.

Yeah.. totally.  This is very important.

> > • client-side join for IN clauses by specifying a List of BaseObjects
> > which have their getId() method called.
> 
> This sounds like a rather specialized thing to me. It sounds as if it
> would only work on primary keys. Is that correct ?

I have an API that I can use to as an anonymous fuction basically
called a KeyMapper you can use for non--PKs.  The default just uses
the PK.

It's very handy actually.  I've used it for significant performance updates.

> > • lazy result set processing.  For example if you have a table with
> > 10000 columns and you only call getColumnA() then we don't have to
> > fetch the remaining columns.  This has a HUGE performance advantage at
> > runtime.
> 
> I'd like to know more about this. I would guess that the object would have
> to keep the database connection open either until all of its getters are
> called or until its destructor is called. Is that correct ?

Ah!  yeah.. ha.. So it turns out that MySQL only returns the result as
a bytearray so you have all the data you need.  you can parse out the
results ANYTIME you want.  The trick was that the JDBC spec doesn't
allow this because once close() is called you're goin to get an
Exception.

I wrote a patch to the MySQL JDBC driver which is standard so that you
can use holdOpenResults=true and then use lazy parsing of the results.
 
That said though this may apply to other DBs as well. 

Regardless it's a BIG performance update.  8-10x from my benchmarks.

> > • New database connection pool called BDCP which is simple to
> > debug/understand, FAST, and reliable.  We had countless problems with
> > DBCP.
> 
> I am against having a db connection pool inside Torque. Reason is that a
> db connection pool is a tool which is too general to be embedded in a
> project like Torque. It should be a standalone project.

Sure.. maybe. I could put BDCP somewhere else I guess.  I'd like to
include the adapter though.

> Don't get me wrong. I have no problem at all to include an adapter for
> another pool besides dbcp into Torque.

Yup.
 
> Which dbcp version are you talking about ? From what I have heard
> so far, people are quite content with dbcp 1.2.1. Only issue I know of
> which is open is that there are problems with a serializable checked
> non-auto-commit connections.

Well even if they did improve all the threading bugs we encountered I
think there would still be a speed issue.  Even if they fixed this I
think there would be a code bloat issue as BDCP is only about 20 lines
of code for the core.  I just became really pissed one day that it
would take HOURS for me to grok DBCP so I scrapped it and started from
scratch.

> > • save() should update the identity of the object with ID that was
> > generated on the server.
> 
> Not sure if I grasp what you mean by this. You mean if an id is
> autoincrement, and you save a new object, then the id should be updated in
> memory ? This is already the case. But maybe I misunderstood you. Please
> correct me if I'm wrong.

Yeah.. that way you can update FK references for that objects ID.  Is
that already in the current code?

> > • BETA support for memcached. This means you can call retrieveById and
> > the prejoin API and have the results come from memcache and NOT from a
> > SQL box.  This should theoretically be much faster but I'm still working
> > on it.  The results were actually 4x slower but I'm optimistic that I
> > can have this fixed.
> 
> There is an experimental interface for using JCS as memory cache in Torque
> right now. You have to set the generator option useManagers=true for it. I
> have no big experience in this part of Torque, maybe someone else who is
> using the managers should comment on it.

OK. Maybe I can look at hte implementation when I need memcached
again.  I don't need it right now.
 
> > • REPLACE support for MySQL
> 
> As a non-mysql user, what does REPLACE do ? Is this standard SQL ? Does it
> also work on other databases ?

Hm.. I don't know if it's standard but honestly I need an API to use
non-standard SQL. MySQL has a bunch of extensions that are just
required for building a decent DB app.

That's another reason I did the custom query API I was talking about
because with the Criteria API I couldn't do anything nonstandard.

> > • Support for working in MASTER/SLAVE environments when working
> > with MySQL.  save() and doUpdate, doInsert, etc work on a dedicated
> > connection to the master while SELECTs perform on another dedicated
> > connection. The SLAVE connection could also be used with a load balancer
> > if necessary.
> 
> Again, is this mysql-specific ?

It would be for any DB that's MASTER/SLAVE oriented.  I just altered
the generator to have another DB pool and then I used that on save(). 
It's not too much work to add this and keep it non-mysql specific.
 
> > • doSelectFirstRecord which returns the object type instead of just
> > Object when you know there will only ever be one result.  This is just
> > syntactic sugar and prevents casting.
> 
> Personally, I am undecided on that. I do not mind casting too much, and it
> adds still another method to the Peer class, making them even more
> stuffed. On the other hand, it might come in handy for some users.
> Then again, it is not difficult to implement this in the templates, so if
> someone has a real need for it, no problem to implement it. Anyone else
> cares to comment ?

Hm.  Wonder if there' s a way to extend the generators to support
plugins.  I was thinking about this all along as altering the
generator to add app-specific code in the Base* peers in an easy
manner would be pretty sweet.

> > • Support for a HashList mechanism where I can take a left join and make
> > it into a Map where the keys are the left most column and the values are
> > a list of all the items in that key.  This really helps for tree
> > structures since SQL can't return hierarchical structures. This
> > mechansim only supports on level of nesting right now.
> 
> I do not understand this right now. Can you give an example ?

Sure.. Let's say you want to return a folder structure in a result set:


foldera  |  foo
foldera  |  bar
folderb  |  cat
folderb  | dog

Then you'd get a HashList like:

foldera: (foo, bar)
folderb: (cat,dog)

Then you get an API on top of a list.

> > • doSelectAsHashMap which can take a column,
> > and a query, and return the result as a HashMap instead of just a
> > regular List.
> >
> 
> Same comment as on doSelectFirstRecord. Might come in handy some time, but
> adds yet another method. What happens if the column is not the promary
> key, so that you might have multiple objects for one key ?

You can specific a column and I've done that.  I acutally needed this
code for the prejoin support which provided a great deal of extra
flexibility.

> As a final remark, some of your features sound really interesting.
> Especially the ability to fetch a whole tree of objects at one time is a
> feature which is certainly often needed but only provided in a limited
> extent by Torque (There are the doSelectJoinXXX methods, but only for one
> level of nesting).

I also designed the prejoin code so that it forms a visual graph in
the source.  For example:

        FooPeer.prejoin( FooPeer.RESOURCE_ID, BarPeer.VALUE )
                .p()
            .prejoin( FooPeer.FEED_ID )
                .prejoin( CatPeer.RESOURCE_ID, BarPeer.VALUE )

Which I think make's it easy to visualize your tree structure that
you're trying to join against.

-- 
 Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web - http://www.feedblog.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
Mime
View raw message