db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <de...@segel.com>
Subject RE: SELECT query takes 5 secs, what can I do?
Date Wed, 16 Sep 2009 12:06:58 GMT


Whoa! It sounds like you're making a mountain out of a mole hill.


Ok, after a couple of rounds of messages, it becomes a little clearer on
what you're attempting to do.


If I understand the problem.


You have a mail server that you want to archive the mail messages in to a
second database.

Based on your description, the archive process is an automated process that
runs periodically on the system.

Your constraint is that you will want to save only one copy of the message
in the archive. Note you may want to store a copy of the messageId by user
(denormalized) for faster searches of the archive.


Is this not the case?

What am I missing?


Assuming that this is, the solution is pretty trivial..


First on your archive database, you put a unique index on messageId.  This
forces the constraint that you can only store one copy of the message in the
archive database.


Then in a simple java program you do the following:


Create two jdbc connections, one to each database.

Connection A is to your mail server

Connection B is to your archive database.


In connection A, you want to run a simple select statement to fetch all
records where the archive flag is not set to 1.

You want to do this in an UPDATE CURSOR. (Assuming Derby supports this


In connection B, you want to prepare your INSERT Statements. I'm assuming
that you're saving the headers in one table and then the body in a separate
table/blob space?


Open the cursor in A to fetch the records.

As you loop through the records, in a try/catch block you try to insert the
record via the cursors for B.

If you succeed, you update the record from A (CURRENT of A's cursor) and set
the archive flag to true (1).

If you failed because the messageId is already in the database, you update
the record's archive flag of the current record to true (1).

( You can make this more efficient if you order select on messageId and add
some logic to your processing.)


In a nutshell, that's pretty much it.


You mentioned that you had to do your other query because of hibernate?

If hibernate is making this task more difficult, why then are you using it?










From: Brett Wooldridge [mailto:brett.wooldridge@gmail.com] 
Sent: Tuesday, September 15, 2009 8:43 PM
To: Derby Discussion
Subject: Re: SELECT query takes 5 secs, what can I do?


The original query:

select * from Message m where
   and m.id <http://m.id/>  != 933927 and m.archived=1

and the sub-select query:

SELECT * from Message where ID IN (SELECT ID FROM Message  where
    messageId =
    and archived=1 and id!=987452

are equivalent with respect to the database.  My comment regarding the 'id
!= 987452' clause was only with respect to an index.  It's inclusion in the
query (original) does not impact performance in a negative way.

The optimizer is going to choose to use the index (on messageId), and will
perform that select first, then it will (likely) join that sub-result
against the other two clauses (archived=1 and id != 987452) -- making it
equivalent to your second query.

Do you know how to get a query plan?

The portion of the query with the messageId should have high-specificity due
to the relative uniqueness of the message within the context of all
messages.  For example, you say the table has 1 million rows, and a given
messsage in an organization of 200 users would consume 200 rows, so as you
can see the specificity is very high.  I would not expect that kind of
select to be taking 5 seconds, but rather milliseconds.  It is likely that
your index statistics are seriously out of date (see
http://issues.apache.org/jira/browse/DERBY-269).  One user reported that
updating statistics took a 22 minute query down to less than one second!

You can force Derby to update statistics with this command:

alter table <table-name> compress [sequential] 

Note this command itself might take a long time -- dozens of minutes -- but
in a system like yours you could get away with running it once or month or
so at some off-peak time.
If you have a test database in which your query is taking 5 seconds, attempt
to run the above alter table, and then re-run your query.  If your query
times are still low, post the query plan for the query here and we can take
a look at it.


View raw message