Andrew,

 

Whoa! It sounds like you’re making a mountain out of a mole hill.

 

Ok, after a couple of rounds of messages, it becomes a little clearer on what you’re attempting to do…

 

If I understand the problem…

 

You have a mail server that you want to archive the mail messages in to a second database.

Based on your description, the archive process is an automated process that runs periodically on the system.

Your constraint is that you will want to save only one copy of the message in the archive. Note you may want to store a copy of the messageId by user  (denormalized) for faster searches of the archive.

 

Is this not the case?

What am I missing?

 

Assuming that this is, the solution is pretty trivial….

 

First on your archive database, you put a unique index on messageId.  This forces the constraint that you can only store one copy of the message in the archive database.

 

Then in a simple java program you do the following:

 

Create two jdbc connections, one to each database.

Connection A is to your mail server

Connection B is to your archive database.

 

In connection A, you want to run a simple select statement to fetch all records where the archive flag is not set to 1.

You want to do this in an UPDATE CURSOR. (Assuming Derby supports this terminology/syntax).

 

In connection B, you want to prepare your INSERT Statements. I’m assuming that you’re saving the headers in one table and then the body in a separate table/blob space?

 

Open the cursor in A to fetch the records.

As you loop through the records, in a try/catch block you try to insert the record via the cursors for B.

If you succeed, you update the record from A (CURRENT of A’s cursor) and set the archive flag to true (1).

If you failed because the messageId is already in the database, you update the record’s archive flag of the current record to true (1).

( You can make this more efficient if you order select on messageId and add some logic to your processing.)

 

In a nutshell, that’s pretty much it.

 

You mentioned that you had to do your other query because of hibernate?

If hibernate is making this task more difficult, why then are you using it?

 

 

HTH

 

-Mike

 

 

 


From: Brett Wooldridge [mailto:brett.wooldridge@gmail.com]
Sent: Tuesday, September 15, 2009 8:43 PM
To: Derby Discussion
Subject: Re: SELECT query takes 5 secs, what can I do?

 

The original query:

select * from Message m where
   m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@somedomain.com>'
   and m.id != 933927 and m.archived=1

and the sub-select query:

SELECT * from Message where ID IN (SELECT ID FROM Message  where
    messageId = '<7997716ED1AF3D47A35D74FA2CB61092E2A619@somedomaion.local>')
    and archived=1 and id!=987452

are equivalent with respect to the database.  My comment regarding the 'id != 987452' clause was only with respect to an index.  It's inclusion in the query (original) does not impact performance in a negative way.

The optimizer is going to choose to use the index (on messageId), and will perform that select first, then it will (likely) join that sub-result against the other two clauses (archived=1 and id != 987452) -- making it equivalent to your second query.

Do you know how to get a query plan?  http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

The portion of the query with the messageId should have high-specificity due to the relative uniqueness of the message within the context of all messages.  For example, you say the table has 1 million rows, and a given messsage in an organization of 200 users would consume 200 rows, so as you can see the specificity is very high.  I would not expect that kind of select to be taking 5 seconds, but rather milliseconds.  It is likely that your index statistics are seriously out of date (see http://issues.apache.org/jira/browse/DERBY-269).  One user reported that updating statistics took a 22 minute query down to less than one second!

You can force Derby to update statistics with this command:

alter table <table-name> compress [sequential]

Note this command itself might take a long time -- dozens of minutes -- but in a system like yours you could get away with running it once or month or so at some off-peak time.
 
If you have a test database in which your query is taking 5 seconds, attempt to run the above alter table, and then re-run your query.  If your query times are still low, post the query plan for the query here and we can take a look at it.

Brett