The original query:
select * from Message m where
and m.id != 933927 and m.archived=1
and the sub-select query:
SELECT * from Message where ID IN (SELECT ID FROM Message where
messageId = '<7997716ED1AF3D47A35D74FA2CB61092E2A619@somedomaion.local>')
and archived=1 and id!=987452
are equivalent with respect to the database. My comment regarding the 'id != 987452' clause was only with respect to an index. It's inclusion in the query (original) does not impact performance in a negative way.
The optimizer is going to choose to use the index (on messageId), and will perform that select first, then it will (likely) join that sub-result against the other two clauses (archived=1 and id != 987452) -- making it equivalent to your second query.
Do you know how to get a query plan? http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
The portion of the query with the messageId should have high-specificity due to the relative uniqueness of the message within the context of all messages. For example, you say the table has 1 million rows, and a given messsage in an organization of 200 users would consume 200 rows, so as you can see the specificity is very high. I would not expect that kind of select to be taking 5 seconds, but rather milliseconds. It is likely that your index statistics are seriously out of date (see http://issues.apache.org/jira/browse/DERBY-269). One user reported that updating statistics took a 22 minute query down to less than one second!
You can force Derby to update statistics with this command:
alter table <table-name> compress [sequential]
Note this command itself might take a long time -- dozens of minutes -- but in a system like yours you could get away with running it once or month or so at some off-peak time.
If you have a test database in which your query is taking 5 seconds, attempt to run the above alter table, and then re-run your query. If your query times are still low, post the query plan for the query here and we can take a look at it.