db-derby-user mailing list archives

From Six Fried Rice <tech...@sixfriedrice.com>
Subject Managing many databases
Date Tue, 15 Apr 2008 19:24:49 GMT
I'm a first-time poster so I hope I'm following protocol here. I  
searched the MarkMail archive and I don't think this is a FAQ.

We're considering using derby in an atypical situation, and I'm  
looking for some general feedback on how best to proceed. The  
application processes very large XML reports (100MB to 2GB) for our  
customers, and then presents the data in an explorable fashion through  
the browser. A typical report might produce around 500,000 records,  
with up to maybe 2 million records or so at the (rare) top end. To  
keep this under control, we are using this model:

1: The user interacts with our web site to set up an account and  
prepare to process a report.
2: When they opt to process a report, a Java WebStart application  
launches and processes the report with an embedded derby.
3: When the processing is complete, the derby database is jarred and  
uploaded to the server.
4: At that point, all the data is completely read-only.
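As a rough illustration of steps 3 and 4, here is a minimal sketch of the connection URLs involved. It assumes Derby's documented `jdbc:derby:jar:(archive)path` form for read-only databases inside an archive, and the plain `jdbc:derby:<directory>` form for an unpacked database. The class name `DerbyUrls`, the archive path, and the database name `reportdb` are hypothetical and not part of our actual code.

```java
// Hypothetical helper: builds embedded-driver JDBC URLs for one report database.
final class DerbyUrls {
    private DerbyUrls() {}

    /**
     * URL for a database stored inside a jar or zip archive.
     * Derby opens such databases read-only.
     */
    static String jarUrl(String archivePath, String dbPathInArchive) {
        return "jdbc:derby:jar:(" + archivePath + ")" + dbPathInArchive;
    }

    /** URL for a database that lives in an ordinary directory on disk. */
    static String directoryUrl(String dbDirectory) {
        return "jdbc:derby:" + dbDirectory;
    }
}
```

With the embedded driver on the classpath, something like `DriverManager.getConnection(DerbyUrls.jarUrl("/data/reports/42.jar", "reportdb"))` would then open the uploaded report. Whether this performs acceptably is exactly what question 1 below asks.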

All of this is largely working (less a few bugs) and we're very happy  
with the performance and the notion that the heavy lifting happens on  
the client side.

Now I'm trying to decide how we will handle the server side database  
interaction if we continue with this model. In the simplest case, I'd  
like to interact directly with the user's individual derby databases  
(one per report). This has several advantages:

1: We don't have any time-consuming import process to put all that  
data into a centralized database
2: We get built-in partitioning of the data on the server side which  
is good news for scalability
3: The data model is somewhat complex and join-heavy, and I suspect  
several smaller databases will, in general, perform better than one  
very large database with hundreds of millions of records
4: Cleanup is a breeze: to remove a report we just whack a directory  
on the file system

But I'm not sure how best to actually manage all these databases. I  
suspect we will have on the order of 1000 databases in play, with  
maybe 20 of those being actively used at a single busy time. It is  
conceivable that we will have more than this, depending on the success  
of the system. So I guess I'm looking for any general insights, plus  
answers to a few concrete questions:

1: What are the performance characteristics of using zipped or jarred  
DBs? It doesn't bother me to unzip them, but I saw this option in the  
documentation and I was curious. Can these jars be in arbitrary  
locations on the file system, and be connected to ad-hoc? Can a derby  
server provide access to a jarred database at an arbitrary filesystem  
location?

2: Are there any performance concerns with having many databases in a  
single derby install? Would it be better to run one derby server, with  
1000 databases, or run multiple derby servers on the same hardware and  
partition the databases across them? I'm not looking for exact  
numbers, since they obviously depend on a lot of factors. But in  
general, can I load a ton of databases into derby server and be OK?  
(We have no problem throwing additional hardware at this system as  

3: Can derby server discover new databases if I simply copy (or  
symlink?) a derby database directory to its DERBY_HOME? Or do the  
databases need to be *created* programmatically through JDBC?

4: Anybody have any experience with rails and derby? I see a few hints  
online that people are doing it but I'm not too certain of the  
stability and details on what is supported. I'll have to write my own  
connection pooling and switching code in rails, which I don't think  
will be too tough. But an alternative would be to build a JEE-based  
web service to manage the derby interaction, and then have my rails  
application interact with that data server, if Rails/Derby is not a  
reliable or well-performing option.
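
To make the "about 1000 databases, about 20 active" scenario behind questions 2 and 4 more concrete, here is a minimal sketch of one way the switching code could keep only the recently used databases open. It is a plain LRU cache built on `LinkedHashMap`, not anything Derby provides. The class name `OpenDatabaseCache`, the `opener`/`closer` callbacks, and the limit are all assumptions for illustration. In a real setup the opener might create a connection or pool for the report's database, and the closer might close it and shut that database down.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical LRU cache of open per-report database handles.
// H is whatever represents an open database (a Connection, a pool, ...).
final class OpenDatabaseCache<H> {
    private final Function<String, H> opener;
    private final LinkedHashMap<String, H> open;

    OpenDatabaseCache(int maxOpen, Function<String, H> opener, Consumer<H> closer) {
        this.opener = opener;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.open = new LinkedHashMap<String, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, H> eldest) {
                if (size() > maxOpen) {
                    // Release the least recently used database before dropping it.
                    closer.accept(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the handle for a report, opening it on first use. */
    synchronized H get(String reportId) {
        H handle = open.get(reportId);
        if (handle == null) {
            handle = opener.apply(reportId);
            open.put(reportId, handle);
        }
        return handle;
    }

    /** Number of databases currently held open. */
    synchronized int size() {
        return open.size();
    }
}
```

This sketch ignores the case where an evicted handle is still in use by another request, so a real version would need some form of reference counting or leasing. It also says nothing about whether one Derby server or several is the better home for the cache, which is the open question above.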

I know this is an open-ended question. I appreciate any time and  
insight any of you may offer :)
