db-derby-user mailing list archives

From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Managing many databases
Date Tue, 15 Apr 2008 20:03:46 GMT
Hi Geoff,

You have asked a lot of interesting questions. I will try to give you 
some feedback on some of your questions. Hopefully others can provide 
more information. Please see my responses inline...


Six Fried Rice wrote:
> I'm a first-time poster so I hope I'm following protocol here. I 
> searched the MarkMail archive and I don't think this is a FAQ.
>
> We're considering using derby in an atypical situation, and I'm 
> looking for some general feedback on how best to proceed. The 
> application processes very large XML reports (100MB to 2GB) for our 
> customers, and then presents the data in an explorable fashion through 
> the browser. A typical report might produce around 500,000 records, 
> with up to maybe 2 million records or so at the (rare) top end. To 
> keep this under control, we are using this model:
>
> 1: The user interacts with our web site to set up an account and 
> prepare to process a report.
> 2: When they opt to process a report, a Java WebStart application 
> launches and processes the report with an embedded derby.
> 3: When the processing is complete, the derby database is jarred and 
> uploaded to the server.
> 4: At that point, all the data is completely read-only.
>
> All of this is largely working (less a few bugs) and we're very happy 
> with the performance and the notion that the heavy lifting happens on 
> the client side.
>
> Now I'm trying to decide how we will handle the server side database 
> interaction if we continue with this model. In the simplest case, I'd 
> like to interact directly with the user's individual derby databases 
> (one per report). This has several advantages:
>
> 1: We don't have any time-consuming import process to put all that 
> data into a centralized database
> 2: We get built-in partitioning of the data on the server side which 
> is good news for scalability
> 3: The data model is somewhat complex and join-heavy, and I suspect 
> several smaller databases will, in general, perform better than one 
> very large database with hundreds of millions of records
> 4: Cleanup is a breeze: to remove a report we just whack a directory 
> on the file system
>
> But I'm not sure how best to actually manage all these databases. I 
> suspect we will have on the order of 1000 databases in play, with 
> maybe 20 of those being actively used at a single busy time. It is 
> conceivable that we will have more than this, depending on the success 
> of the system. So I guess I'm looking for any general insights, plus 
> answers to a few concrete questions:
>
> 1: What are the performance characteristics of using zipped or jarred 
> DBs? It doesn't bother me to unzip them, but I saw this option in the 
> documentation and I was curious. Can these jars be in arbitrary 
> locations on the file system, and be connected to ad-hoc? Can a derby 
> server provide access to a jarred database at an arbitrary filesystem 
> location?
Please take a look at the section titled "Accessing a read-only database 
in a zip/jar file" in the Derby Developer's Guide: 
http://db.apache.org/derby/docs/10.3/devguide/ The jars can live 
anywhere in the file system or on the classpath.
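
To make the URL forms concrete, here is a minimal sketch of how a connection URL for a jarred database might be assembled. The `jar:` and `classpath:` subprotocols are the ones described in that Developer's Guide section; the class name, method names, and example paths are hypothetical, and actually opening a connection assumes the Derby embedded driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; the class and method names are illustrative only.
final class JarredDerbyUrls {
    private JarredDerbyUrls() {}

    // Builds jdbc:derby:jar:(pathToArchive)databasePathWithinArchive
    // for a jar or zip that sits at an arbitrary file system location.
    static String jarUrl(String archivePath, String dbPathInArchive) {
        return "jdbc:derby:jar:(" + archivePath + ")" + dbPathInArchive;
    }

    // Builds jdbc:derby:classpath:databasePathWithinArchive
    // for an archive that is already on the application's classpath.
    static String classpathUrl(String dbPathInArchive) {
        return "jdbc:derby:classpath:" + dbPathInArchive;
    }

    // Opens a connection to the given URL. This needs derby.jar at run time;
    // on JVMs without JDBC 4 driver auto-loading, the embedded driver class
    // has to be loaded first.
    static Connection open(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }
}
```

For example, `JarredDerbyUrls.jarUrl("/srv/reports/r42.jar", "reportdb")` yields `jdbc:derby:jar:(/srv/reports/r42.jar)reportdb`.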
>
> 2: Are there any performance concerns with having many databases in a 
> single derby install? Would it be better to run one derby server, with 
> 1000 databases, or run multiple derby servers on the same hardware and 
> partition the databases across them? I'm not looking for exact 
> numbers, since they obviously depend on a lot of factors. But in 
> general, can I load a ton of databases into derby server and be OK? 
> (We have no problem throwing additional hardware at this system as 
> needed.)
Hard to say. Most of our performance work has measured the performance 
of many clients hammering a single database. I don't know where Derby 
maxes out in its ability to saturate multiple processors when you are 
running an application against many databases. I think that against a 
single database, there is a limit (4?) to the number of processors which 
a Derby server can keep busy. That may or may not scale up if your 
server is managing more than one database.
>
> 3: Can derby server discover new databases if I simply copy (or 
> symlink?) a derby database directory to its DERBY_HOME? Or do the 
> databases need to be *created* programmatically through JDBC?
Derby has no heuristic for knowing where to look for databases.  I think 
that database discovery has to be done by your application.  Basically, 
you need to locate the database via a JDBC connection URL.
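
As a rough sketch of what application-side discovery could look like: the code below scans one directory for subdirectories that look like Derby databases and builds an embedded connection URL for each. It assumes every report database is unpacked as a direct child of a single root directory, and it treats the presence of a `service.properties` file as the marker of a database directory. The class and method names are hypothetical. A network client would use a `jdbc:derby://host:port/...` URL instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; Derby itself does not ship anything like this.
final class DerbyDatabaseFinder {
    private DerbyDatabaseFinder() {}

    // Returns one embedded-mode JDBC URL per child directory of reportsRoot
    // that contains a service.properties file, sorted alphabetically.
    static List<String> findDatabaseUrls(Path reportsRoot) throws IOException {
        List<String> urls = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(reportsRoot)) {
            for (Path child : children) {
                boolean looksLikeDatabase = Files.isDirectory(child)
                        && Files.isRegularFile(child.resolve("service.properties"));
                if (looksLikeDatabase) {
                    // Use forward slashes so the URL looks the same on every platform.
                    String path = child.toAbsolutePath().toString().replace('\\', '/');
                    urls.add("jdbc:derby:" + path);
                }
            }
        }
        Collections.sort(urls);
        return urls;
    }
}
```

Removing a report then amounts to deleting its directory; the next scan simply no longer returns that URL.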
>
> 4: Anybody have any experience with rails and derby? I see a few hints 
> online that people are doing it but I'm not too certain of the 
> stability and details on what is supported. I'll have to write my own 
> connection pooling and switching code in rails, which I don't think 
> will be too tough. But an alternative would be to build a JEE-based 
> web service to manage the derby interaction, and then have my rails 
> application interact with that data server, if Rails/Derby is not a 
> reliable or well-performing option.
Sorry, I'm out of my league here.

Hope this is a little helpful,
-Rick
>
> I know this is an open-ended question. I appreciate any time and 
> insight any of you may offer :)
>
> Thanks,
>
> Geoff

