Date: Tue, 15 Apr 2008 13:03:46 -0700
From: Rick Hillegas
Subject: Re: Managing many databases
To: Derby Discussion
Message-id: <48050A22.8040901@sun.com>
In-reply-to: <9A1A5FFB-3B42-4663-BE3C-DD8DF4CBA623@sixfriedrice.com>

Hi Geoff,

You have asked a lot of interesting questions. I will try to give you
some feedback on some of your questions. Hopefully others can provide
more information. Please see my responses inline...

Six Fried Rice wrote:
> I'm a first-time poster so I hope I'm following protocol here. I
> searched the MarkMail archive and I don't think this is a FAQ.
>
> We're considering using derby in an atypical situation, and I'm
> looking for some general feedback on how best to proceed. The
> application processes very large XML reports (100MB to 2GB) for our
> customers, and then presents the data in an explorable fashion through
> the browser. A typical report might produce around 500,000 records,
> with up to maybe 2 million records or so at the (rare) top end. To
> keep this under control, we are using this model:
>
> 1: The user interacts with our web site to set up an account and
> prepare to process a report.
> 2: When they opt to process a report, a Java WebStart application
> launches and processes the report with an embedded derby.
> 3: When the processing is complete, the derby database is jarred and
> uploaded to the server.
> 4: At that point, all the data is completely read-only.
>
> All of this is largely working (less a few bugs) and we're very happy
> with the performance and the notion that the heavy lifting happens on
> the client side.
>
> Now I'm trying to decide how we will handle the server side database
> interaction if we continue with this model.
> In the simplest case, I'd like to interact directly with the user's
> individual derby databases (one per report). This has several
> advantages:
>
> 1: We don't have any time-consuming import process to put all that
> data into a centralized database
> 2: We get built-in partitioning of the data on the server side which
> is good news for scalability
> 3: The data model is somewhat complex and join-heavy, and I suspect
> several smaller databases will, in general, perform better than one
> very large database with hundreds of millions of records
> 4: Cleanup is a breeze: to remove a report we just whack a directory
> on the file system
>
> But I'm not sure how best to actually manage all these databases. I
> suspect we will have on the order of 1000 databases in play, with
> maybe 20 of those being actively used at a single busy time. It is
> conceivable that we will have more than this, depending on the success
> of the system. So I guess I'm looking for any general insights, plus
> answers to a few concrete questions:
>
> 1: What are the performance characteristics of using zipped or jarred
> DBs? It doesn't bother me to unzip them, but I saw this option in the
> documentation and I was curious. Can these jars be in arbitrary
> locations on the file system, and be connected to ad-hoc? Can a derby
> server provide access to a jarred database at an arbitrary filesystem
> location?

Please take a look at the section titled "Accessing a read-only database
in a zip/jar file" in the Derby Developer's Guide:
http://db.apache.org/derby/docs/10.3/devguide/

The jars can live anywhere in the file system or on the classpath.

>
> 2: Are there any performance concerns with having many databases in a
> single derby install? Would it be better to run one derby server, with
> 1000 databases, or run multiple derby servers on the same hardware and
> partition the databases across them? I'm not looking for exact
> numbers, since they obviously depend on a lot of factors.
> But in general, can I load a ton of databases into derby server and
> be OK? (We have no problem throwing additional hardware at this system
> as needed.)

Hard to say. Most of our performance work has measured the performance
of many clients hammering a single database. I don't know where Derby
maxes out in its ability to saturate multiple processors when you are
running an application against many databases. I think that against a
single database, there is a limit (4?) to the number of processors which
a Derby server can keep busy. That may or may not scale up if your
server is managing more than one database.

>
> 3: Can derby server discover new databases if I simply copy (or
> symlink?) a derby database directory to its DERBY_HOME? Or do the
> databases need to be *created* programmatically through JDBC?

Derby has no heuristic for knowing where to look for databases. I think
that database discovery has to be done by your application. Basically,
you need to locate the database via a JDBC connection URL.

>
> 4: Anybody have any experience with rails and derby? I see a few hints
> on line that people are doing it but I'm not too certain of the
> stability and details on what is supported. I'll have to write my own
> connection pooling and switching code in rails, which I don't think
> will be too tough. But an alternative would be to build a JEE-based
> web service to manage the derby interaction, and then have my rails
> application interact with that data server, if Rails/Derby is not a
> reliable or well-performing option.

Sorry, I'm out of my league here.

Hope this is a little helpful,
-Rick

>
> I know this is an open-ended question. I appreciate any time and
> insight any of you may offer :)
>
> Thanks,
>
> Geoff
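
A footnote on questions 1 and 3 above, offered only as a rough sketch.
It assumes the Derby connection URL forms jdbc:derby:jar:(archive)path
for a database inside a jar and jdbc:derby:<directory> for an unpacked
one, and it assumes a database directory can be recognized by the
service.properties file Derby keeps at its top level. The class and
method names (DerbyUrls, jarUrl, directoryUrl, discover) are made up
for illustration and are not part of Derby.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not a Derby API.
final class DerbyUrls {
    private DerbyUrls() {}

    // URL for a read-only database stored inside a jar or zip archive.
    // dbPathInJar is the database's path within the archive.
    static String jarUrl(Path jar, String dbPathInJar) {
        return "jdbc:derby:jar:(" + jar.toAbsolutePath() + ")" + dbPathInJar;
    }

    // URL for an unpacked database directory at any file system location.
    static String directoryUrl(Path dbDir) {
        return "jdbc:derby:" + dbDir.toAbsolutePath();
    }

    // Application-side discovery: treat each immediate child directory of
    // root that contains service.properties as a database (an assumption
    // about the on-disk layout) and return its connection URL.
    static List<String> discover(Path root) {
        try (Stream<Path> children = Files.list(root)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(dir -> Files.isRegularFile(dir.resolve("service.properties")))
                    .sorted()
                    .map(DerbyUrls::directoryUrl)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each resulting URL would then be handed to
java.sql.DriverManager.getConnection(url) with the Derby jar on the
classpath. Databases opened from a jar are read-only, which matches
step 4 of the model described above.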