Message-ID: <44E51513.3060505@apache.org>
Date: Fri, 18 Aug 2006 02:17:07 +0100
From: Ross Gardler
To: dev@forrest.apache.org
Subject: Re: [RT] A new Forrest implementation?
References: <44E0D61D.40306@apache.org>
In-Reply-To: <44E0D61D.40306@apache.org>

Ross Gardler wrote:

> This is a Random Thought. The ideas contained within are not fully
> developed and are bound to have lots of holes. The idea is to promote
> healthy discussion, so please, everyone, dive in and discuss.

In order to better support my position in this RT I've been
experimenting with alternative implementations. I now have a working
(although very hacky) version of a new Forrest Core. It is *very* basic
right now so don't get too excited, I'm only trying to feed the flames
of this discussion.

The deployed webapp version is 960 KB; this includes the test code and
sample documents. Add the size of Xalan and Xerces for the CLI version.
Clearly this will grow as we add some of the missing features (see
below). Spring, ehcache and an RE processor are probably the largest
additional dependencies we need in core and they weigh in at a few
hundred KB each (I think). In other words, it looks like I can deliver
in just a few megabytes.

What it does have:
------------------

- XHTML2 as internal format
- Locationmap support
- plugin architecture
- XSLT transformations
- CLI interface (very basic, no link following)
- Webapp interface
- File and HTTP readers

What it doesn't have:
---------------------

- Container managed components
- pattern matching in the Locationmap or in the output plugin selection
- handling of aggregated documents - they work on the input side, but
  I'm still considering how best to handle them on the output side
- external config files (i.e. the locationmap and available plugins are
  currently hard coded data structures)
- image (and other binary files) handling
- caching
- optimisation (i.e.
  no SAX stream between the individual components)
- adequate demos (a couple of Hello World input plugins and Gav's
  XHTML2 sample document only)
- loads of stuff I haven't thought of yet

Design
------

It's really simple (honest), the processing goes like this:

request URI (to controller) ->
source document(s) (from readers) ->
internal document(s) (from input plugins) ->
output document (from output plugins)

The main components are:

Controller
----------

This is the interface point for the application (CLI, webapp, or JUnit
tests so far). To use it you do something like:

   requestURI = new URI(TestController.TEST_REQUEST_URI);
   Controller controller = new Controller();
   AbstractOutputDocument doc = controller.getOutputDocument(requestURI);
   out.println(doc.getContentAsString());

LocationMap
-----------

A simple lookup table mapping the requestURI to the required source
document(s) - it supports optional files and aggregation.

A locationmap is built as follows (remember this should be read from a
config file):

   URI requestURI = new URI(TestController.TEST_REQUEST_URI);
   location = new Location(requestURI,
       this.getClass().getResource(
           TestController.SOURCE_DOCUMENT_XHTML2_COMPLETE), true);
   locationMap.put(requestURI, location);
   location = new Location(requestURI,
       this.getClass().getResource(
           TestController.SOURCE_DOCUMENT_XHTML2_SIMPLE), true);
   locationMap.put(requestURI, location);

Then you get the location(s) with:

   List locations = locationMap.get(requestURI);

ReaderFactory
-------------

Given the URL of a source document this factory returns the correct
reader for the document. For example "http://foo.com" will return an
HTTPReader whilst "file://bar" will return a file reader.

Reader
------

Reads a source document and infers the type of document it is (XML,
image etc., although only XML is supported right now). This returns a
typed document class representing the document by using a
DocumentFactory.

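To make the ReaderFactory description above a little more concrete, here
is a minimal sketch of choosing a reader from the URL scheme. It is not
the code from the prototype: the names SourceReader, HttpSourceReader,
FileSourceReader, ReaderFactorySketch and readerFor are all hypothetical,
and the readers only report which kind they are rather than fetching
anything.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical reader abstraction; the prototype's real Reader API is not shown here.
interface SourceReader {
    String describe();
}

// Stand-in for the HTTPReader mentioned above (does no network I/O).
final class HttpSourceReader implements SourceReader {
    public String describe() { return "HTTPReader"; }
}

// Stand-in for the file reader mentioned above (does no file I/O).
final class FileSourceReader implements SourceReader {
    public String describe() { return "FileReader"; }
}

final class ReaderFactorySketch {
    private ReaderFactorySketch() {}

    // Picks a reader by looking only at the scheme part of the source URI.
    static SourceReader readerFor(URI source) {
        String scheme = source.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in " + source);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
                return new HttpSourceReader();
            case "file":
                return new FileSourceReader();
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        // The two example URLs from the description above.
        System.out.println(readerFor(URI.create("http://foo.com")).describe());
        System.out.println(readerFor(URI.create("file://bar")).describe());
    }
}
```

Keying on the scheme alone keeps the factory trivial. Working out what
kind of document was actually fetched is a separate step, left to the
Reader and DocumentFactory.
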
DocumentFactory
---------------

This is perhaps the most complicated part of the system. It is roughly
equivalent to the source resolver, that is, it infers the type of
document we are working with. Once it knows the type of document it can
provide a typed document object.

If we have a mime-type that gives us enough information it will use that
(e.g. an OOo document). If not it will try looking ahead into the
contents of the file until it has enough info. For example:

   while ((numRead = reader.read(buf)) != -1 && mimeType == null) {
     String readData = String.valueOf(buf, 0, numRead);
     fileData.append(readData);
     buf = new char[1024];
     if (fileData.toString().contains("