From: Peter Becker
To: lucene-dev@jakarta.apache.org
Date: Fri, 27 Jun 2003 13:16:38 +1000
Subject: LARM: status? / File System Indexer
Message-ID: <3EFBB716.7060502@dstc.edu.au>

Hi all,

Andrew already forwarded one of my mails on the list, so you might know what I am looking for by now.
A few more words of clarification: we are writing a personal document management tool based on Lucene and our visualization techniques. Actually I should say "have written"; the only problem is that indexing is still a big hack. Our plan for doing it right was pretty much what Andrew described on his website, and by now I have found the LARM descriptions here and there. In a way this framework is bigger than what we are aiming for (we care only about scenario 1.1, File System Indexer, in terms of the LARM documentation), but we would be happy to try to collaborate in the effort.

Here is the scenario: we are two experienced Java developers trying to get our demonstrator up and running in about a week. The query frontend is good enough by now; only the indexer is crusty.

We want a notion of file filtering and were thinking along the lines of mapping java.io.FileFilters onto some generic document indexer interface. The UI should offer some means of creating a list of these mappings, where the first hit wins, probably with some notion of bouncing: if the file filter says to try an indexer, the indexer should still be able to throw an exception, causing the mappings further down the list to be tried. We haven't decided yet whether we want to push or pull the indexed information (i.e. whether the indexers do the writing themselves or the management code asks them for some defaults and extras stored in Properties).

We want implementations of this interface for at least HTML, DOC, PDF and TXT; others that would be good to have are XLS, PPT, PS(.GZ), XML (incl. RDF, SVG), TeX and SX* (the OOo files). Another cool feature would be querying external metadata sources.

The result will be open sourced (BSD-style, as part of http://www.tockit.org). If there is interest in collaboration, we will be happy to contribute the indexing parts directly into some Lucene repository.

Most likely we will not spend much more time than next week on the project, since it is only a demonstrator for us.
But we are happy to try to make parts of our code more reusable for other people, in the hope that we might be able to use whatever LARM turns into in case we get back to it one day. If you have concrete ideas, please tell us so we can adjust our designs.

For those of you who are curious by now (I hope you don't mind the plug): there are cvsbuilds available which should run on any JRE 1.4+ installation. Grab the "Docco...." file from http://www.itee.uq.edu.au/~pbecker/ToscanaJ/cvsbuilds and feel free to send me any complaints if you don't like it :-)

Regards,
Peter Becker
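
To make the filter-to-indexer mapping idea from the message more concrete, here is a rough sketch of a first-hit-wins list with bouncing. It is only an illustration under assumptions: the names IndexingException, DocumentIndexer and IndexerMappings are hypothetical and not taken from LARM or from our code, the pull variant (the indexer returns Properties) is picked arbitrarily since that question is still undecided, and it uses current Java syntax rather than what JRE 1.4 offers.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical exception an indexer throws to bounce a file to the next mapping. */
class IndexingException extends Exception {
    IndexingException(String message) {
        super(message);
    }
}

/** Hypothetical pull-style indexer: it returns the extracted fields instead of writing them. */
interface DocumentIndexer {
    Properties index(File file) throws IndexingException;
}

/** Ordered list of (filter, indexer) pairs; the first pair that succeeds wins. */
class IndexerMappings {
    private static final class Mapping {
        final FileFilter filter;
        final DocumentIndexer indexer;

        Mapping(FileFilter filter, DocumentIndexer indexer) {
            this.filter = filter;
            this.indexer = indexer;
        }
    }

    private final List<Mapping> mappings = new ArrayList<>();

    /** Appends a mapping; earlier mappings take precedence over later ones. */
    void add(FileFilter filter, DocumentIndexer indexer) {
        mappings.add(new Mapping(filter, indexer));
    }

    /**
     * Returns the fields from the first indexer whose filter accepts the file
     * and which does not bounce it, or null if no mapping handles the file.
     */
    Properties index(File file) {
        for (Mapping mapping : mappings) {
            if (!mapping.filter.accept(file)) {
                continue; // filter says this indexer is not applicable
            }
            try {
                return mapping.indexer.index(file);
            } catch (IndexingException bounced) {
                // The indexer gave up on this file; try the mappings further down.
            }
        }
        return null;
    }
}
```

In this sketch a UI would only need to manage the order of the list. A push variant would instead hand each indexer something to write into, and could keep the same bouncing rule.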