lucene-dev mailing list archives

From Peter Becker <pbec...@dstc.edu.au>
Subject LARM: status? / File System Indexer
Date Fri, 27 Jun 2003 03:16:38 GMT
Hi all,

Andrew already forwarded one of my mails to the list, so you might know 
what I am looking for by now. A few more words of clarification:

What we are doing is writing a personal document management tool based 
on Lucene and our visualization techniques. Actually I should say: what 
we have done, the only problem is that indexing is still a big hack. The 
plan we made to do it right was pretty much what Andrew described on his 
website, and by now I have found the LARM descriptions here and there. In 
a way this framework is bigger than what we are aiming for (we care only 
about scenario 1.1 - File System Indexer in terms of the LARM 
documentation), but we would be happy to try to collaborate in the effort.

Here is the scenario: we are two experienced Java developers trying to 
get our demonstrator up and going in about a week. The query frontend is 
good enough by now, just the indexer is crusty. We want a notion of file 
filtering and were thinking along the lines of mapping 
java.io.FileFilters onto some generic document indexer interface. The UI 
should offer some means of creating a list of these mappings, where 
first hit wins, probably with some notion of bouncing: if the file 
filter says to try an indexer, the indexer should still be able to throw 
an exception causing the mappings down the list to be tried. We haven't 
decided yet if we want to push or pull the information indexed (i.e. if 
the indexers write themselves or if the management code asks them for some 
defaults and extras stored in Properties). We want implementations of 
this interface for at least: HTML, DOC, PDF, TXT; others that would be 
good are: XLS, PPT, PS(.GZ), XML (incl. RDF, SVG), TeX, SX* 
(the OOo files). Another cool feature would be querying external 
meta-data sources.
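
To make the mapping idea a bit more concrete, here is a rough sketch of how the ordered list with "first hit wins" and bouncing might look. It assumes the pull variant (the indexer hands back Properties). All names in it (IndexerMappings, DocumentIndexer, add, index) are made up for illustration and are not part of Lucene or LARM, and it uses current Java syntax rather than what a 1.4 JRE accepts.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class IndexerMappings {

    /** Hypothetical pull-style indexer: returns the extracted fields, or throws to bounce the file. */
    public interface DocumentIndexer {
        Properties index(File file) throws Exception;
    }

    /** One entry of the ordered list: a filter plus the indexer to try when the filter accepts. */
    private static final class Mapping {
        final FileFilter filter;
        final DocumentIndexer indexer;

        Mapping(FileFilter filter, DocumentIndexer indexer) {
            this.filter = filter;
            this.indexer = indexer;
        }
    }

    private final List<Mapping> mappings = new ArrayList<>();

    /** Appends a mapping; earlier mappings take precedence over later ones. */
    public void add(FileFilter filter, DocumentIndexer indexer) {
        mappings.add(new Mapping(filter, indexer));
    }

    /**
     * Tries the mappings in order. The first indexer whose filter accepts the
     * file and which does not throw wins. Returns null if no mapping handled the file.
     */
    public Properties index(File file) {
        for (Mapping m : mappings) {
            if (!m.filter.accept(file)) {
                continue; // filter says this indexer is not applicable
            }
            try {
                return m.indexer.index(file); // first hit wins
            } catch (Exception bounced) {
                // The indexer bounced the file: fall through to the next mapping.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        IndexerMappings list = new IndexerMappings();

        // Stand-in HTML indexer that always bounces, to show the fallback.
        list.add(f -> f.getName().endsWith(".html"),
                 f -> { throw new Exception("not parseable as HTML"); });

        // Catch-all plain-text indexer at the end of the list.
        list.add(f -> true, f -> {
            Properties p = new Properties();
            p.setProperty("indexer", "plain-text");
            p.setProperty("path", f.getPath());
            return p;
        });

        Properties result = list.index(new File("notes.html"));
        System.out.println(result.getProperty("indexer"));
    }
}
```

In the push variant the index method would instead receive something to write into and return nothing; the list traversal and the bouncing would stay the same.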

The result will be open sourced (BSD-style, as part of 
http://www.tockit.org). If there is interest in collaboration we will be 
happy to contribute the indexing parts directly into some Lucene 
repository. Most likely we will not spend much more time than next week 
on the project, since it is only a demonstrator for us. But we are happy 
to try to make parts of our code more reusable for other people -- in 
the hope that we might be able to use whatever your LARM turns into in 
case we get back to it one day. If you have concrete ideas please tell 
us, so we can adjust our designs.

For those of you who are curious by now (I hope you don't mind the 
plug): there are cvsbuilds available which should run on any JRE 1.4+ 
installation. Grab the "Docco...." file from 
http://www.itee.uq.edu.au/~pbecker/ToscanaJ/cvsbuilds and feel free to 
send me any complaints if you don't like it :-)

Regards,
   Peter Becker


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

