cocoon-dev mailing list archives

From: Robert Graham
Subject: Re: RefDoc Direction
Date: Mon, 18 Jul 2005 14:49:54 GMT

> 1. Extract snippets from the various types of source files: XML, java,
> text

I feel that this is mostly complete, but I'm open to new suggestions.

> 2. Convert these snippets to an XML form that is easily indexable with
> Lucene, generating Lucene "fields" for all important pieces of
> information: snippet key, snippet type, title, etc.

This needs a little work. If I'm not mistaken, this corresponds to the
"single snippet" pages you had in the refdoc prototype, and currently
those pages don't contain enough information.
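
A minimal sketch of what a richer "single snippet" document could
carry, assuming one XML element per Lucene field. The class name, the
<snippet> and <field> element names, and the "body" field are
hypothetical; the thread itself only names snippet key, snippet type,
and title.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SnippetFields {

    // Collects the fields in a stable order. "key", "type" and "title"
    // come from the list in the thread; "body" is an assumed extra field.
    static Map<String, String> fieldsFor(String key, String type,
                                         String title, String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("key", key);
        fields.put("type", type);
        fields.put("title", title);
        fields.put("body", body);
        return fields;
    }

    // Escapes the characters that are unsafe in XML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Renders one <field name="..."> element per entry (hypothetical format).
    static String toXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<snippet>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  <field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>\n");
        }
        return sb.append("</snippet>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(fieldsFor(
                "FileGenerator", "java", "FileGenerator usage", "if (a < b) { ... }")));
    }
}
```

Keeping each piece of information in its own named element is what
would let an indexer map it to a separate field later on.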

> 2b. Also generate "navigation documents" which Lucene will use to find
> all snippets. This is shown in the prototype already.

This seems mostly done, though I wonder whether some of the generated
links will work as is for indexing. For example, one set of the "a"
tags has href="[@id]", or something like href="snippet_31". Can the
crawler/indexer sort that out?
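
A quick way to check what such links turn into, sketched with plain
java.net.URI. This only shows standard relative-URI resolution; whether
the crawler resolves links the same way is not confirmed here. The base
URL and the HrefCheck class are hypothetical, and treating any href that
contains "[" or "{" as an unexpanded template is an assumption.

```java
import java.net.URI;
import java.util.Optional;

class HrefCheck {

    // Resolves href against the page it was found on. Returns empty for
    // values that look like unexpanded templates (assumption: anything
    // containing '[' or '{') or that are not valid URI references.
    static Optional<URI> resolve(URI base, String href) {
        if (href == null || href.isEmpty()
                || href.contains("[") || href.contains("{")) {
            return Optional.empty();
        }
        try {
            return Optional.of(base.resolve(href));
        } catch (IllegalArgumentException e) {
            // URI.resolve(String) throws this for syntactically invalid input.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical location of a navigation document.
        URI base = URI.create("http://localhost:8888/refdoc/nav/index.xml");
        System.out.println(resolve(base, "snippet_31"));
        System.out.println(resolve(base, "[@id]"));
    }
}
```

A bare relative href such as "snippet_31" resolves against the
directory of the navigation document, so it is followable only if
something is actually served at that location. A literal "[@id]" is an
attribute expression that was never expanded and gives a crawler
nothing to follow.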

> 3. Crawl and index the generated XML documents with Lucene, at first
> using the Lucene block out of the box, I assume. Some manual work (like
> starting the index creation from a URL) is ok at this stage, we're
> trying to demonstrate the full chain before implementing everything.

In the works. I might write some Java code for indexing and searching
soon, but I'll keep it skeletal until I feel good about it.

> 4. Create the required Lucene queries to put together snippets coming
> from different source files but having the same key (e.g. all
> "FileGenerator" snippets). I might need to add @doktor stuff to
> existing code and samples so that you can see better how this should
> work.

Future work.
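
To make the goal of step 4 concrete, here is a minimal in-memory
sketch of "put together snippets having the same key". It is not a
Lucene query; it only illustrates the grouping the queries would need
to produce. The Snippet class, its fields, and the sample file names
are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SnippetGrouping {

    // Hypothetical snippet model: key, originating file, extracted text.
    static final class Snippet {
        final String key;
        final String sourceFile;
        final String text;

        Snippet(String key, String sourceFile, String text) {
            this.key = key;
            this.sourceFile = sourceFile;
            this.text = text;
        }
    }

    // Groups snippets by key, preserving first-seen order of the keys
    // and the original order of snippets within each key.
    static Map<String, List<Snippet>> groupByKey(List<Snippet> snippets) {
        Map<String, List<Snippet>> byKey = new LinkedHashMap<>();
        for (Snippet s : snippets) {
            byKey.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Snippet> all = Arrays.asList(
                new Snippet("FileGenerator", "FileGenerator.java", "class comment"),
                new Snippet("OtherKey", "notes.txt", "unrelated note"),
                new Snippet("FileGenerator", "sitemap.xmap", "usage sample"));
        for (Map.Entry<String, List<Snippet>> e : groupByKey(all).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " snippet(s)");
        }
    }
}
```

With an index in place, the same grouping would presumably come from
querying on the key field rather than from scanning a list.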

> 5. Transform the results of these queries to XML document in a
> publication-neutral format, where one document contains all the info
> and code excerpts provided by snippets having the same key.

Should we also keep the ability for users to run their own queries and
have a document published dynamically from the results?

That sounds about like what I have in my notes. Thanks for walking
through it. I came to many of those conclusions while working through
the prototype, but some of them were still fuzzy on the details. I was
also more stuck on where to go from the TODOs at the time, but found a
direction to keep moving in.

