cocoon-dev mailing list archives

From Bertrand Delacretaz
Subject Re: RefDoc Direction
Date Mon, 18 Jul 2005 14:58:01 GMT
On 18 Jul 05, at 16:49, Robert Graham wrote:

>> 1. Extract snippets from the various types of source files: XML, java,
>> text
> I feel that this is mostly complete, but I'm open to new suggestions.

ok - what's in the prototype is probably good enough for now.

>> 2. Convert these snippets to an XML form that is easily indexable with
>> Lucene, generating Lucene "fields" for all important pieces of
>> information: snippet key, snippet type, title, etc.
> This needs a little work. This represents the "single snippet" page
> you had in the refdoc prototype if I'm not mistaken and currently they
> don't contain enough information.

Right, we certainly need more fields.

>> 2b. Also generate "navigation documents" which Lucene will use to find
>> all snippets. This is shown in the prototype already.
> This seems mostly done, though I wonder if some of the links generated
> will work as is for indexing. For example one set of the "a" tags has
> the href="[@id]" or something like href="snippet_31". Can the
> crawler/indexer sort that out?

href="snippet_31" looks OK as a relative URL.

[@id] is probably an XSLT typo; it should be {@id} instead, to generate a
dynamic link.
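
To illustrate the difference, here is a minimal sketch using the JDK's built-in XSLT processor. The class name AvtDemo, the one-element input document and the tiny stylesheet are made up for this example and are not taken from the refdoc prototype; only the href values come from the discussion above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AvtDemo {
    // Hypothetical input: a single snippet element carrying an id attribute.
    static final String INPUT = "<snippet id=\"snippet_31\"/>";

    // Builds a tiny stylesheet whose <a href="..."> contains the given text.
    static String stylesheet(String hrefValue) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
             + "<xsl:template match=\"snippet\">"
             + "<a href=\"" + hrefValue + "\">link</a>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the stylesheet against INPUT and returns the serialized result.
    static String render(String hrefValue) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet(hrefValue))));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(INPUT)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Square brackets are plain text inside an attribute value.
        System.out.println(render("[@id]"));
        // Curly braces form an attribute value template, evaluated as XPath.
        System.out.println(render("{@id}"));
    }
}
```

With square brackets the href is copied through as the literal text [@id]; with curly braces it is replaced by the value of the id attribute, here snippet_31.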

>> 3. Crawl and index the generated XML documents with Lucene, at first
>> using the Lucene block out of the box, I assume. Some manual work (like
>> starting the index creation from a URL) is ok at this stage, we're
>> trying to demonstrate the full chain before implementing everything.
> In the works. I might write some Java code for indexing and searching
> soon, but I'll keep it skeletal until I feel good about it.


>> 4. Create the required Lucene queries to put together snippets coming
>> from different source files but having the same key (e.g. all
>> "FileGenerator" snippets). I might need to add @doktor stuff to
>> existing code and samples so that you can see better how this should
>> work.
> Future work.
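
As a rough picture of what step 4 is meant to achieve, here is a sketch in plain Java that does not use Lucene at all: an in-memory grouping stands in for the query that collects snippets sharing a key. The Snippet class, its fields and the sample values are assumptions made for illustration, not the prototype's actual data model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SnippetGrouping {
    // Hypothetical snippet model: a key, the file it came from, and its text.
    static final class Snippet {
        final String key;
        final String sourceFile;
        final String text;

        Snippet(String key, String sourceFile, String text) {
            this.key = key;
            this.sourceFile = sourceFile;
            this.text = text;
        }
    }

    // Collects snippets that share a key, keeping first-seen key order.
    static Map<String, List<Snippet>> groupByKey(List<Snippet> snippets) {
        Map<String, List<Snippet>> byKey = new LinkedHashMap<>();
        for (Snippet s : snippets) {
            byKey.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Snippet> all = new ArrayList<>();
        // Made-up file names and texts, for illustration only.
        all.add(new Snippet("FileGenerator", "FileGenerator.java", "class comment"));
        all.add(new Snippet("FileGenerator", "sitemap.xmap", "sample usage"));
        all.add(new Snippet("OtherKey", "notes.txt", "some text"));

        for (Map.Entry<String, List<Snippet>> e : groupByKey(all).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " snippet(s)");
        }
    }
}
```

In the real chain the grouping would come from a Lucene query on the snippet key field, and each group would then be turned into one XML document as described in step 5.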


>> 5. Transform the results of these queries to XML document in a
>> publication-neutral format, where one document contains all the info
>> and code excerpts provided by snippets having the same key.
> Should we also retain the ability for a user-based query that could
> dynamically publish a document on their query?...

Probably useful, you can maybe leave this open and we'll see which 
queries are useful.

> ... was also
> more stuck on where to go from the TODOs at the time, but found a
> direction to keep moving in...

Cool, thanks for your work! According to the ongoing vote you should be
able to get commit access soon; in the meantime, if you want to put a
patch in Bugzilla, I'll take care of it.

