cocoon-dev mailing list archives

From Frédéric Glorieux <frederic.glori...@ajlsm.com>
Subject Re: [RT] doco lite?
Date Tue, 26 Oct 2004 20:56:12 GMT



> G3. Index the latest version of the docs, including structured fields 
> (keywords, target audience, components mentioned, etc), to implement 
> "prepared queries" (as links, simply) to improve our docs' accessibility

This seems like a good occasion to provide a better LuceneTransformer 
implementation for Cocoon.

> T2. Build an index with Lucene, triggered via SVN post-commit hooks, 
> uses a live Cocoon instance to generate an easy to index XML document 
> for Lucene. Include metadata fields as mentioned in G2 above, generated 
> from (enhanced as compared to now) document content

I'm working on that (not enough); it seems to me just a consequence of 
the point above.

<map:match pattern="**">
   <map:generate src="{folder}{1}"/>
   <map:transform src="myschema2lucene.xsl"/>
   <map:transform type="lucene"/>
   <map:transform src="myschema2html.xsl"/>
   <map:serialize/>
</map:match>

myschema2lucene.xsl handles the original doc: it lets everything pass 
through but adds something for indexing.

<root>
   <!-- the doc to index -->
   <lucene:document>
     <lucene:field name="uri">
<!-- ... -->
     </lucene:field>
     <lucene:field name="fulltext" store="false">
<!-- result of myschema2txt.xsl -->
     </lucene:field>
     <lucene:field name="keyword" tokenize="false">
<!-- keep a field with not tokenized keywords to have them as lists -->
     </lucene:field>
<!-- ... -->
   </lucene:document>
   <!-- the structured doc -->
</root>

The Lucene transformer handles <lucene:*/> and lets everything else 
through for publishing.
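
As a rough illustration of the myschema2lucene.xsl step described above, 
here is a minimal XSLT 1.0 sketch. It is only a sketch under assumptions: 
the lucene namespace URI is a placeholder, the "uri" parameter and the 
"keyword" source element are hypothetical names, and the full text is 
taken from the string value of the document rather than from 
myschema2txt.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the lucene namespace URI below is a placeholder,
     and "keyword" is a hypothetical element of the source schema. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lucene="urn:example:lucene">

  <!-- Hypothetical parameter: URI of the document, passed from the sitemap -->
  <xsl:param name="uri"/>

  <!-- Emit the index instructions first, then the untouched document -->
  <xsl:template match="/">
    <root>
      <lucene:document>
        <lucene:field name="uri">
          <xsl:value-of select="$uri"/>
        </lucene:field>
        <lucene:field name="fulltext" store="false">
          <!-- Stand-in for myschema2txt.xsl: all text nodes, whitespace-normalized -->
          <xsl:value-of select="normalize-space(.)"/>
        </lucene:field>
        <!-- One untokenized field per keyword, so terms can be listed later -->
        <xsl:for-each select="//keyword">
          <lucene:field name="keyword" tokenize="false">
            <xsl:value-of select="normalize-space(.)"/>
          </lucene:field>
        </xsl:for-each>
      </lucene:document>
      <!-- the structured doc, copied as is -->
      <xsl:apply-templates/>
    </root>
  </xsl:template>

  <!-- Identity template: let everything else pass -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The identity template is what keeps the original document intact for the 
later myschema2html.xsl step; only the <lucene:document> block is added.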

If generator@src hasn't changed, the result should be cached, so there 
is not too much transformation and indexing; if it has, the index is 
updated.

For deletes, a hook from SVN is needed.

> T4. Use queries like "find all documents which talk about sitemap 
> matchers" to build navigation pages semi-automatically.

From some experience with Cocoon and Lucene: don't forget lists of 
terms (from untokenized fields). They give you the list of existing 
keywords (for example), so that you can generate your queries from 
what is actually in your docs (instead of imposing a constrained 
vocabulary on those who write them).

> T5. Put mod_cache in front to minimize server load (HTTP POST can be 
> used to invalidate pages if quick updates are needed to check edits).

You've given me the trick for something I was asking Sylvain about, a 
kind of pure Cocoon mod_cache with <map:act type="copy-source"/>

http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=109636070505876&w=2

The problem was updating. The Cocoon app produces a regular www folder 
served directly to the public by an httpd; updates could be triggered 
by an SVN hook, and if nothing works, there is still the handling of 
HTTP POST to force an update from the web.

Frédéric.

-- 
Frédéric Glorieux (ingénieur documentaire, AJLSM)
<http://www.ajlsm.com>
