cocoon-users mailing list archives

From Tony Collen
Subject Re: How to create an Index file by XML
Date Thu, 17 Apr 2003 04:57:52 GMT
On Thu, 17 Apr 2003, Zhang Newman-r53609 wrote:

> Hello, everyone,
> Who can give me some ideas about how to create an index file? Please see the following:
> Precondition: I have a lot of XML files on my local computer, each composed of a header
> and a body. The header contains some keywords.
> I want to create an index file by XML convenient for to be resolved that find one of
> Finally, I will develop a program in XSP to query this XML file through some keyword
> parameters, and then get the body of the local XML file.

Hi Newman,

there are a few ways you can go about this...

one way would be to index your files using lucene, but i'm not sure whether
you can control where lucene pulls keywords from when it indexes them.

the other way i can think of is to use the directorygenerator together
with some clever transformations and cinclude.  it's a little late right
now, so my brain isn't working fully, but the algorithm might look
something like this:

in one pipeline:
-use the directorygenerator to create a listing of the files
-transform that listing so each file entry also carries the keywords
from that file's header
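
A rough sketch of what that first pipeline produces, written in plain Python rather than as a Cocoon sitemap, so it only illustrates the data flow. The document layout (`<header>` with `<keyword>` children), the index element names (`index`, `file`) and the function name `build_index` are all assumptions made for illustration; nothing in the thread specifies them.

```python
import os
import xml.etree.ElementTree as ET


def build_index(directory):
    """Return an <index> element listing each XML file and its header keywords."""
    index = ET.Element("index")
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".xml"):
            continue  # only index XML documents
        doc = ET.parse(os.path.join(directory, name)).getroot()
        entry = ET.SubElement(index, "file", {"name": name})
        # Assumed layout: keywords live in <header><keyword>...</keyword></header>
        for kw in doc.findall("./header/keyword"):
            ET.SubElement(entry, "keyword").text = (kw.text or "").strip()
    return index
```

Calling `ET.tostring(build_index("docs"), encoding="unicode")` on a hypothetical `docs` directory would serialize the index so it could be saved or inspected.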

in another pipeline:
-get the requested keyword
-start generating sax events based on the first pipeline
-transform the sax events, filtering the index down to the entries
where the requested keyword appears
-for each document left in that list, pull out its body content
-transform the content to the desired output format
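
The lookup steps above can be sketched the same way, again in plain Python rather than Cocoon, just to show the logic. It assumes an index element shaped like `<index><file name="a.xml"><keyword>x</keyword></file></index>` and documents with a top-level `<body>` child; those element names, and the function names `files_with_keyword` and `bodies_for_keyword`, are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def files_with_keyword(index, keyword):
    """Filter the index: names of files whose entry lists the keyword."""
    return [
        entry.get("name")
        for entry in index.findall("file")
        if any((kw.text or "") == keyword for kw in entry.findall("keyword"))
    ]


def bodies_for_keyword(index, directory, keyword):
    """Collect the <body> of every file that the index lists under the keyword."""
    results = ET.Element("results", {"keyword": keyword})
    for name in files_with_keyword(index, keyword):
        doc = ET.parse(os.path.join(directory, name)).getroot()
        body = doc.find("body")  # assumed: <body> is a direct child of the root
        if body is not None:
            hit = ET.SubElement(results, "document", {"name": name})
            hit.append(body)
    return results
```

The `<results>` element stands in for the input to the last step; turning it into the desired output format would be one more transformation.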

i'd type out some pipelines, but i haven't had to use cocoon in such a
complex way.  then again, lucene might be able to help you out, albeit
with some changes to how you'd do things. check out the searchgenerator.

if i'm bored enough tomorrow i might check out how i'd do this without the



Tony Collen
ICQ: 12410567
Cocoon: Internet Glue (A Cocoon Weblog)
