cocoon-users mailing list archives

From "Nicolas Maisonneuve" <>
Subject Re: Extend Lucene sample to RDBMS?
Date Thu, 04 Nov 2004 02:45:19 GMT
OK to add a uri attribute to the document tag (in fact my new version uses
this feature; the version I posted is old, but I can update it with the new one).

With the official LuceneIndexTransformer we can't specify user-defined field
names or field types (very important to the indexing process):
- text (tokenized, indexed)
- keyword (not tokenized, indexed)
- date (not tokenized, indexed; allows special date searches) (note: the date
type is not available in the official LuceneIndexTransformer)
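
As a rough illustration of what the three types mean for searching, here is a
toy model in Python. It is not Lucene's API: the function name index_terms,
the whitespace tokenizer, and the yyyymmdd date encoding are all assumptions
made for the sketch (a real analyzer does more). The Python format
"%m/%d/%Y" stands in for the Java-style MM/dd/yyyy pattern.

```python
from datetime import datetime


def index_terms(value, field_type, date_format="%m/%d/%Y"):
    """Return the terms a toy index would store for one field value.

    Hypothetical helper for illustration only; not part of Lucene or Cocoon.
    """
    if field_type == "text":
        # Tokenized: split on whitespace and lowercase, so each word matches on its own.
        return value.lower().split()
    if field_type == "keyword":
        # Not tokenized: the whole value is kept as one exact-match term.
        return [value]
    if field_type == "date":
        # Not tokenized: normalise to a sortable yyyymmdd string so ranges compare correctly.
        return [datetime.strptime(value, date_format).strftime("%Y%m%d")]
    raise ValueError(f"unknown field type: {field_type}")


print(index_terms("Nicolas Maisonneuve", "text"))     # ['nicolas', 'maisonneuve']
print(index_terms("Nicolas Maisonneuve", "keyword"))  # ['Nicolas Maisonneuve']
print(index_terms("11/03/1979", "date"))              # ['19791103']
```

In this model a search for "nicolas" finds the text field but not the keyword
field, which only matches the complete original value.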

So we can't really index XML data. Imagine we have to index an XML document
with this structure:

    <name>Nicolas Maisonneuve</name>
    <description1>1qsd  qsdqs dsqd</description1>
    <description1>2qsd  qsdqs dsqd</description1>
    <description2> qsd qdsq dqs dsq </description2>

With mine you can index this kind of document using XSL; the result can be,
for example:
    <lucene:document uri="http://cocoon/mydocument.xml">
        <lucene:field name="name" type="text">Nicolas</lucene:field>
        <lucene:field name="keyword" type="keyword">keyword1</lucene:field>
        <lucene:field name="keyword" type="keyword">keyword2</lucene:field>
        <lucene:field name="keyword" type="keyword">keyword3</lucene:field>
        <lucene:field name="description1" type="keyword">1qsd  qsdqs</lucene:field>
        <lucene:field name="description2" type="keyword">2qsd  qsdqs</lucene:field>
        <lucene:field name="date" type="date" dateformat="MM/dd/yyyy">11/03/1979</lucene:field>
    </lucene:document>
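
The mapping itself is simple. The sketch below does the same kind of
element-to-field conversion with Python's standard xml.etree module instead
of XSL, purely to show the idea. The namespace URI, the wrapping <record>
root element, the FIELD_TYPES table, and the function name
to_lucene_document are all assumptions for the example, not part of either
transformer.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI; use whatever your transformer expects.
LUCENE_NS = "urn:example:lucene"
ET.register_namespace("lucene", LUCENE_NS)

# Assumption: which source element maps to which field type.
FIELD_TYPES = {"name": "text", "description1": "keyword", "description2": "keyword"}


def to_lucene_document(source_xml, uri):
    """Turn each child element of the source root into one lucene:field."""
    source = ET.fromstring(source_xml)
    doc = ET.Element(f"{{{LUCENE_NS}}}document", {"uri": uri})
    for child in source:
        field = ET.SubElement(
            doc,
            f"{{{LUCENE_NS}}}field",
            {"name": child.tag, "type": FIELD_TYPES.get(child.tag, "text")},
        )
        # Trim surrounding whitespace; inner spacing is left untouched.
        field.text = (child.text or "").strip()
    return doc


# Hypothetical <record> wrapper, since the fragment needs a single root.
SOURCE = """<record>
  <name>Nicolas Maisonneuve</name>
  <description1>1qsd  qsdqs dsqd</description1>
  <description1>2qsd  qsdqs dsqd</description1>
  <description2> qsd qdsq dqs dsq </description2>
</record>"""

doc = to_lucene_document(SOURCE, "http://cocoon/mydocument.xml")
print(ET.tostring(doc, encoding="unicode"))
```

Repeated source elements (the two description1 entries) simply become
repeated fields with the same name.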

Nicolas Maisonneuve
----- Original Message ----- 
From: "Conal Tuohy" <>
To: <>
Sent: Thursday, November 04, 2004 02:55
Subject: RE: Extend Lucene sample to RDBMS?

Nicolas Maisonneuve wrote:

> see also a different version of LuceneIndexTransformer
> index XML data and delete available

Indexing XML data is not new. The current LuceneIndexTransformer does this.

The current version doesn't do deletion quite the same way, but it will
over-write an existing record, which allows for "incremental update". This
is almost the same as the deletion feature in the variant version, given
that you can specify a <lucene:document> element with no content. It looks
to me as if with the variant version you can either delete a record, or add
it, but not update a record.

The big difference is that the variant version does not recognise documents
by URI - a document does not necessarily have a unique key - it simply has a
collection of fields. One or more of these fields may be a URI, but it may
not. So it will allow you to add 2 records with the same URI, whereas the
current version identifies each record (document) with a URI, and it
automatically deletes any record with a given URI before adding a new
record with that URI.
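
To make the difference concrete, here is a toy model in Python of the two
behaviours as I have described them. The class names and the in-memory lists
are hypothetical; neither class is the real transformer code.

```python
class UriKeyedIndex:
    """Toy model of the current behaviour: at most one record per URI."""

    def __init__(self):
        self.records = []  # list of (uri, fields) tuples

    def add(self, uri, fields):
        # Delete any existing record with this URI, then add the new one.
        self.records = [(u, f) for (u, f) in self.records if u != uri]
        self.records.append((uri, fields))


class FieldOnlyIndex:
    """Toy model of the variant behaviour: a record is just a bag of fields."""

    def __init__(self):
        self.records = []  # list of field dicts; no unique key is enforced

    def add(self, fields):
        # Nothing is looked up or deleted, so duplicates accumulate.
        self.records.append(fields)


keyed = UriKeyedIndex()
keyed.add("http://cocoon/mydocument.xml", {"title": "v1"})
keyed.add("http://cocoon/mydocument.xml", {"title": "v2"})

variant = FieldOnlyIndex()
variant.add({"uri": "http://cocoon/mydocument.xml", "title": "v1"})
variant.add({"uri": "http://cocoon/mydocument.xml", "title": "v2"})

print(len(keyed.records))    # 1: the second add replaced the first
print(len(variant.records))  # 2: both records are kept
```

In the keyed model, adding a document with no fields replaces the old record
with an empty one, which is the "almost deletion" mentioned above.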

However, the variant version does have some really good features (such as
the "boost" factor) which are certainly worth having.

I think it would be good to merge the two, actually!


