cocoon-users mailing list archives

From "Nicolas Maisonneuve" <nico.maisonne...@free.fr>
Subject Re: Extend Lucene sample to RDBMS?
Date Thu, 04 Nov 2004 02:45:19 GMT
OK to add a uri attribute to the document tag (in fact my new version already
uses this feature; the posted version is old, but I can update it with the
new one).

With the official LuceneIndexTransformer we can't specify a user-defined
field name or a field type, both of which are very important to the indexing
process:
-text (tokenized, indexed)
-keyword (not tokenized, indexed)
-date (not tokenized, indexed; allows special date searches) (note: the date
type is not available in the official LuceneIndexTransformer)
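
To make the difference between these field types concrete, here is a minimal Python sketch. It is not Lucene code: `index_terms` and its whitespace tokenizer are hypothetical stand-ins that only illustrate what "tokenized" versus "not tokenized" means for the terms that reach the index, and the sortable yyyyMMdd date encoding is an assumption made for illustration.

```python
from datetime import datetime


def index_terms(value, field_type, dateformat="%m/%d/%Y"):
    """Return the terms a field value would contribute (illustrative only)."""
    if field_type == "text":
        # Tokenized: split into lowercase words, each searchable on its own.
        return value.lower().split()
    if field_type == "keyword":
        # Not tokenized: the whole value is one exact-match term.
        return [value]
    if field_type == "date":
        # Not tokenized: parse the date, then keep a sortable string form
        # (assumed encoding) so that range comparisons work on the term.
        return [datetime.strptime(value, dateformat).strftime("%Y%m%d")]
    raise ValueError("unknown field type: " + field_type)


text_terms = index_terms("Nicolas Maisonneuve", "text")
keyword_terms = index_terms("Nicolas Maisonneuve", "keyword")
date_terms = index_terms("11/03/1979", "date")
```

A search for "maisonneuve" would match the text field but not the keyword field, which only matches the full string "Nicolas Maisonneuve".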

So we can't really index XML data. Imagine we have to index an XML document
with this structure:

<name>Nicolas Maisonneuve</name>
<date>03/11/1979</date>
<keywords>
    <keyword>keyword1</keyword>
    <keyword>keyword3</keyword>
    <keyword>keyword2</keyword>
</keywords>
<descriptions>
    <description1>1qsd  qsdqs dsqd</description1>
    <description1>2qsd  qsdqs dsqd</description1>
    <description2> qsd qdsq dqs dsq </description2>
</descriptions>

With my version you can index this kind of document using XSL; the result
can be, for example:
....
 <lucene:document uri="http://cocoon/mydocument.xml">
        <lucene:field name="name" type="text">Nicolas Maisonneuve</lucene:field>
        <lucene:field name="keyword" type="keyword">keyword1</lucene:field>
        <lucene:field name="keyword" type="keyword">keyword2</lucene:field>
        <lucene:field name="keyword" type="keyword">keyword3</lucene:field>
        <lucene:field name="description1" type="keyword">1qsd  qsdqs dsqd</lucene:field>
        <lucene:field name="description2" type="keyword">2qsd  qsdqs dsqd</lucene:field>
        <lucene:field name="date" type="date" dateformat="MM/dd/yyyy">11/03/1979</lucene:field>
 </lucene:document>
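
The mapping from the source document to typed fields can be sketched as follows. This uses Python's standard library rather than XSL, so it only illustrates the shape of the transformation, not the actual stylesheet. The `<record>` wrapper element, the `urn:example:lucene` namespace URI and the `to_lucene_document` helper are all hypothetical, and the sketch assumes the sample date is meant as 3 November 1979 written day/month/year.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical namespace URI, used only so the output carries a lucene: prefix.
LUCENE_NS = "urn:example:lucene"
ET.register_namespace("lucene", LUCENE_NS)

# The sample from the message, wrapped in an assumed <record> root element.
SOURCE = """<record>
  <name>Nicolas Maisonneuve</name>
  <date>03/11/1979</date>
  <keywords>
    <keyword>keyword1</keyword>
    <keyword>keyword3</keyword>
    <keyword>keyword2</keyword>
  </keywords>
</record>"""


def to_lucene_document(source_xml, uri):
    """Build a lucene:document element with one typed field per source value."""
    src = ET.fromstring(source_xml)
    doc = ET.Element("{%s}document" % LUCENE_NS, {"uri": uri})

    def add_field(name, field_type, value, **extra):
        attrs = dict(name=name, type=field_type, **extra)
        field = ET.SubElement(doc, "{%s}field" % LUCENE_NS, attrs)
        field.text = value

    add_field("name", "text", src.findtext("name"))
    # Repeated elements become repeated fields with the same name
    # (sorted here only to match the order shown in the example output).
    for keyword in sorted(k.text for k in src.iter("keyword")):
        add_field("keyword", "keyword", keyword)
    # Re-emit the day/month/year source date in the declared MM/dd/yyyy format.
    parsed = datetime.strptime(src.findtext("date"), "%d/%m/%Y")
    add_field("date", "date", parsed.strftime("%m/%d/%Y"),
              dateformat="MM/dd/yyyy")
    return doc


document = to_lucene_document(SOURCE, "http://cocoon/mydocument.xml")
fields = [(f.get("name"), f.get("type"), f.text) for f in document]
```

Calling `ET.tostring(document, encoding="unicode")` serialises the result for inspection.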

Nicolas Maisonneuve
----- Original Message ----- 
From: "Conal Tuohy" <Conal.Tuohy@vuw.ac.nz>
To: <users@cocoon.apache.org>
Sent: Thursday, November 04, 2004 02:55
Subject: RE: Extend Lucene sample to RDBMS?


Nicolas Maisonneuve wrote:

> see also a different version of LuceneIndexTransformer
> index XML data and delete available
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=107821889332237&w=2

Indexing XML data is not new. The current LuceneIndexTransformer does this
already.

The current version doesn't do deletion quite the same way, but it will
over-write an existing record, which allows for "incremental update". This
is almost the same as the deletion feature in the variant version, given
that you can specify a <lucene:document> element with no content. It looks
to me as if with the variant version you can either delete a record, or add
it, but not update a record.

The big difference is that the variant version does not recognise documents
by URI - a document does not necessarily have a unique key - it simply has a
collection of fields. One or more of these fields may be a URI, but it may
not. So it will allow you to add 2 records with the same URI, whereas the
current version identifies each record (document) with a URI, and it
automatically deletes any record with a given URI before adding a new
record with that URI.
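
The two behaviours described above can be contrasted with a small Python model. It is only a sketch of the semantics as stated in this message, not code from either transformer: the class names `UriKeyedIndex` and `AppendOnlyIndex` are hypothetical, and a plain list stands in for the Lucene index.

```python
class UriKeyedIndex:
    """Models the current version: each record is identified by its URI."""

    def __init__(self):
        self.records = []

    def add(self, uri, fields):
        # Delete any record with this URI first, then add the new one.
        # Adding an empty set of fields therefore wipes the old content,
        # which is the "almost a delete" case (assumption: the empty
        # record itself is kept).
        self.records = [r for r in self.records if r["uri"] != uri]
        self.records.append({"uri": uri, "fields": fields})


class AppendOnlyIndex:
    """Models the variant: a record is just a collection of fields, no key."""

    def __init__(self):
        self.records = []

    def add(self, fields):
        # Nothing is looked up or removed, so duplicates can accumulate.
        self.records.append({"fields": fields})


uri = "http://cocoon/mydocument.xml"

current = UriKeyedIndex()
current.add(uri, {"name": "old"})
current.add(uri, {"name": "new"})  # replaces the first record

variant = AppendOnlyIndex()
variant.add({"uri": uri, "name": "old"})
variant.add({"uri": uri, "name": "new"})  # second record with the same URI
```

After these calls the keyed index holds one record with the new content, while the append-only index holds two records that share a URI.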

However, the variant version does have some really good features (such as
the "boost" factor) which are certainly worth having.

I think it would be good to merge the two, actually!

Con

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


