lucene-java-user mailing list archives

From 车 东 <ched...@hotmail.com>
Subject Re: Lucene and XML
Date Thu, 31 Oct 2002 07:00:24 GMT
Here is a demo, XMLIndexer/StringFilter:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg02276.html

>Define one lucene.dtd (or schema) as the common Lucene indexing source format:
> source: WORD   PDF   HTML   DB   other
>            \    |     |      |    /
>               xml (lucene.dtd)
>                       |
>       XMLIndexer.build(XML InputSource)
>                       |
>                 Lucene INDEX
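The top half of the diagram (each source type converted into the common XML format) might look roughly like the following minimal Java sketch. RecordXmlWriter, escape and toRecord are hypothetical names, not part of Lucene or of the XMLIndexer demo. The sketch assumes the extractor for WORD, PDF, HTML or DB has already produced plain field values; it only escapes them and writes one <Record> with <Field> elements, marking unstored fields with store="no" as in the sample source further down. The <Index> elements are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical converter sketch: writes one source record (a DB row, text
 * pulled out of a PDF, ...) as a <Record> in the common lucene.dtd format.
 * Names are illustrative; <Index> elements are omitted for brevity.
 */
public class RecordXmlWriter {

    // Escape the five predefined XML entities so arbitrary field text stays well-formed.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // fields: name -> value in output order; unstored: names written with store="no".
    static String toRecord(Map<String, String> fields, List<String> unstored) {
        StringBuilder sb = new StringBuilder("<Record>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  <Field name=\"").append(escape(e.getKey())).append('"');
            if (unstored.contains(e.getKey())) {
                sb.append(" store=\"no\"");
            }
            sb.append('>').append(escape(e.getValue())).append("</Field>\n");
        }
        return sb.append("</Record>\n").toString();
    }

    public static void main(String[] args) {
        // e.g. a row fetched from a database table
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "Q&A <draft>");
        row.put("author", "John");
        row.put("content", "some content");
        System.out.print(toRecord(row, List.of("content")));
    }
}
```

Keeping one writer per source type and a single shared output format means the indexing side only ever has to understand lucene.dtd, which is the point of the design above.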
Here is a sample indexing source:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
 - Sample XML index source. The user wants to index these records as follows:
 - store the 'title' and 'author' fields for output
 - index 'title + content + author' together for full-text search
 - index the 'category' field untokenized for category filtering
 -
 - Author: Che, Dong <chedong@bigfoot.com>
-->
<!DOCTYPE Records SYSTEM "lucene.dtd">
<Records>
    <Record>
        <Field name="title">my title one</Field>
        <Field name="category">computer.internet.</Field>
        <Field name="author">John</Field>
        <Field name="content" store="no">some content bula bula asdf</Field>
        <Index name="idx_all">title,content,author</Index>
        <Index name="idx_category" token="no">category</Index>
    </Record>
    <Record>
        <Field name="title">my title two</Field>
        <Field name="category">computer.game</Field>
        <Field name="author">Jack</Field>
        <Field name="content" store="no">some content bula bula asdf</Field>
        <Index name="idx_all">title,content,author</Index>
        <Index name="idx_category" token="no">category</Index>
    </Record>
    <Record>
        <Field name="title">my title three</Field>
        <Field name="category">art.music</Field>
        <Field name="author">Jerry</Field>
        <Field name="content" store="no">some content bula bula asdf</Field>
        <Index name="idx_all">title,content,author</Index>
        <Index name="idx_category" token="no">category</Index>
    </Record>
    <Record>
        <Field name="title">my title four</Field>
        <Field name="category">sports.badminton</Field>
        <Field name="author">Tom</Field>
        <Field name="content" store="no">some content bula bula asdf</Field>
        <Index name="idx_all">title,content,author</Index>
        <Index name="idx_category" token="no">category</Index>
    </Record>
</Records>
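To make the semantics of this sample concrete, here is a minimal Java sketch of how an indexer might read the format with the JDK's DOM parser and resolve each <Index> directive into the text a Lucene field would receive. IndexSpecReader, IndexField and read are hypothetical names; the demo's actual XMLIndexer.build may work differently. The sketch assumes token defaults to "yes" when the attribute is absent, joins the listed source fields with single spaces, skips loading the external lucene.dtd, and creates no Lucene objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch of the reading half of an XML indexer: parse the
 * <Records> format and resolve each <Index> directive per record.
 */
public class IndexSpecReader {

    /** One <Index> directive resolved against a single record. */
    static final class IndexField {
        final String name;
        final String text;
        final boolean tokenized;

        IndexField(String name, String text, boolean tokenized) {
            this.name = name;
            this.text = text;
            this.tokenized = tokenized;
        }
    }

    static List<List<IndexField>> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Do not fetch the external lucene.dtd; this sketch does not validate.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();

            List<List<IndexField>> result = new ArrayList<>();
            NodeList records = root.getElementsByTagName("Record");
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);

                // Field name -> value; store="no" affects storage only, so it is ignored here.
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList fieldNodes = record.getElementsByTagName("Field");
                for (int j = 0; j < fieldNodes.getLength(); j++) {
                    Element f = (Element) fieldNodes.item(j);
                    fields.put(f.getAttribute("name"), f.getTextContent().trim());
                }

                // Each <Index> joins the values of its comma-separated source fields.
                List<IndexField> resolved = new ArrayList<>();
                NodeList indexNodes = record.getElementsByTagName("Index");
                for (int j = 0; j < indexNodes.getLength(); j++) {
                    Element idx = (Element) indexNodes.item(j);
                    List<String> parts = new ArrayList<>();
                    for (String source : idx.getTextContent().split(",")) {
                        String value = fields.get(source.trim());
                        if (value != null) {
                            parts.add(value);
                        }
                    }
                    // Assumption: tokenized unless the attribute is exactly token="no".
                    boolean tokenized = !"no".equals(idx.getAttribute("token"));
                    resolved.add(new IndexField(idx.getAttribute("name"), String.join(" ", parts), tokenized));
                }
                result.add(resolved);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read index source", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Records><Record>"
            + "<Field name=\"title\">my title one</Field>"
            + "<Field name=\"category\">computer.internet.</Field>"
            + "<Field name=\"author\">John</Field>"
            + "<Field name=\"content\" store=\"no\">some content</Field>"
            + "<Index name=\"idx_all\">title,content,author</Index>"
            + "<Index name=\"idx_category\" token=\"no\">category</Index>"
            + "</Record></Records>";
        for (List<IndexField> record : read(xml)) {
            for (IndexField f : record) {
                System.out.println(f.name + (f.tokenized ? " [tokenized]: " : " [keyword]: ") + f.text);
            }
        }
    }
}
```

Running main prints "idx_all [tokenized]: my title one some content John" followed by "idx_category [keyword]: computer.internet.". With the Lucene 1.x Field factories, a tokenized combined field like idx_all would presumably become Field.UnStored(name, text), an untokenized one like idx_category Field.Keyword(name, text), and the stored-only title and author Field.UnIndexed(name, value); that mapping is an assumption, not something taken from the demo.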
Che, Dong
>From: "Rob Outar" <routar@ideorlando.org>
>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>Subject: Lucene and XML 
>Date: Wed, 30 Oct 2002 08:57:47 -0500
>
>Hello all,
>
>	I did not know there were packages like ISOGEN that used Lucene to build a
>searchable index based on XML files.  From visiting ISOGEN's website it
>looks like it is commercial software. Are there any open source extensions
>to Lucene that allow XML indexing and searching?
>
>	Please let me know.
>
>Thanks again,
>
>Rob
>
>
>--
>To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

