lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Indexing multiple instances of the same field for each document
Date Sun, 29 Feb 2004 14:09:57 GMT
What you are doing is really the job of an Analyzer.  You are doing 
pre-analysis, when instead you could do all of this within the context 
of a custom analyzer and avoid many of these issues altogether.

Do you use the XML only during indexing?  If so, you could bypass the 
whole conversion to XML and then back through Digester all within an 
analyzer.

Or am I missing something that prevents you from doing it this way?

	Erik


On Feb 28, 2004, at 10:05 PM, Roy Klein wrote:
> Erik,
> Here's a brief example of the type of thing I'm trying to do:
>
> I have a file that contains the words:
>
> The quick brown fox jumped over the lazy dog.
>
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name=wordposition1>
>     <word>The</word>
>   </field>
>   <field name=wordposition2>
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name=wordposition3>
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
>
> I parse that document (via the digester), and add all the words from
> each of the fields to one lucene field: "contents".  The tricky part is
> that I want to have each word position contain all the words at that
> position in the lucene index.  I.e. word location 1 in the index
> contains "The", word location 2: "quick, fast, and speedy", word
> location 3: "brown, tan, and dark", etc.
>
> That way, all the following phrase queries will match this document:
> 	"fast tan"
> 	"quick brown"
>       "fast brown"
>
> I wrote a "TermAnalyzer" that adds all the words from a field into the
> index at the same position. (via setPositionIncrement(0)).  That way I
> can simply add each set of words to the "contents" field, and it'll 
> just
> keep adding them to the same field.  However, since it's reversing 
> them,
> I can't match phrases.
>
>
>     Roy


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message