lucene-java-user mailing list archives

From Markus Spath <msp...@arcor.de>
Subject Re: Indexing multiple instances of the same field for each document
Date Sun, 29 Feb 2004 10:45:20 GMT
Roy Klein wrote:

> Erik,
> 
> Indexing a single field in chunks solves a design problem I'm working
> on. It's not the only way to do it, but, it would certainly be the most
> straightforward.  However, if using this method makes phrase searching
> unusable, then I'll have to go another route.
> 

Hmm, wouldn't it be easier to index only one term for a list of synonyms instead
of indexing each synonym for one term?

quick, fast, speedy -> quick (both when building the index and building the query)
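
To make the idea concrete, here is a minimal sketch in plain Java (no Lucene
involved). The class name, the synonym table and the whitespace tokenizing are
all made up for illustration; the only point is that the same normalization
runs over the document text and over the query text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: collapses every synonym to one canonical term.
class SynonymNormalizer {
    // Assumed synonym table; in practice this would come from your own data.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("fast", "quick");
        CANONICAL.put("speedy", "quick");
        CANONICAL.put("tan", "brown");
        CANONICAL.put("dark", "brown");
    }

    // Lowercases the word and maps it to its canonical term, if it has one.
    static String normalize(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        return CANONICAL.getOrDefault(key, key);
    }

    // Splits on whitespace and normalizes each word; use for text and queries alike.
    static List<String> normalizeAll(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(normalize(w));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Index side: the document text is reduced to canonical terms.
        System.out.println(normalizeAll("The quick brown fox"));
        // Query side: the phrase goes through the same mapping.
        System.out.println(normalizeAll("speedy tan"));
    }
}
```

With this, the phrase "speedy tan" is rewritten to "quick brown" before it ever
reaches the index, so an ordinary phrase search over a single field is enough.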

This would also solve your problem with Lucene's (somewhat counterintuitive, but
probably well-reasoned) behaviour of adding Fields with the same name at
the beginning instead of appending them.


Markus

> Here's a brief example of the type of thing I'm trying to do:
> 
> I have a file that contains the words:
> 
> The quick brown fox jumped over the lazy dog.
> 
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name="wordposition1">
>     <word>The</word>
>   </field>
>   <field name="wordposition2">
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name="wordposition3">
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
> 
> I parse that document (via the digester), and add all the words from
> each of the fields to one lucene field: "contents".  The tricky part is
> that I want to have each word position contain all the words at that
> position in the lucene index.  I.e. word location 1 in the index
> contains "The", word location 2: "quick, fast, and speedy", word
> location 3: "brown, tan, and dark", etc.
> 
> That way, all the following phrase queries will match this document:
> 	"fast tan"
> 	"quick brown"
> 	"fast brown"
> 
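
For comparison, the "several words at one position" layout described in the
quoted text can be sketched in plain Java as well. This is only a toy model of
a positional index, not Lucene code: the class and method names are invented,
and the three positions are taken from the example above. In Lucene itself, as
far as I know, the corresponding mechanism is a token with a position
increment of 0 (Token.setPositionIncrement), but please check that against the
API docs of the version you are using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy positional index for one document: each position holds a set of words.
class StackedPositions {
    // positions.get(i) contains every word indexed at word position i.
    private final List<Set<String>> positions = new ArrayList<>();

    // Appends one new position holding all the given alternatives.
    void addPosition(String... words) {
        positions.add(new HashSet<>(Arrays.asList(words)));
    }

    // True if the phrase words are found at consecutive positions, in order.
    boolean matchesPhrase(String... phrase) {
        for (int start = 0; start + phrase.length <= positions.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.length && ok; i++) {
                ok = positions.get(start + i).contains(phrase[i]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StackedPositions doc = new StackedPositions();
        doc.addPosition("the");
        doc.addPosition("quick", "fast", "speedy");
        doc.addPosition("brown", "tan", "dark");

        System.out.println(doc.matchesPhrase("fast", "tan"));
        System.out.println(doc.matchesPhrase("quick", "brown"));
        System.out.println(doc.matchesPhrase("tan", "fast"));
    }
}
```

The trade-off against the single-canonical-term approach is index size: every
alternative is stored, but the query needs no rewriting.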




