lucene-java-user mailing list archives

From "Roy Klein" <kl...@sitescape.com>
Subject RE: Indexing multiple instances of the same field for each document
Date Sun, 29 Feb 2004 13:18:23 GMT
Hi Markus,

What you're saying would work if I weren't concerned about query
performance.

If I add the synonyms at document index time, then I only process the
word "quick" once (when I insert the doc into the index).

If I process each query to convert "fast" and "speedy" to "quick" at
query time, then I might wind up processing those words millions of
times (once for each query). Yes, I could come up with a cache to keep
that processing to a minimum; however, it still makes more sense to do
it once, at index time.

    Roy

-----Original Message-----
From: Markus Spath [mailto:mspath@arcor.de] 
Sent: Sunday, February 29, 2004 5:45 AM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


Roy Klein wrote:

> Erik,
> 
> Indexing a single field in chunks solves a design problem I'm working 
> on. It's not the only way to do it, but, it would certainly be the 
> most straightforward.  However, if using this method makes phrase 
> searching unusable, then I'll have to go another route.
> 

Hmm, wouldn't it be easier to index only one term for a list of
synonyms instead of indexing each synonym for one term?

quick, fast, speedy -> quick (both when building the index and building
the query)

This would also solve your problems with the (somewhat counterintuitive
but probably well-reasoned) behaviour of Lucene of adding Fields with
the same name at the beginning instead of appending them.
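
A minimal sketch of the "one term per synonym list" idea above, assuming
a small hand-written synonym table. The class SynonymNormalizer and its
methods are hypothetical names used for illustration and are not part of
Lucene; the same normalization would have to run both when building the
index and when building the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Lucene class): maps every synonym to one
// canonical term so that index and query agree on a single spelling.
final class SynonymNormalizer {
    // Assumed synonym table, taken from the examples in this thread.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("fast", "quick");
        CANONICAL.put("speedy", "quick");
        CANONICAL.put("tan", "brown");
        CANONICAL.put("dark", "brown");
    }

    // Lower-cases the term and replaces it with its canonical form, if any.
    static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return CANONICAL.getOrDefault(lower, lower);
    }

    // Splits on whitespace and normalizes each word; usable for both
    // document text and query text.
    static List<String> normalizeAll(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(normalize(word));
            }
        }
        return out;
    }
}
```

With this table, normalizeAll("fast tan") and normalizeAll("quick brown")
produce the same term list, so a phrase query for either would hit the
same indexed terms. The trade-off is that every query has to pass through
the same normalization step.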


Markus

> Here's a brief example of the type of thing I'm trying to do:
> 
> I have a file that contains the words:
> 
> The quick brown fox jumped over the lazy dog.
> 
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name="wordposition1">
>     <word>The</word>
>   </field>
>   <field name="wordposition2">
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name="wordposition3">
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
> 
> I parse that document (via the digester), and add all the words from 
> each of the fields to one lucene field: "contents".  The tricky part 
> is that I want to have each word position contain all the words at 
> that position in the lucene index.  I.e. word location 1 in the index 
> contains "The", word location 2: "quick, fast, and speedy", word 
> location 3: "brown, tan, and dark", etc.
> 
> That way, all the following phrase queries will match this document:
>     "fast tan"
>     "quick brown"
>     "fast brown"
> 
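
The behaviour described in the quoted text (several words sharing one
word position, so that any mix of synonyms matches as a phrase) can be
illustrated with a toy positional index. This is only a sketch of the
concept for a single document: StackedPositionIndex and its methods are
hypothetical names, not Lucene's API. In Lucene the analogous mechanism
is a token whose position increment is 0, which places it at the same
position as the preceding token.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy single-document positional index (not a Lucene class).
final class StackedPositionIndex {
    // term -> positions at which the term occurs
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextPosition = 0;

    // Records all alternatives (a word and its synonyms) at the same
    // position, then advances to the next word position.
    void addPosition(String... alternatives) {
        for (String term : alternatives) {
            postings.computeIfAbsent(term.toLowerCase(Locale.ROOT),
                                     k -> new HashSet<>())
                    .add(nextPosition);
        }
        nextPosition++;
    }

    // Exact phrase match: term i of the phrase must occur at start + i.
    boolean matchesPhrase(String phrase) {
        String[] terms = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        Set<Integer> starts =
            postings.getOrDefault(terms[0], Collections.emptySet());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings
                    .getOrDefault(terms[i], Collections.emptySet())
                    .contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

After addPosition("The"), addPosition("quick", "fast", "speedy") and
addPosition("brown", "tan", "dark"), the phrases "fast tan", "quick
brown" and "fast brown" all match, while "brown quick" does not, because
the terms occur in the wrong order.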



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

