lucene-java-user mailing list archives

From "Roy Klein" <kl...@sitescape.com>
Subject RE: Indexing multiple instances of the same field for each document
Date Mon, 01 Mar 2004 19:10:25 GMT
I don't have access to the process that created the XML; it was done in
the past.

As I stated in the beginning of this thread, this is just an example of
the type of thing I'm trying to accomplish.

I think the real issue here is that the fields are being inserted in
reverse order.  Here are the comments in the code (for Document.add()):

  /** Adds a field to a document.  Several fields may be added with
   * the same name.  In this case, if the fields are indexed, their text is
   * treated as though appended for the purposes of search. */

I guess it doesn't specify the order in which they're appended; however,
when I read that comment, I thought it meant "in the order added".  It's a
pretty simple change to the Document class to make this work as I'd
expect it to.  From Doug's initial response, I think he expected this
behavior as well.
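
To illustrate the ordering problem described above, here is a minimal sketch in plain Java. It assumes the reversal comes from a structure that places each newly added field at the head of a linked list; that is an assumption made for illustration, and the class and method names (FieldOrderSketch, prependOrder, appendOrder) are hypothetical, not Lucene's actual Document code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document's field storage; not Lucene's Document class. */
class FieldOrderSketch {

    /** Singly linked list node. */
    static final class Node {
        final String value;
        final Node next;

        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    /** Stores values by adding each one at the head, then reads them back from the head. */
    static List<String> prependOrder(String... values) {
        Node head = null;
        for (String v : values) {
            head = new Node(v, head); // the newest value becomes the head
        }
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }

    /** Stores values by appending, which preserves the order in which they were added. */
    static List<String> appendOrder(String... values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prependOrder("The", "quick", "brown")); // [brown, quick, The]
        System.out.println(appendOrder("The", "quick", "brown"));  // [The, quick, brown]
    }
}
```

If the field text is "treated as though appended" in iteration order, only the appending variant yields the text in the order it was added.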


Thanks again for all your help!

    Roy


-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Sunday, February 29, 2004 9:10 AM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each document


What you are doing is really the job of an Analyzer.  You are doing 
pre-analysis, when instead you could do all of this within the context 
of a custom analyzer and avoid many of these issues altogether.

Do you use the XML only during indexing?  If so, you could bypass the 
whole conversion to XML and then back through Digester all within an 
analyzer.

Or am I missing something that prevents you from doing it this way?

	Erik


On Feb 28, 2004, at 10:05 PM, Roy Klein wrote:
> Erik,
> Here's a brief example of the type of thing I'm trying to do:
>
> I have a file that contains the words:
>
> The quick brown fox jumped over the lazy dog.
>
> I run that file through a utility that produces the following XML
> document:
> <document>
>   <field name="wordposition1">
>     <word>The</word>
>   </field>
>   <field name="wordposition2">
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name="wordposition3">
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
>
> I parse that document (via the Digester), and add all the words from 
> each of the fields to one Lucene field: "contents".  The tricky part 
> is that I want each word position in the Lucene index to contain all 
> the words at that position.  I.e. word location 1 in the index 
> contains "The", word location 2: "quick, fast, and speedy", word 
> location 3: "brown, tan, and dark", etc.
>
> That way, all the following phrase queries will match this document:
>     "fast tan"
>     "quick brown"
>     "fast brown"
>
> I wrote a "TermAnalyzer" that adds all the words from a field into the
> index at the same position (via setPositionIncrement(0)).  That way I
> can simply add each set of words to the "contents" field, and it'll 
> just keep adding them to the same field.  However, since it's 
> reversing them, I can't match phrases.
>
>
>     Roy
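
The position-stacking idea in the quoted message, and why reversed field order defeats phrase matching, can be sketched with a small in-memory model in plain Java. This is only an illustration under stated assumptions: PositionSketch, addGroup, and matchesPhrase are hypothetical names, terms are lowercased as many analyzers would do, and nothing here uses Lucene's API or index format.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical in-memory model of token positions; not Lucene's index or API. */
class PositionSketch {

    /** positions.get(i) holds every term stored at position i. */
    private final List<Set<String>> positions = new ArrayList<>();

    /**
     * Adds one group of alternatives at the next position. Conceptually the
     * first term has a position increment of 1 and the rest have 0, so all
     * of them share a single slot.
     */
    void addGroup(String... terms) {
        Set<String> slot = new HashSet<>();
        for (String t : terms) {
            slot.add(t.toLowerCase(Locale.ROOT)); // lowercasing is an assumption
        }
        positions.add(slot);
    }

    /** Returns true if the words of the phrase occur at consecutive positions. */
    boolean matchesPhrase(String phrase) {
        String[] words = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int start = 0; start + words.length <= positions.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < words.length && ok; i++) {
                ok = positions.get(start + i).contains(words[i]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Groups added in document order: phrases across adjacent groups match.
        PositionSketch inOrder = new PositionSketch();
        inOrder.addGroup("The");
        inOrder.addGroup("quick", "fast", "speedy");
        inOrder.addGroup("brown", "tan", "dark");
        System.out.println(inOrder.matchesPhrase("fast tan"));    // true
        System.out.println(inOrder.matchesPhrase("quick brown")); // true

        // Same groups in reverse order: the same phrase no longer matches.
        PositionSketch reversed = new PositionSketch();
        reversed.addGroup("brown", "tan", "dark");
        reversed.addGroup("quick", "fast", "speedy");
        reversed.addGroup("The");
        System.out.println(reversed.matchesPhrase("quick brown")); // false
    }
}
```

In this model a phrase matches only when each word is found at successive positions, so the order in which the groups are added directly determines which phrases can match.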


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

