lucene-java-user mailing list archives

From Dan Quaroni <dquar...@OPENRATINGS.com>
Subject Indexing documents with multiple values for 1 field
Date Thu, 14 Aug 2003 15:58:14 GMT
I saw a post that sort of touched on my question, I think, but it didn't
seem quite the same...

What's the best way to index a document with multiple values for the same
field?  I'm trying to optimize search time and accuracy.

We have a database of companies that we want to be able to search on, and
the fields will include company name, address, and telephone number.  Some
companies have more than one name, though.  For example, BMG is also known
as Bertelsmann Music Group.  Our users need to be able to search on either
of these names and find a match.  In our raw data, the alternate names are
stored in separate fields.  Which of the following is the best way to
implement this in Lucene?

A) Duplicate documents by using all the same data except for the name (i.e.
1 document for BMG at 123 fake street and 1 document for Bertelsmann Music
Group at 123 fake street)

B) Create 5 fields for alternate names (which 80% of companies don't have at
all, so they'd be empty) and then, when running a search, query for the
same thing across all 6 fields (i.e. name:BMG OR altname1:BMG OR
altname2:BMG... etc)

C) Put all of the alternate names together into the name field (i.e. BMG
Bertelsmann Music Group).  Is there anything to delimit the different names
with so that they would be treated as separate entities?
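
To make option B concrete, here is a rough sketch of how the query string
might be assembled.  It is plain Java and makes no Lucene calls.  The class
and method names (MultiFieldQuery, buildQuery) are made up for illustration,
and the field names (name, altname1 ... altnameN) are just the ones from the
example above.  It assumes multi-word names get wrapped in double quotes so
they are searched as a phrase, and it does not escape any special query
characters.

```java
import java.util.StringJoiner;

final class MultiFieldQuery {

    // Hypothetical helper: builds "name:X OR altname1:X OR ... OR altnameN:X".
    // Assumption: fields are called "name" and "altname1".."altnameN".
    static String buildQuery(String term, int altFieldCount) {
        String trimmed = term.trim();
        // Assumption: a multi-word name is quoted so it is treated as a phrase.
        String value = trimmed.contains(" ") ? "\"" + trimmed + "\"" : trimmed;

        StringJoiner clauses = new StringJoiner(" OR ");
        clauses.add("name:" + value);
        for (int i = 1; i <= altFieldCount; i++) {
            clauses.add("altname" + i + ":" + value);
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        // One clause for the primary name plus five for the alternates.
        System.out.println(buildQuery("BMG", 5));
        System.out.println(buildQuery("Bertelsmann Music Group", 5));
    }
}
```

Every search ends up with six clauses this way, even for the 80% of
companies that have no alternate names, which is part of why I'm unsure
about this option.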
