lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Is there Downside to a huge synonyms file?
Date Wed, 03 Jun 2009 04:28:18 GMT

Hello,

300K documents is a pretty small index.  I wouldn't worry about the number of synonyms unless you are
turning a single term into dozens of ORed terms.
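
To make "dozens of ORed terms" concrete, here is a rough Python sketch of what query-time expansion does to a query. It is not Solr code: the synonym table, the function names, and the AND/OR query syntax are all made up for illustration. What matters is how many clauses the expanded query ends up with, not how many entries the synonyms file has.

```python
from typing import Dict, List

# Hypothetical synonym table (illustration only, not a real synonyms file).
# Each key expands to itself plus the listed alternatives.
SYNONYMS: Dict[str, List[str]] = {
    "attorney": ["lawyer", "counsel"],
    "form": ["template"],
}


def expand_query(terms: List[str], synonyms: Dict[str, List[str]]) -> List[List[str]]:
    """Return one OR-group per query term: the term followed by its synonyms."""
    return [[term] + synonyms.get(term, []) for term in terms]


def count_clauses(groups: List[List[str]]) -> int:
    """Total number of term clauses the expanded query has to evaluate."""
    return sum(len(group) for group in groups)


def to_query_string(groups: List[List[str]]) -> str:
    """Render the groups in an illustrative boolean syntax."""
    parts = []
    for group in groups:
        if len(group) == 1:
            parts.append(group[0])
        else:
            parts.append("(" + " OR ".join(group) + ")")
    return " AND ".join(parts)


groups = expand_query(["real", "estate", "attorney"], SYNONYMS)
print(to_query_string(groups))
print(count_clauses(groups))
```

In this sketch only the terms that actually appear in the query get expanded, so a table with thousands of unused entries adds nothing to the clause count. The count grows only when a single term maps to many alternatives.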

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: anuvenk <anuvenkatesh@hotmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 2, 2009 11:28:43 PM
> Subject: Re: Is there Downside to a huge synonyms file?
> 
> 
> I'm using query-time synonyms. I have more fields in my index, though. This is
> just a sample of the data in my index. Yes, we don't have millions
> of documents. It could be around 300,000 and might increase in the future. The
> reason I'm using query-time synonyms is the nature of my data: I
> can't re-index the data every time I add or remove a synonym. But for this
> particular requirement, is it best to have index-time synonyms because of the
> multi-word nature of the synonyms? Again, if I add more city lists to the synonym
> file, I can't keep re-indexing all the data over and over again.
> 
> 
> 
> anuvenk wrote:
> > 
> > In my index I have legal FAQs, forms, legal videos, etc., with a state field
> > for each resource.
> > Now if I search for real estate san diego, I want to be able to return
> > other 'california' results, i.e. results from san francisco.
> > I have the following fields in the index:
> > 
> > title                            state        description
> > real estate san diego example 1  california   some description
> > real estate carlsbad example 2   california   some desc
> > 
> > So when I search for real estate san francisco, since there is no match, I
> > want to return the other real estate results in california
> > instead of returning none, because sometimes they might be searching for a
> > real estate form and the city probably doesn't matter.
> > 
> > I have two things in mind. One is adding a synonym mapping:
> > san diego, california
> > carlsbad, california
> > san francisco, california
> > 
> > (which probably isn't the best way)
> > hoping that search for san francisco real estate would map san francisco
> > to california and hence return the other two california results
> > 
> > OR
> > 
> > adding the mapping of city to state in the index itself, like:
> > 
> > title                       state       city                                description
> > real estate san diego eg 1  california  carlsbad, san francisco, san diego  some description
> > real estate carlsbad eg 2   california  carlsbad, san francisco, san diego  some description
> > 
> > Which of the two is better? Does a huge synonym file affect
> > performance? Or is there an even better way? I'm sure there is, but I can't
> > put my finger on it yet, and I'm not familiar with Java either.
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23844761.html
> Sent from the Solr - User mailing list archive at Nabble.com.

