Message-ID: <196551.44283.qm@web50307.mail.re2.yahoo.com>
References: <23842527.post@talk.nabble.com>
In-Reply-To: <23842527.post@talk.nabble.com>
Date: Tue, 2 Jun 2009 18:23:25 -0700 (PDT)
From: Otis Gospodnetic
To: solr-user@lucene.apache.org
Subject: Re: Is there Downside to a huge synonyms file?

Hi,

If index-time synonym expansion/indexing is used, then a large synonym file means your index is going to be bigger.
If query-time synonym expansion is used, then your queries are going to be larger (i.e. more ORs, thus a bit slower).

How much, it really depends on your specific synonyms, so I can't generalize. I have a feeling you are not dealing with millions of documents, in which case you can most likely ignore increase in index or query size.

Adding synonyms sounds like the easiest approach. I'd try that and worry about improvement only IF I see that doesn't give adequate results.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: anuvenk
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 2, 2009 6:55:27 PM
> Subject: Is there Downside to a huge synonyms file?
>
> In my index i have legal faqs, forms, legal videos etc with a state field for
> each resource.
> Now if i search for real estate san diego, I want to be able to return other
> 'california' results i.e results from san francisco.
> I have the following fields in the index
>
> title                              state        description...
> real estate san diego example 1    california   some description
> real estate carlsbad example 2     california   some desc
>
> so when i search for real estate san francisco, since there is no match, i
> want to be able to return the other real estate results in california
> instead of returning none. Because sometimes they might be searching for a
> real estate form and city probably doesn't matter.
>
> I have two things in mind. One is adding a synonym mapping
> san diego, california
> carlsbad, california
> san francisco, california
>
> (which probably isn't the best way)
> hoping that search for san francisco real estate would map san francisco to
> california and hence return the other two california results
>
> OR
>
> adding the mapping of city to state in the index itself like..
>
> title                         state        city                                 description...
> real estate san diego eg 1    california   carlsbad, san francisco, san diego   some description
> real estate carlsbad eg 2     california   carlsbad, san francisco, san diego   some description
>
> which of the above two is better. Does a huge synonym file affect
> performance. Or Is there a even better way? I'm sure there is but I can't
> put my finger on it yet & I'm not familiar with java either.
>
> --
> View this message in context:
> http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23842527.html
> Sent from the Solr - User mailing list archive at Nabble.com.
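
For illustration only, here is a rough sketch of the trade-off described in the reply, written in plain Python rather than as Solr configuration. It is not how Solr's analysis chain is implemented; the function names and the tiny city-to-state table (taken from the examples in the question) are hypothetical. It only shows why index-time expansion stores more terms per document, while query-time expansion produces a longer OR query.

```python
# Hypothetical city -> state synonym table, based on the question's examples.
SYNONYMS = {
    "san diego": ["california"],
    "carlsbad": ["california"],
    "san francisco": ["california"],
}


def expand_at_index_time(field_value, synonyms=SYNONYMS):
    """Return the terms that would be stored for a field value:
    the original phrase plus any synonyms. More synonyms -> bigger index."""
    terms = [field_value]
    terms.extend(synonyms.get(field_value, []))
    return terms


def expand_at_query_time(phrase, synonyms=SYNONYMS):
    """Return an OR query string covering the phrase and its synonyms.
    More synonyms -> more OR clauses -> a larger, somewhat slower query."""
    alternatives = [phrase] + synonyms.get(phrase, [])
    return " OR ".join('"%s"' % alt for alt in alternatives)


if __name__ == "__main__":
    # Index-time: the document carries the extra term.
    print(expand_at_index_time("san diego"))
    # Query-time: the query carries the extra clause.
    print(expand_at_query_time("san francisco"))
```

Under these assumptions, a search for "san francisco" would also match documents containing "california" in either mode; the difference is only where the extra terms live (in the index or in the query).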