From: "sam s"
Reply-To: "Lucene Users List"
To: lucene-user@jakarta.apache.org
Subject: RE: Context-based suggestions with spell check
Date: Fri, 21 Nov 2003 19:58:54 +0000

You are right, just for search it won't take much time. I have a relatively small index. Almost all documents change daily. So one way I am thinking of updating the index is to have another process build a new index in a temp directory. When the time comes to update the index, I would rename the in-use dir to an archive dir and rename the temp dir to the in-use dir.
I am not sure whether this is a good idea. What do you say?

I have 70000 docs overall, each with around 20 fields, of which 3 fields contain more text. I have indexed and tokenized 11 fields. The average size of the data I put in a document is 2 KB.

Thanks

>From: Dan Quaroni
>Reply-To: "Lucene Users List"
>To: 'Lucene Users List'
>Subject: RE: Context-based suggestions with spell check
>Date: Fri, 21 Nov 2003 09:45:13 -0500
>
>There are some important questions to ask before you discount performing
>all of those searches. What are your performance requirements? How big is
>your index? How are you deploying it?
>
>-----Original Message-----
>From: sam s [mailto:tamputampu@hotmail.com]
>Sent: Thursday, November 20, 2003 8:15 PM
>To: lucene-user@jakarta.apache.org
>Subject: RE: Context-based suggestions with spell check
>
>
>Levenshtein is again word based, like spell check. I found Jazzy quite
>handy to some level for spell check. I am not worrying much about the
>spell check part. What I want to do is show the user the right spell check
>suggestion (when spell check returns multiple suggestions) based on the
>other words he/she entered for the search.
>Once again, going to the same example:
>User enters: inted motherboard
>Spell check returns 3 suggestions for inted:
>inter
>intel
>intek
>
>In this situation, since I have both words, intel and motherboard, in one
>document of my search collection, I should be able to show the user
>something like
>Did you mean: intel motherboard?
>
>The simplest way to achieve this is to search 3 times, once for each
>suggestion together with the word motherboard, and show the user the
>suggestion whose search got the most hits. The problem with this is the
>number of iterations involved. If there are suggestions for two of the
>words the user entered, there will be all kinds of combinations and that
>many iterations. So I don't want to go this way.
>
>I haven't studied in detail how Lucene does indexing and searching. I
>don't know whether that will help.
>
>Has anybody come across this problem?
>Or I must be missing something.
>
>Again, I apologize if you guys think this is not the right post for the
>Lucene user list.
>
>Thanks,
>Abhay
>
>
> >From: "sam s"
> >Reply-To: "Lucene Users List"
> >To: lucene-user@jakarta.apache.org
> >Subject: RE: Context-based suggestions with spell check
> >Date: Fri, 21 Nov 2003 00:35:13 +0000
> >
> >I actually thought of using search to find the right combination of
> >suggestions, but I feared performance degradation. I'll look at
> >Levenshtein.
> >
> >Thanks
> >
> >>From: Dan Quaroni
> >>Reply-To: Lucene Users List
> >>To: 'sam s '
> >>Subject: RE: Context-based suggestions with spell check
> >>Date: Thu, 20 Nov 2003 19:22:51 -0500
> >>
> >>I would also suggest 'intend' as a possible correction.
> >>
> >>There are a decent number of algorithms out there for the distance
> >>between two words. Check out Levenshtein for that.
> >>
> >>In terms of context-based corrections, you could do a search for the
> >>word combined with the word in front of it and the word behind it.
> >>
> >>"I just bought an inted motherboard"
> >>
> >>Then you do a search for "an inter", "an intel", etc. and "inter
> >>motherboard", "intel motherboard", etc., count the number of hits you
> >>get for each one, and rank your suggestions accordingly.
> >>
> >>
> >>-----Original Message-----
> >>From: sam s
> >>To: lucene-user@jakarta.apache.org
> >>Sent: 11/20/03 7:07 PM
> >>Subject: Context-based suggestions with spell check
> >>
> >>Hi,
> >>
> >>I am thinking of adding spell check functionality to the search. I am
> >>trying to achieve two things to complement search.
> >>
> >>1. Spell check where the dictionary is composed of all the text I am
> >>building the search index from. This looks simple with some spell check
> >>implementation.
> >>
> >>2. The problem I am facing is how to pick the right suggestion for a
> >>wrong word accompanied by another word. For example, when the user
> >>enters the search term 'inted', spell check returns the suggestions
> >>inter, intel and intek.
> >>Now the problem is: when the user searches for 'inted motherboard', how
> >>do I decide that the user is searching for 'intel motherboard', given
> >>that some items contain the text 'intel motherboard'? How do I make
> >>context-based suggestions? Does anybody know of a simple algorithm for
> >>this?
> >>I know this is not related to Lucene, but I thought I might get some
> >>help from the community. Suggestions are appreciated.
> >>
> >>Thanks in advance,
> >>Sam
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
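
For what it's worth, the hit-counting idea discussed in this thread (search once per candidate combination and rank by number of hits) can be sketched roughly as below. This is only a minimal in-memory sketch, not Lucene code: `count_hits`, `rank_corrections` and the toy `DOCS` list are hypothetical names standing in for a real index search, and a "hit" is assumed to mean a document that contains every word of the query.

```python
from itertools import product

# Toy stand-in for the search collection (hypothetical data, not a real index).
DOCS = [
    "intel motherboard with onboard audio",
    "intel pentium processor",
    "inter office mail envelope",
    "motherboard for amd processor",
]


def count_hits(words, docs):
    """Count documents containing every word (a stand-in for an AND query)."""
    return sum(1 for doc in docs if all(w in doc.split() for w in words))


def rank_corrections(query_words, suggestions, docs):
    """Rank every combination of candidate words by number of hits.

    `suggestions` maps a misspelled word to its spell-check candidates;
    words without an entry are kept unchanged.
    """
    options = [suggestions.get(w, [w]) for w in query_words]
    # One search per combination: the cost is the product of the candidate
    # counts, which is the iteration blow-up raised earlier in the thread.
    scored = [
        (count_hits(combo, docs), " ".join(combo)) for combo in product(*options)
    ]
    # Highest hit count first; ties are broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


ranked = rank_corrections(
    ["inted", "motherboard"],
    {"inted": ["inter", "intel", "intek"]},
    DOCS,
)
best_hits, best_query = ranked[0]
if best_hits > 0:
    print("Did you mean: " + best_query + "?")
```

With k candidates for one word and m for another, this runs k * m searches. Whether that is acceptable for an index of the size described above would have to be measured; the sketch makes no claim about it.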