From: Amin Mohammed-Coleman
To: "java-user@lucene.apache.org"
Reply-To: java-user@lucene.apache.org
Subject: Re: SpellChecker in use with composite query
Date: Wed, 15 Apr 2009 07:58:24 +0100
Message-ID: <6f4104d80904142358v7854b805l985abe131bf765c2@mail.gmail.com>
In-Reply-To: <6f4104d80904111359p5621561bw2939160db9ba862b@mail.gmail.com>
References: <6f4104d80904100628q79150b4oa3e5436cfb66c581@mail.gmail.com>
 <6f4104d80904111359p5621561bw2939160db9ba862b@mail.gmail.com>
Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016e6dee7444c9dd90467927912

--0016e6dee7444c9dd90467927912
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi

Apologies for bringing this mail up again. I have resolved some of the
issues that I originally started with, including composite queries. However,
I have one remaining question which I would be grateful if someone could
assist me with.

I have a class which creates the spell index, but I'm not sure where to
apply it. Do I run this process whenever a user uploads a new file (kicking
off the indexing process)? That may not be the most appropriate place, as I
have one spell index and four document indexes. I'm wondering what the
general approach is. Also, whenever the indexes change, should I clear the
spell index and start again?

Once again, apologies for bringing this up.

Cheers
Amin

On Sat, Apr 11, 2009 at 9:59 PM, Amin Mohammed-Coleman wrote:

> Hi
> Another thing that I was wondering is how to apply the construction of the
> spell index. Where is the most appropriate place to create the spell index?
>
> For example:
>
> IndexReader spellReader = IndexReader.open(fsDirectory1);
> IndexReader spellReader2 = IndexReader.open(fsDirectory2);
> MultiReader multiReader = new MultiReader(new IndexReader[] {spellReader, spellReader2});
> LuceneDictionary luceneDictionary = new LuceneDictionary(multiReader, "content");
> Directory spellDirectory = FSDirectory.getDirectory( spellcheck);
> SpellChecker spellChecker = new SpellChecker(spellDirectory);
> spellChecker.indexDictionary(luceneDictionary);
>
> Should this be applied when doing a search or when a document is indexed?
> Should I clear the spell index when the main index changes?
>
> I also noticed, when running some tests, that the spell index contained
> numbers from the text extracted from a document. Is there a way to include
> only alphabetic characters in the indexDictionary process?
>
> Any help would be appreciated.
>
> Cheers
>
> On Fri, Apr 10, 2009 at 2:28 PM, Amin Mohammed-Coleman wrote:
>
>> Hi
>> I have been playing around with the SpellChecker class and so far it looks
>> really good. While developing a test case to show it working I came across
>> a couple of issues which I have resolved, but I'm not certain whether this
>> is the correct approach. I would therefore be grateful if anyone could tell
>> me whether it is correct or whether I should try something else.
>>
>> 1) Multiple indexes:
>> I have multiple indexes which store different documents based on certain
>> subject matter.
>> So in order to perform the spellchecking against all indexes
>> I did something like this:
>>
>> IndexReader spellReader = IndexReader.open(fsDirectory1);
>> IndexReader spellReader2 = IndexReader.open(fsDirectory2);
>> MultiReader multiReader = new MultiReader(new IndexReader[] {spellReader, spellReader2});
>> LuceneDictionary luceneDictionary = new LuceneDictionary(multiReader, "content");
>> Directory spellDirectory = FSDirectory.getDirectory(> spellcheck);
>> SpellChecker spellChecker = new SpellChecker(spellDirectory);
>> spellChecker.indexDictionary(luceneDictionary);
>>
>> Is this an acceptable approach, or should there be a spellcheck index for
>> each separate document index?
>>
>> 2) Composite query, e.g. Luciene OR doqument
>>
>> In order to handle the above I did the following:
>>
>> QueryParser queryParser = new AnalyzingQueryParser("content", analyzer);
>> String input = "luciene OR doqument";
>> Query query = queryParser.parse(input);
>> String input2 = query.toString("content");
>> String[] splitString = input2.split(" ");
>>
>> For each string in the array I performed suggestSimilar(..).
>>
>> Is this the most appropriate way of doing this?
>>
>> Any help would be appreciated.
>>
>> Cheers
>>
>> Amin
>>
>

--0016e6dee7444c9dd90467927912--
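
The quoted messages raise two related points: splitting a composite query into individual terms before calling suggestSimilar(..), and keeping numbers out of the spell index. The block below is a minimal sketch of one way to handle both at the string level. It uses only the Java standard library and is not Lucene API: the class SpellTermFilter and its methods are hypothetical names introduced here for illustration. It assumes the query text is a whitespace-separated list of tokens that may contain the upper-case operators AND, OR and NOT and a leading + or - on a term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Lucene): selects the tokens of a query
// string that are worth passing to a spell checker.
final class SpellTermFilter {

    // Upper-case boolean operators are query syntax, not search terms.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    private SpellTermFilter() {
    }

    // Returns true if the token is non-empty and consists only of letters,
    // so "2009" and "page42" are rejected.
    static boolean isAlphabetic(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isLetter(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Splits the query text on whitespace, skips operators, strips one
    // leading '+' or '-', and keeps only purely alphabetic terms.
    static List<String> candidateTerms(String queryText) {
        List<String> terms = new ArrayList<>();
        for (String token : queryText.trim().split("\\s+")) {
            if (OPERATORS.contains(token)) {
                continue;
            }
            String term = token.replaceFirst("^[+-]", "");
            if (isAlphabetic(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

Each term returned by candidateTerms could then be passed to SpellChecker.suggestSimilar. The same isAlphabetic check could in principle be applied to words before they reach indexDictionary, for example in a wrapper around the dictionary; how best to wire that in is an assumption and depends on the Lucene version in use.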