From: Michael McCandless
Date: Tue, 6 Aug 2013 09:02:25 -0400
Subject: Re: How to Index each file and then each Line for Complete Phrase Match. Sample Data shown.
To: Lucene Users

The suggester builds its own index when you call the build() method;
you need to provide a TermFreqIterator that iterates over all your
suggestions.

Each suggester has different tradeoffs, e.g. the FST-based suggesters
are prefix-only matching, while AnalyzingInfixSuggester will suggest
based on non-prefix matches. You can see AnalyzingInfixSuggester
running at http://jirasearch.mikemccandless.com e.g. try typing fst.

If you want spell-checker-like behavior, use FuzzySuggester, which
allows up to 2 "edits" when finding a matching suggestion.

Once the suggester is built, use the lookup method to find suggestions ...

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 6, 2013 at 3:39 AM, Ankit Murarka wrote:
> Hello.
>
> I can't seem to figure out what to use. I started with AnalyzingSuggester
> and passed StandardAnalyzer to its constructor.
>
> But essentially, in order to get the suggestions, I will have to index the
> already indexed document. Now how do I index it again using this
> AnalyzingSuggester?
>
> I cannot use SpellChecker with this, as it seems to accept only an Analyzer
> and not an AnalyzingSuggester.
>
> Is there a different way of using this AnalyzingSuggester to get the search
> suggestions?
>
> Also, I verified in Luke that indexing the document with
> LineNumberReader is actually working properly. Each line is being
> indexed separately.
>
> Now how do I go about implementing this phrase "did you mean" search?
>
>
> On 8/5/2013 5:08 PM, Michael McCandless wrote:
>>
>> Why not use one of the suggesters under lucene/suggest/*?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, Aug 5, 2013 at 4:49 AM, Ankit Murarka wrote:
>>
>>>
>>> Hello.
>>>
>>> 1. What I am trying to implement is a "Complete Suggestion Match / Did
>>> You Mean" feature for a phrase. I did it for a single word. I now want
>>> to do it for a sentence.
>>>
>>> 2. My understanding of how to index each line of a particular file as a
>>> valid phrase is as follows:
>>>
>>> a. Instead of providing a directory name to index, give a file name.
>>> b. Use the following code to read each line. This might be wrong, as I
>>> am not fully aware of how to index each log line as a valid phrase and
>>> not as individual words.
>>>
>>>
>>> LineNumberReader lnr = new LineNumberReader(new FileReader(new
>>>     File("D:\\Lucene\\FileSearch\\Memo-1094.20130722-005200_10761334-10771333.txt")));
>>> String line = null;
>>> while (null != (line = lnr.readLine())) {
>>>     doc.add(new TextField("contents", line, Field.Store.YES));
>>> }
>>>
>>> c. Use StandardAnalyzer and store the index in a separate location.
>>>
>>> Obviously, after this I ran into a problem. I provided this index to
>>> SpellChecker so that it could create its own index from it, and then
>>> invoked SpellChecker's suggestSimilar method to give me suggestions. I
>>> got only one word as the suggestion.
>>>
>>> I know I have made a serious mistake here, but I can't seem to figure
>>> out what it is.
>>>
>>> I guess I need to index the whole line (as present in the file) as a
>>> phrase, to be used as a spellchecker suggestion. I am wondering what
>>> the possible approach could be. Any help will be highly appreciated.
>>>
>>>
>>> On 8/3/2013 7:25 PM, Jack Krupansky wrote:
>>>
>>>>
>>>> Why not start with something simple? Like, index each log line as a
>>>> tokenized text field and then do a PhraseQuery against that text field?
>>>> Is there something else you need beyond that?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Ankit Murarka
>>>> Sent: Saturday, August 03, 2013 3:22 AM
>>>> To: java-user@lucene.apache.org
>>>> Subject: How to Index each file and then each Line for Complete Phrase
>>>> Match. Sample Data shown.
>>>>
>>>> Hello All,
>>>>
>>>> I have the data shown below in a log file. Until now I have been
>>>> indexing the complete directory of files that contain data like this.
>>>>
>>>> Now I need to index each line of the file to implement complete phrase
>>>> search. I intend to store phrases in the index and then use the
>>>> SpellChecker API to suggest similar phrases.
>>>>
>>>> 7/20/2013 7:45 *package execution happening-1
>>>> * FATAL *check request has been sent for instance* Ip:Port
>>>> *EXCEPTION*
>>>> 7/20/2013 7:45 *This is not working perfectly
>>>> * DEBUG *check request for instance being received is status=200
>>>> * Ip:Port *EXCEPTION*
>>>> 7/20/2013 7:45 *Encountering a constant error.
>>>> * DEBUG *response is not proper.Expecting some more information on
>>>> this detail.
>>>> * Ip:Port *EXCEPTION*
>>>> 7/20/2013 7:45 *This needs urgent attention
>>>> * FATAL *I am still trying to ensure it is running perfectly.
>>>> Encountering some issues.
>>>> * Ip:Port *EXCEPTION*
>>>>
>>>> 7/20/2013 8:01 *Job is running fine.*
>>>> INFO
>>>>
>>>> *************************************************************************\
>>>>
>>>> *Exception Occured in ClassFactory* * Function()
>>>> java.nullPointerException: Value is null
>>>> * *Should not be null*
>>>>
>>>> To implement complete phrase search, I reckon I need to index each line
>>>> and store the phrase. *Phrases in the above-mentioned table are
>>>> highlighted in bold.*
>>>>
>>>> So, if I am able to index these lines and store these phrases in the
>>>> index, then when a user searches for "package executing", Lucene would
>>>> be able to provide "package execution happening-1" as a valid
>>>> suggestion.
>>>>
>>>> These columns do not have names, and hence I cannot index based on
>>>> column name. Also, as shown in the table above, the first column may
>>>> contain a time/date or a phrase in itself (shown in the last row).
>>>>
>>>> Please suggest how this is possible using Lucene and its API. The
>>>> Javadoc does not seem to guide me anywhere for this case.
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "What lies behind us and what lies before us are tiny matters compared
>>> with what lies within us"
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
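
The tradeoffs described in the reply at the top of this thread (prefix-only
matching, non-prefix "infix" matching, and fuzzy matching that tolerates up
to 2 edits) may be easier to see on the sample phrases from the original
question. What follows is a minimal plain-Java sketch of those three
behaviours only. It is NOT Lucene code and does not reflect how the Lucene
suggesters work internally; the class name PhraseSuggestSketch and all of
its method names are hypothetical, chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java illustration (NOT Lucene code) of prefix, infix and fuzzy
 * matching over whole phrases. All names here are hypothetical.
 */
final class PhraseSuggestSketch {

    /** Prefix-only: the phrase must start with the typed text. */
    public static List<String> prefixMatches(List<String> phrases, String typed) {
        String key = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String p : phrases) {
            if (p.toLowerCase(Locale.ROOT).startsWith(key)) {
                out.add(p);
            }
        }
        return out;
    }

    /** Infix: the typed text may start at the beginning of any word. */
    public static List<String> infixMatches(List<String> phrases, String typed) {
        String key = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String p : phrases) {
            String lower = p.toLowerCase(Locale.ROOT);
            for (int i = 0; i < lower.length(); i++) {
                boolean wordStart = i == 0 || !Character.isLetterOrDigit(lower.charAt(i - 1));
                if (wordStart && lower.startsWith(key, i)) {
                    out.add(p);
                    break;
                }
            }
        }
        return out;
    }

    /** Smallest Levenshtein distance between typed and any prefix of phrase. */
    public static int minPrefixDistance(String typed, String phrase) {
        int n = typed.length();
        int m = phrase.length();
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // empty typed text vs. prefix of length j
        }
        for (int i = 1; i <= n; i++) {
            cur[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = typed.charAt(i - 1) == phrase.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        int best = prev[0];
        for (int j = 1; j <= m; j++) {
            best = Math.min(best, prev[j]); // best over all phrase prefixes
        }
        return best;
    }

    /** Fuzzy prefix: some prefix of the phrase is within maxEdits of typed. */
    public static List<String> fuzzyPrefixMatches(List<String> phrases, String typed, int maxEdits) {
        String key = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String p : phrases) {
            if (minPrefixDistance(key, p.toLowerCase(Locale.ROOT)) <= maxEdits) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Phrases taken from the sample log data quoted in this thread.
        List<String> phrases = Arrays.asList(
                "package execution happening-1",
                "This is not working perfectly",
                "Job is running fine.");
        System.out.println(prefixMatches(phrases, "package exec"));
        System.out.println(infixMatches(phrases, "running"));
        System.out.println(fuzzyPrefixMatches(phrases, "package executing", 2));
    }
}
```

In this sketch, "package executing" differs from the prefix "package
execution" by exactly 2 single-character edits, which is why an allowance of
2 edits is enough to surface "package execution happening-1" while an
allowance of 1 is not. With the real Lucene suggesters you would instead
feed the phrases to build() and query with lookup, as the reply describes.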