From: Michael McCandless <lucene@mikemccandless.com>
Date: Tue, 6 Aug 2013 09:02:25 -0400
Subject: Re: How to Index each file and then each Line for Complete Phrase
 Match. Sample Data shown.
To: Lucene Users <java-user@lucene.apache.org>
Content-Type: text/plain; charset=ISO-8859-1

The suggester builds its own index when you call the build() method;
you need to provide a TermFreqIterator that iterates over all your
suggestions.

Each suggester has different tradeoffs, e.g. the FST based suggesters
are prefix-only matching, while AnalyzingInfixSuggester will suggest
based on non-prefix matches.  You can see AnalyzingInfixSuggester
running at http://jirasearch.mikemccandless.com e.g. try typing fst.
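
To make the prefix vs. non-prefix difference concrete, here is a rough
plain-Java sketch. It is not Lucene code and not how the suggesters are
implemented: the class MatchModes and both method names are made up for
illustration, and "infix" is simplified to "the typed text matches the
start of any word in the suggestion".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: plain Java, not Lucene. All names here are hypothetical.
final class MatchModes {

    // Prefix-only matching: the typed text must match the start of the whole suggestion.
    static List<String> prefixMatches(List<String> suggestions, String typed) {
        String t = typed.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String s : suggestions) {
            if (s.toLowerCase(Locale.ROOT).startsWith(t)) {
                hits.add(s);
            }
        }
        return hits;
    }

    // Infix matching (simplified): the typed text may match the start of any word.
    static List<String> infixMatches(List<String> suggestions, String typed) {
        String t = typed.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String s : suggestions) {
            for (String word : s.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(t)) {
                    hits.add(s);
                    break; // one matching word is enough
                }
            }
        }
        return hits;
    }
}
```

With the phrase "package execution happening-1" in the list, typing
"exec" finds nothing in prefixMatches but does find the phrase in
infixMatches, while typing "pack" finds it in both.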

If you want spell-checker like behavior, use FuzzySuggester, which
allows up to 2 "edits" when finding a matching suggestion.
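
As a rough illustration of what "up to 2 edits" means for a phrase, here
is a plain-Java sketch. It is not FuzzySuggester and does not use Lucene;
the class FuzzyPrefix and its methods are hypothetical, and it simply
computes the smallest edit distance between what was typed and any prefix
of a candidate phrase, working on raw characters with no analysis.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, not Lucene. All names here are hypothetical.
final class FuzzyPrefix {

    // Smallest Levenshtein distance between `typed` and any prefix of `candidate`.
    static int prefixDistance(String typed, String candidate) {
        int n = typed.length();
        int m = candidate.length();
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        // Row 0: empty typed text vs. a candidate prefix of length j costs j inserts.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            cur[0] = i; // i typed characters vs. empty prefix costs i deletes
            for (int j = 1; j <= m; j++) {
                int subst = prev[j - 1] + (typed.charAt(i - 1) == candidate.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(subst, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        // `prev` now holds the last row; its minimum ranges over all candidate prefixes.
        int best = prev[0];
        for (int j = 1; j <= m; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    // Keep the candidates that some prefix of which is within maxEdits of the typed text.
    static List<String> suggest(List<String> candidates, String typed, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String c : candidates) {
            if (prefixDistance(typed, c) <= maxEdits) {
                hits.add(c);
            }
        }
        return hits;
    }
}
```

For the example from the original question, "package executing" is 2
edits away from the prefix "package execution", so
suggest(phrases, "package executing", 2) would keep
"package execution happening-1".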

Once the suggester is built, use the lookup method to find suggestions ...
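
The build-then-lookup flow looks roughly like the plain-Java toy below.
This is only a sketch of the lifecycle, not the Lucene API: ToySuggester
and its build/lookup signatures are made up, it holds everything in a
sorted in-memory map, and it only does exact prefix matching. The point
is that the suggester consumes an iterator over (suggestion, weight)
pairs and keeps its own structure, so for the log-file case each whole
line would be fed in as one suggestion.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a suggester; NOT a Lucene class. All names are hypothetical.
final class ToySuggester {

    // The suggester's own lookup structure, separate from any search index.
    private final TreeMap<String, Long> entries = new TreeMap<>();

    // Consume every (suggestion, weight) pair once and rebuild the structure.
    void build(Iterator<Map.Entry<String, Long>> suggestions) {
        entries.clear();
        while (suggestions.hasNext()) {
            Map.Entry<String, Long> e = suggestions.next();
            entries.merge(e.getKey(), e.getValue(), Long::sum); // sum weights of duplicates
        }
    }

    // Return up to `num` suggestions that start with `prefix`, highest weight first.
    List<String> lookup(String prefix, int num) {
        List<Map.Entry<String, Long>> hits = new ArrayList<>();
        // tailMap starts at the first key >= prefix; matching keys are contiguous.
        for (Map.Entry<String, Long> e : entries.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.add(e);
        }
        hits.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(num, hits.size()); i++) {
            out.add(hits.get(i).getKey());
        }
        return out;
    }
}
```

Usage would be: collect one entry per log line (with a weight such as
how often the line occurs), call build(entries.iterator()) once, and then
call lookup("This", 5) for each thing the user types.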

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 6, 2013 at 3:39 AM, Ankit Murarka
<ankit.murarka@rancoretech.com> wrote:
> Hello.
>
> I can't seem to figure out what to use. I started with AnalyzingSuggester
> and passed StandardAnalyzer to its constructor.
>
> But in order to get suggestions, it seems I will have to index the
> already-indexed documents again. How do I index them again using
> AnalyzingSuggester?
>
> I cannot use SpellChecker with this, as it seems to accept only an
> Analyzer and not an AnalyzingSuggester.
>
> Is there a different way of using AnalyzingSuggester to get search
> suggestions?
>
> Also, I verified with Luke that indexing the document with
> LineNumberReader is working properly: each line is being indexed
> separately.
>
> Now how do I go about implementing this phrase "did you mean" search?
>
>
> On 8/5/2013 5:08 PM, Michael McCandless wrote:
>>
>> Why not use one of the suggesters under lucene/suggest/*?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, Aug 5, 2013 at 4:49 AM, Ankit Murarka
>> <ankit.murarka@rancoretech.com>  wrote:
>>
>>>
>>> Hello.
>>>
>>> 1. What I am trying to implement is a "Complete Suggestion Match / Did
>>> You Mean" feature for a phrase. I did it for a single word; now I want
>>> to do it for a sentence.
>>>
>>> 2. My understanding of how to index each line of a particular file as a
>>> valid phrase is as follows:
>>>
>>> a. Instead of providing a directory name to index, give a file name.
>>> b. Use the following code to read each line. This might be wrong, as I
>>> am not fully aware of how to index each log line as a valid phrase
>>> rather than as individual words.
>>>
>>>
>>>       LineNumberReader lnr = new LineNumberReader(new FileReader(new File(
>>>           "D:\\Lucene\\FileSearch\\Memo-1094.20130722-005200_10761334-10771333.txt")));
>>>       String line = null;
>>>       while (null != (line = lnr.readLine())) {
>>>           doc.add(new TextField("contents", line, Field.Store.YES));
>>>       }
>>>
>>> c. Use StandardAnalyzer and store the index in a separate location.
>>>
>>> Obviously, after this I ran into a problem. I gave this index to
>>> SpellChecker so that it could build its own index from it, and then
>>> called the SpellChecker method that suggests similar terms. I got only
>>> one word back as the suggestion.
>>>
>>> I know I have made a terrible mistake here, but I can't figure out what
>>> it is.
>>>
>>> I guess I need to index each whole line in the file as a phrase that
>>> the spellchecker can suggest. I am wondering what a possible approach
>>> would be. Any help will be highly appreciated.
>>>
>>>
>>> On 8/3/2013 7:25 PM, Jack Krupansky wrote:
>>>
>>>>
>>>> Why not start with something simple? Like, index each log line as a
>>>> tokenized text field and then do PhraseQuery against that text field? Is
>>>> there something else you need beyond that?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Ankit Murarka
>>>> Sent: Saturday, August 03, 2013 3:22 AM
>>>> To: java-user@lucene.apache.org
>>>> Subject: How to Index each file and then each Line for Complete Phrase
>>>> Match. Sample Data shown.
>>>>
>>>> Hello All,
>>>>
>>>> I have log files containing data like the sample shown below. Until
>>>> now I have been indexing the complete directory containing these files.
>>>>
>>>> Now I need to index each line of a file to implement complete phrase
>>>> search. I intend to store phrases in the index and then use the
>>>> SpellChecker API to suggest similar phrases.
>>>>
>>>> 7/20/2013 7:45 *package execution happening-1
>>>> * FATAL *check request has been sent for instance* Ip:Port
>>>> *EXCEPTION*
>>>> 7/20/2013 7:45 *This is not working perfectly
>>>> * DEBUG *check request for instance being received is status=200
>>>> * Ip:Port *EXCEPTION*
>>>> 7/20/2013 7:45 *Encountering a constant error.
>>>> * DEBUG *response is not proper.Expecting some more information on
>>>> this detail.
>>>> * Ip:Port *EXCEPTION*
>>>> 7/20/2013 7:45 *This needs urgent attention
>>>> * FATAL *I am still trying to ensure it is running perfectly.
>>>> Encountering some issues.
>>>> * Ip:Port *EXCEPTION*
>>>>
>>>> 7/20/2013 8:01 *Job is running fine.*
>>>> INFO
>>>>
>>>> *************************************************************************\
>>>>
>>>> *Exception Occured in ClassFactory* * Function()
>>>> java.nullPointerException: Value is null
>>>> * *Should not be null*
>>>>
>>>> To implement complete phrase search, I reckon I need to index each
>>>> line and store the phrase. *Phrases in the table above are highlighted
>>>> in bold.*
>>>>
>>>> If I can index these lines and store the phrases in the index, then
>>>> when a user searches for "package executing", Lucene would be able to
>>>> provide "package execution happening-1" as a valid suggestion.
>>>>
>>>> These columns do not have names, so I cannot index based on column
>>>> name. Also, as shown in the table above, the first column may contain
>>>> a time/date or a phrase (as in the last row).
>>>>
>>>> Please suggest how this is possible using Lucene and its API. The
>>>> Javadoc does not seem to give any guidance for this case.
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "What lies behind us and what lies before us are tiny matters compared
>>> with
>>> what lies within us"
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>>
>>
>
>
>
> --
> Regards
>
> Ankit Murarka
>
> "What lies behind us and what lies before us are tiny matters compared with
> what lies within us"
>
>
>
