lucene-java-user mailing list archives

From Ankit Murarka
Subject Re: How to Index each file and then each Line for Complete Phrase Match. Sample Data shown.
Date Mon, 05 Aug 2013 08:49:44 GMT

1. What I am trying to implement is a complete-phrase "Did You Mean" 
suggestion feature. I have done it for single words; I now want to do it 
for whole sentences.

2. My understanding of how to index each line of a particular file as a 
phrase is as follows:

a. Instead of providing a directory name to index, give a file name.
b. Use the following code to read each line. This might be wrong, as I am 
not fully aware of how to index each log line as a single phrase rather 
than as individual words.

      // Read the file one line at a time and add each line to the document.
      LineNumberReader lnr = new LineNumberReader( new FileReader( new 
      String line = null;
      while (null != (line = lnr.readLine())) {
          doc.add(new TextField("contents", line, Field.Store.YES));
      }

c. Using StandardAnalyzer and storing the index in a separate location.
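
To illustrate the concern in (b): a word-level analyzer turns one log line into several separate terms, whereas keeping the line intact yields a single term. The sketch below is plain Java, not Lucene; `LineTerms`, `wordTerms`, and `phraseTerm` are hypothetical names, and the splitting rule is only a rough stand-in for what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LineTerms {
    // Rough stand-in for a word-level analyzer (NOT Lucene's tokenizer):
    // lowercases the line and splits it on runs of non-alphanumeric characters.
    public static List<String> wordTerms(String line) {
        List<String> terms = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Keeps the whole (trimmed) line as a single term, so it could be
    // offered back as one complete phrase.
    public static List<String> phraseTerm(String line) {
        return Collections.singletonList(line.trim());
    }

    public static void main(String[] args) {
        String line = "package execution happening-1";
        System.out.println(wordTerms(line));   // several word-level terms
        System.out.println(phraseTerm(line));  // one phrase-level term
    }
}
```

In Lucene itself, a non-tokenized field such as StringField, or an analyzer such as KeywordAnalyzer, is the usual way to keep a value as one term. Whether that fits this use case is an assumption and has not been tested here.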

Obviously, after this I ran into a problem. I gave this index to 
SpellChecker so that it could build its own index from it, and then 
called its suggestSimilar method to get suggestions. I got only a single 
word back as the suggestion.

I know I have made a serious mistake here, but I cannot figure out what 
it is.

I guess I need to index each whole line of the file as a single phrase 
so that SpellChecker can offer it as a suggestion. I am wondering what a 
possible approach would be. Any help will be highly appreciated.
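
One way to picture whole-line suggestions, assuming each line is kept as a single entry: compare the query against every stored line and return the closest one by edit distance. This is a plain-Java sketch, not the SpellChecker API; `PhraseSuggester`, `levenshtein`, and `suggest` are hypothetical names, and the sample phrases are taken from the log excerpt in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseSuggester {
    // Classic dynamic-programming Levenshtein edit distance between two strings.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the stored phrase with the smallest case-insensitive edit
    // distance to the query, or null if no phrases are given.
    public static String suggest(String query, List<String> phrases) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String p : phrases) {
            int d = levenshtein(query.toLowerCase(), p.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "package execution happening-1",
                "This is not working perfectly",
                "Job is running fine.");
        System.out.println(suggest("package executing", phrases));
    }
}
```

A linear scan like this only suits a small set of phrases. Raw edit distance also favours candidates of similar length, so a real implementation would likely need a normalised score.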

On 8/3/2013 7:25 PM, Jack Krupansky wrote:
> Why not start with something simple? Like, index each log line as a 
> tokenized text field and then do PhraseQuery against that text field? 
> Is there something else you need beyond that?
> -- Jack Krupansky
> -----Original Message----- From: Ankit Murarka
> Sent: Saturday, August 03, 2013 3:22 AM
> To:
> Subject: How to Index each file and then each Line for Complete Phrase 
> Match. Sample Data shown.
> Hello All,
> My log files contain data like the sample below. Until now I have been
> indexing the complete directory containing these files.
> Now I need to index each line of a file to implement complete phrase
> search. I intend to store the phrases in the index and then use the
> SpellChecker API to suggest similar phrases.
> 7/20/2013 7:45 *package execution happening-1
> * FATAL *check request has been sent for instance* Ip:Port
> 7/20/2013 7:45 *This is not working perfectly
> * DEBUG *check request for instance being received is status=200
> * Ip:Port *EXCEPTION*
> 7/20/2013 7:45 *Encountering a constant error.
> * DEBUG *response is not proper.Expecting some more information on
> this detail.
> * Ip:Port *EXCEPTION*
> 7/20/2013 7:45 *This needs urgent attention
> * FATAL *I am still trying to ensure it is running perfectly.
> Encountering some issues.
> * Ip:Port *EXCEPTION*
> 7/20/2013 8:01 *Job is running fine.*
> *************************************************************************\ 
> *Exception Occured in ClassFactory* * Function()
> java.nullPointerException: Value is null
> * *Should not be null*
> To implement complete phrase search, I reckon I need to index each line 
> and store the phrase. *Phrases in the table above are highlighted in 
> bold.*
> If I can index these phrases and store them in the index, then when a 
> user searches for "package executing", Lucene should be able to offer 
> "package execution happening-1" as a valid suggestion.
> These columns do not have names, so I cannot index based on column 
> name. Also, as shown in the table above, the first column may contain 
> a time/date or a phrase (as in the last row).
> Please suggest how this is possible using Lucene and its API. The 
> Javadoc does not seem to cover this case.


Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within
