lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Search a Part of the Sentence/Complete sentence in lucene 4.3
Date Wed, 24 Jul 2013 12:36:20 GMT
With PhraseQuery you can specify where each term must occur in the phrase.

So X must occur in position 0, David in position 1, and then manager
in position 4 (skipping 2 holes).
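To make the position arithmetic concrete, here is a tiny plain-Java model of the check an exact PhraseQuery performs when each term is given an explicit position. In Lucene 4.x the corresponding call is `PhraseQuery.add(Term, int)`. This sketch makes a few assumptions. The document is already analyzed into an array of terms, and removed stop words appear in that array as `null` holes. The class and method names are hypothetical and are not Lucene API.

```java
/**
 * Plain-Java model (not Lucene) of an exact phrase whose terms carry explicit
 * relative positions. Class and method names are hypothetical.
 */
final class PositionalPhrase {

    /**
     * True if, for some start offset, every terms[i] occurs at document
     * position start + positions[i]. Holes (null) are never matched against.
     */
    static boolean matches(String[] doc, String[] terms, int[] positions) {
        if (terms.length != positions.length) {
            throw new IllegalArgumentException("need exactly one position per term");
        }
        for (int start = 0; start < doc.length; start++) {
            boolean all = true;
            for (int i = 0; i < terms.length && all; i++) {
                int p = start + positions[i];
                all = p >= 0 && p < doc.length && terms[i].equals(doc[p]);
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // "Mr X David is a manager of the company": is/a/of/the leave holes.
        String[] file1 = {"mr", "x", "david", null, null, "manager", null, null, "company"};
        // "Mr X David manager of the company": no holes between david and manager.
        String[] file2 = {"mr", "x", "david", "manager", null, null, "company"};

        String[] terms = {"x", "david", "manager"};
        int[] positions = {0, 1, 4}; // X at 0, David at 1, manager at 4

        System.out.println(matches(file1, terms, positions)); // true
        System.out.println(matches(file2, terms, positions)); // false
    }
}
```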

QueryParser does this for you: when it analyzes the user's phrase and
the resulting tokens have holes, it sets the positions accordingly.
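A minimal plain-Java sketch of that behaviour, run against the three sample files from the original question quoted further down. It is a model, not Lucene code, and it rests on several assumptions:

- A crude lowercase split on non-alphanumerics stands in for StandardTokenizer.
- The stop set is only is/a/of/the. StandardAnalyzer's default English set is larger.
- The class and method names are hypothetical.

Each stop word still consumes a position, leaving a hole. Holes in the query are simply not checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Plain-Java model (not Lucene) of phrase search where stop words are removed
 * but still consume a position, so they leave holes in both index and query.
 */
final class StopWordPhraseDemo {

    // Assumption: only the stop words named in this thread.
    static final Set<String> STOP_WORDS = Set.of("is", "a", "of", "the");

    /** Lowercased words in order; a stop word becomes a null hole in its slot. */
    static List<String> analyze(String text) {
        List<String> slots = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (w.isEmpty()) continue; // split() may yield a leading empty string
            slots.add(STOP_WORDS.contains(w) ? null : w);
        }
        return slots;
    }

    /** Exact phrase match: every non-hole query term must sit at start + offset. */
    static boolean phraseMatches(String doc, String phrase) {
        List<String> d = analyze(doc);
        List<String> q = analyze(phrase);
        if (q.stream().allMatch(t -> t == null)) return false; // nothing left to match
        for (int start = 0; start < d.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < q.size() && ok; i++) {
                String want = q.get(i);
                if (want == null) continue; // query hole: that position is not checked
                int p = start + i;
                ok = p < d.size() && want.equals(d.get(p));
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "X David is a manager of the company.";
        String[] docs = {
            "Mr X David is a manager of the company. He is the senior most manager. "
                + "I also want to become manager of the company.",
            "Mr X David manager of the company is also very senior. He happens to be "
                + "the senior most manager. I wish even I could reach that place.",
            "Mr X David is working for a company. He happens to be the manager of the "
                + "company.Infact he is the senior most manager. I dont want to become like him.",
            "X David is the manager of a company.",       // different stop words in the holes
            "X David became the manager of the company.", // a non-stop word in a hole
        };
        for (String doc : docs) {
            System.out.println(phraseMatches(doc, query));
        }
    }
}
```

This prints true, false, false, true, true. File 1 matches, while Files 2 and 3 do not. The last two strings are the kind of false match mentioned earlier in the thread, because whatever fills the holes is ignored.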

And I agree: shingles are a good solution here too, but they make your
index larger. CommonGramsFilter lets you shingle only specific words;
e.g. you could pass your stop words to it.
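To illustrate the CommonGramsFilter idea, here is a plain-Java model rather than Lucene's filter. Every unigram is kept, and a bigram is added only where a token or its successor is a common word. As a result, the stop words survive inside bigrams such as `is_a` and `a_manager` instead of disappearing. This sketch makes a few assumptions:

- The common-word set is just is/a/of/the.
- The input is already tokenized and lowercased.
- Bigrams are joined with `_`.
- The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Plain-Java model (not Lucene's CommonGramsFilter) of "shingle only specific
 * words": every token is kept, and a bigram is added only where the token or
 * its successor is a common word.
 */
final class CommonGramsModel {

    // Assumption: the stop words from this thread serve as the common words.
    static final Set<String> COMMON_WORDS = Set.of("is", "a", "of", "the");

    static List<String> commonGrams(String[] tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i]); // the unigram is always indexed
            boolean hasNext = i + 1 < tokens.length;
            if (hasNext && (COMMON_WORDS.contains(tokens[i]) || COMMON_WORDS.contains(tokens[i + 1]))) {
                // In Lucene the bigram would share token i's position (increment 0).
                out.add(tokens[i] + "_" + tokens[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] file1 = "x david is a manager of the company".split(" ");
        String[] file2 = "x david manager of the company".split(" ");
        System.out.println(commonGrams(file1));
        // [x, david, david_is, is, is_a, a, a_manager, manager, manager_of, of, of_the, the, the_company, company]
        System.out.println(commonGrams(file2).contains("a_manager")); // false
    }
}
```

A phrase over these bigrams, for example one requiring `is_a` and `a_manager`, can then tell File 1 apart from File 2. The cost is the extra bigram terms in the index, although only around common words rather than between every pair of tokens.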

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jul 24, 2013 at 7:34 AM, Ankit Murarka
<ankit.murarka@rancoretech.com> wrote:
> I tried using PhraseQuery with slop. Since I am specifying the slop, I
> also need to specify the second term.
>
> In my case the second term is not present: the whole string to be searched
> is still a single term.
>
> How do I skip the holes created by stop words? I do not know beforehand how
> many stop words are skipped or what string the user is going to enter.
>
> Is there a definitive way to skip the holes created by stop words?
>
> I am now looking at MultiPhraseQuery: splitting the user-provided string on
> spaces and providing each word as a term to the MultiPhraseQuery.
>
> Will that help? Is there any alternative?
>
>
> On 7/24/2013 4:48 PM, Michael McCandless wrote:
>>
>> PhraseQuery?
>>
>> You can skip the holes created by stop words; QueryParser does this,
>> for example. I.e., the PhraseQuery becomes "X David _ _ manager _ _ company"
>> if is/a/of/the are stop words. That isn't perfect (it could return false
>> matches), but it should work well in practice.
>>
>>
>> On Wed, Jul 24, 2013 at 4:31 AM, Ankit Murarka
>> <ankit.murarka@rancoretech.com>  wrote:
>>
>>>
>>> Dear All,
>>>
>>> Suppose I have 3 documents. The sample text is:
>>>
>>> File 1:
>>>
>>> Mr X David is a manager of the company. He is the senior most manager. I
>>> also want to become manager of the company.
>>>
>>> File 2:
>>>
>>> Mr X David manager of the company is also very senior. He happens to be
>>> the senior most manager. I wish even I could reach that place.
>>>
>>> File 3:
>>>
>>> Mr X David is working for a company. He happens to be the manager of the
>>> company.Infact he is the senior most manager. I dont want to become like
>>> him.
>>>
>>> String I wish to search: X David is a manager of the company.
>>>
>>> Ideally I should get only File 1 in the hits.
>>>
>>> I have no clue how to achieve this. Basically I am trying to match part
>>> of a sentence or a complete sentence. What is the best approach? I
>>> presume "is" and "a" are stop words and will be skipped during indexing
>>> by StandardAnalyzer.
>>>
>>> What puzzles me is how I then search for part of a sentence, or a
>>> complete sentence, if the sentence contains some or many stop words.
>>>
>>> I am using StandardAnalyzer. Please guide.
>>>
>>> --
>>> Regards
>>>
>>> Ankit
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
>
>
> --
> Regards
>
> Ankit Murarka
>
> "Peace is found not in what surrounds us, but in what we hold within."
