lucene-java-user mailing list archives

From Ankit Murarka <ankit.mura...@rancoretech.com>
Subject Re: Search a Part of the Sentence/Complete sentence in lucene 4.3
Date Fri, 26 Jul 2013 11:34:57 GMT
Hello, can you elaborate more on this? I seem to be lost here.

Since I am new to Lucene, I went through ShingleFilter and its
application yesterday. It seems to be a kind of n-gram approach, and it
bloats the index, as Mike has mentioned.
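
To make the index-size point concrete, here is a minimal plain-Java sketch of
word shingling. It is not Lucene's ShingleFilter; the class name ShingleSketch
and the method shingles are made up for illustration, and the min/max sizes
are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hand-rolled word n-gram generator, not Lucene code.
class ShingleSketch {

    // Returns every run of minSize..maxSize consecutive tokens, joined by a space.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize;
                 size <= maxSize && start + size <= tokens.size();
                 size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("x", "david", "is", "a", "manager");
        List<String> result = shingles(tokens, 2, 3);
        // prints "5 tokens -> 7 shingles"
        System.out.println(tokens.size() + " tokens -> " + result.size() + " shingles");
        System.out.println(result);
    }
}
```

Every shingle becomes an extra indexed term on top of the single words, which
is where the growth comes from.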

As of now I am only concerned with the appropriate way to solve this problem.

With PhraseQuery, if I specify the terms, do you also want me to specify
a slop? If I don't supply a slop, it defaults to 0, i.e. an exact match.
However, because of the stop words this PhraseQuery was not giving me any
hits, and hence I raised this question.
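
For reference, the "holes" idea from the quoted replies can be modelled in
plain Java without Lucene. This is only a sketch under assumptions: the stop
list (is/a/of/the), the simple tokenizer, and the names
PositionalPhraseSketch, analyze and matchesPhrase are all made up here and
are not what StandardAnalyzer or PhraseQuery actually do internally. In
Lucene 4.x the corresponding call is, as far as I understand,
PhraseQuery.add(Term, int position).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative only: models stop-word holes and position-aware phrase matching.
class PositionalPhraseSketch {

    // Assumed stop list, taken from the example in the thread.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("is", "a", "of", "the"));

    // Lowercases, splits on non-alphanumerics, and drops stop words.
    // Dropped words still consume a position, which leaves holes.
    static Map<Integer, String> analyze(String text) {
        Map<Integer, String> byPosition = new LinkedHashMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            if (!STOP.contains(token)) byPosition.put(position, token);
            position++;
        }
        return byPosition;
    }

    // True if the document has the phrase terms at the same relative positions.
    static boolean matchesPhrase(String document, String phrase) {
        Map<Integer, String> doc = analyze(document);
        Map<Integer, String> query = analyze(phrase);
        if (query.isEmpty()) return false;
        int first = query.keySet().iterator().next();
        for (int docStart : doc.keySet()) {
            boolean ok = true;
            for (Map.Entry<Integer, String> e : query.entrySet()) {
                int offset = e.getKey() - first;
                if (!e.getValue().equals(doc.get(docStart + offset))) {
                    ok = false;
                    break;
                }
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String q = "X David is a manager of the company.";
        String file1 = "Mr X David is a manager of the company. He is the senior most manager.";
        String file2 = "Mr X David manager of the company is also very senior.";
        String file3 = "Mr X David is working for a company. He happens to be the manager of the company.";
        System.out.println("file1: " + matchesPhrase(file1, q));
        System.out.println("file2: " + matchesPhrase(file2, q));
        System.out.println("file3: " + matchesPhrase(file3, q));
        // Different filler words in the holes still match: a false match.
        System.out.println("other: " + matchesPhrase("X David was the manager at a company", q));
    }
}
```

In this model only File 1 of the three sample files matches, because
"manager" sits exactly 4 positions after "X" there. The last line shows the
kind of false match mentioned in the quoted reply: any words can fill the
holes.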

I still don't know where to start with this problem or how to solve it.

I am sure this is supported by Lucene, but perhaps a bit more
explanation and guidance will do the trick for me.

On 7/24/2013 6:06 PM, Michael McCandless wrote:
> With PhraseQuery you can specify where each term must occur in the phrase.
>
> So X must occur in position 0, David in position 1, and then manager
> in position 4 (skipping 2 holes).
>
> QueryParser does this for you: when it analyzes the user's phrase, if
> the resulting tokens have holes, then it sets the positions
> accordingly.
>
> And I agree: shingles are a good solution here too, but they make your
> index larger.  CommonGramsFilter lets you shingle only specific words,
> e.g. you could pass your stop words to it.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Jul 24, 2013 at 7:34 AM, Ankit Murarka
> <ankit.murarka@rancoretech.com>  wrote:
>    
>> I tried using Phrase Query with slops. Now since I am specifying the slop I
>> also need to specify the 2nd term.
>>
>> In my case the 2nd term is not present. The whole string to be searched is
>> still 1 single term.
>>
>> How do I skip the holes created by stop words? I do not know beforehand how
>> many stop words are skipped or what string the user is going to enter.
>>
>> Is there a definite way to skip the holes created by stop words?
>>
>> I am now looking at MultiPhraseQuery: splitting the user-provided string on
>> spaces and passing each word as a term to MultiPhraseQuery.
>>
>> Will it help? Is there any alternative?
>>
>>
>> On 7/24/2013 4:48 PM, Michael McCandless wrote:
>>      
>>> PhraseQuery?
>>>
>>> You can skip the holes created by stopwords ... e.g. QueryParser does
>>> this.  I.e., the PhraseQuery becomes "X David _ _ manager _ _ company"
>>> if is/a/of/the are stop words, which isn't perfect (could return false
>>> matches) but should work well in practice ...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Wed, Jul 24, 2013 at 4:31 AM, Ankit Murarka
>>> <ankit.murarka@rancoretech.com>   wrote:
>>>
>>>        
>>>> Dear All,
>>>>
>>>> Suppose I have 3 documents. The sample text is
>>>>
>>>> /*File 1 : */
>>>>
>>>> Mr X David is a manager of the company. He is the senior most manager. I
>>>> also want to become manager of the company.
>>>>
>>>> /*File 2 :*/
>>>>
>>>> Mr X David manager of the company is also very senior. He happens to be the
>>>> senior most manager. I wish even I could reach that place.
>>>>
>>>> /*File 3:*/
>>>>
>>>> Mr X David is working for a company. He happens to be the manager of the
>>>> company.Infact he is the senior most manager. I dont want to become like
>>>> him.
>>>>
>>>> String I wish to search: X David is a manager of the company.
>>>>
>>>> Ideally I should get only file1 in the hit result.
>>>>
>>>> I have no clue how to achieve this. Basically I am trying to match part of
>>>> a sentence or a complete sentence. What would be the best approach?
>>>> I presume "is" and "a" are stop words and will be skipped during indexing by
>>>> the StandardAnalyzer.
>>>>
>>>> What I wonder is how to then search for part of a sentence or a complete
>>>> sentence if the sentence contains some/many stop words.
>>>>
>>>> I am using StandardAnalyzer. Please guide.
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Ankit
>>>>
>>>>
>>>>          
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>        
>>
>>
>> --
>> Regards
>>
>> Ankit Murarka
>>
>> "Peace is found not in what surrounds us, but in what we hold within."
>>
>>
>>
>>      
>
>
>    


-- 
Regards

Ankit Murarka

"Peace is found not in what surrounds us, but in what we hold within."



