lucene-java-user mailing list archives

From Emanuel Buzek <emanuel.bu...@roke.cz>
Subject Re: How to locate a Phrase inside text (like a Browser text searcher)
Date Tue, 13 May 2014 14:00:28 GMT
I was trying to solve pretty much the same problem a few weeks back and ended
up using the NGram tokenizer. It made my index much larger (it grew 15x), but
full-text queries are pretty fast and I don't have to use wildcards in
queries.
http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html
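
To make the idea concrete, here is a minimal plain-Java sketch of character n-gram matching. It does not use the Lucene API: the class name NGramPhraseSketch, the gram size of 3, and the choice to lowercase and strip whitespace before building grams are all assumptions made for illustration, not a description of how NGramTokenizer is configured. It only shows why n-grams can find 'John Mail' inside 'MorningJohn Mail' or 'Mailhow' without wildcards.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
final class NGramPhraseSketch {
    // Assumed gram size; phrases shorter than N characters produce no grams.
    static final int N = 3;

    // Assumption: lowercase and drop all whitespace, so that
    // "MorningJohn Mail" and "Morning John Mail" normalize to the same text.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    // Collect every distinct character n-gram of the normalized text.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        String t = normalize(s);
        for (int i = 0; i + n <= t.length(); i++) {
            grams.add(t.substring(i, i + n));
        }
        return grams;
    }

    // Cheap candidate test: every gram of the phrase occurs somewhere in the
    // document. This ignores gram order, so it can report false positives.
    static boolean mayContain(String doc, String phrase, int n) {
        Set<String> query = ngrams(phrase, n);
        return !query.isEmpty() && ngrams(doc, n).containsAll(query);
    }

    // Candidate test followed by an exact check on the normalized text.
    static boolean matches(String doc, String phrase) {
        return mayContain(doc, phrase, N)
                && normalize(doc).contains(normalize(phrase));
    }

    public static void main(String[] args) {
        String phrase = "John Mail";
        String[] docs = {
            " Good Morning John Mail how are you? ",
            " Good MorningJohn Mail how are you? ",
            " GoodMorning John Mailhow are you? "
        };
        for (String doc : docs) {
            System.out.println(matches(doc, phrase) + " <- '" + doc + "'");
        }
    }
}
```

The set lookup plays the role of the index and the final contains() call plays the role of a position check; in a real index the grams are stored as terms, which is where the extra index size comes from.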

cheers, Ema


2014-05-13 2:39 GMT+02:00 Michael Sokolov <msokolov@safaribooksonline.com>:

> ShingleFilter can help with this; it concatenates neighboring tokens.  So
> a search for "good morning john" becomes a search for
>
> "goodmorning john" OR
> "good morningjohn" OR
> "good morning john"
>
> It makes your index much bigger because of all the extra terms, but you may
> find it's worth the cost.
>
> -Mike
>
>
> On 5/11/2014 9:46 PM, Jack Krupansky wrote:
>
>> The word delimiter filter can help for "MorningJohn" by setting its
>> option to split on case change.
>>
>> You might be able to handle "Mailhow" using the
>> DictionaryCompoundWordTokenFilter, but that requires you to create a
>> complete dictionary of terms that can be split off. That's not very
>> practical. In truth, Lucene/Solr doesn't have a good out-of-the-box
>> solution for this use case.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: teko
>> Sent: Thursday, May 8, 2014 9:03 AM
>> To: java-user@lucene.apache.org
>> Subject: How to locate a Phrase inside text (like a Browser text searcher)
>>
>> Hi, can someone help me with this?
>> I need to search for a phrase inside text, and I need to find the
>> phrase in texts like these:
>> 'John Mail' <- the phrase I want to locate
>> ' Good Morning John Mail how are you? ' <- I need to find the phrase here
>> ' Good MorningJohn Mail how are you? ' <- here too
>> ' GoodMorning John Mailhow are you? ' <- and here
>>
>> I tried 'WhitespaceAnalyzer' and 'QueryParser', but that doesn't work
>> (it matches only the first sample above, not the others).
>>
>> Please, I really need help with this!
>> Thanks (note: sorry for my English!! xD)
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/How-to-locate-a-Phrase-inside-text-like-a-
>> Browser-text-searcher-tp4135075.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>


-- 
Emanuel Buzek
Software Engineer, ROKE.cz <http://www.roke.cz>
tel: +420 776 54 26 26
