lucene-dev mailing list archives

From Mostafa Gomaa <mostafa.goma...@gmail.com>
Subject Re: Problem with NGram
Date Wed, 01 Apr 2015 16:32:05 GMT
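As far as I know, fuzzy search only works on single terms; Solr doesn't
support fuzzy matching inside phrases out of the box. In the standard query
syntax, a tilde after a quoted phrase sets the proximity (slop), so "..."~10
is a proximity search and does not make the individual words fuzzy. You may
want to look into using the ComplexPhraseQueryParser plugin.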
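
For illustration, here is a minimal Python sketch of how such a request could be built. It assumes your Solr version ships the complex phrase parser under the name `complexphrase` with an `inOrder` local parameter, and that you query the `bestExternalDescriptionStandard` field from your schema; the helper name `complexphrase_params` is made up for this example. Note the tilde sits on the single term inside the phrase, not after the closing quote.

```python
from urllib.parse import urlencode


def complexphrase_params(field, phrase, in_order=True):
    """Build Solr request parameters for a complex phrase query.

    Hypothetical helper: the parser name and the inOrder parameter are
    assumptions about the Solr setup, not something confirmed in this thread.
    """
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    return {"q": '%s%s:"%s"' % (local_params, field, phrase), "wt": "json"}


# The fuzzy operator is attached to one term inside the quoted phrase.
params = complexphrase_params(
    "bestExternalDescriptionStandard", "jakartd~ apache lucene"
)

# URL-encoded form, ready to append to a /select request.
query_string = urlencode(params)
print(query_string)
```

Whether the misspelled term then matches still depends on the edit distance and on how the field is analyzed, so treat this as a starting point to test against your index.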

On Wed, Apr 1, 2015 at 5:07 PM, Mirko Mancin <mirko.mancin@t-frutta.it>
wrote:

>   It doesn't work with two words! :-(
>
>  If I search for "jakart*d* apache lucene"~10, it does not find "jakarta
> apache lucene".
>
>  But if I search for "jakart*e* apache lucene"~10, it FINDS "jakarta apache
> lucene".
>
>  WHY?!?!?!
>
>   Mirko Mancin
>
>  Software Developer
>
>
> Ubiq srl
>  stradello Conrad Marca-Relli, 9
> 43122 Parma (PR)
> t. +39 0521 781601
> cell. +39 346 4137577
> follow us on Linkedin <https://www.linkedin.com/company/ubiq-srl>
>
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they are
> addressed. If you have received this email in error please notify the
> system manager. This message contains confidential information and is
> intended only for the individual named. If you are not the named addressee
> you should not disseminate, distribute or copy this e-mail. Please notify
> the sender immediately by e-mail if you have received this e-mail by
> mistake and delete this e-mail from your system. If you are not the intended
> recipient you are notified that disclosing, copying, distributing or taking
> any action in reliance on the contents of this information is strictly
> prohibited.
>
>   From: Mostafa Gomaa <mostafa.gomaa89@gmail.com>
> Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> Date: Wednesday, 1 April 2015 15:54
> To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> Subject: Re: Problem with NGram
>
>   Hello Mirko,
>
>  Try using fuzzy queries. You can do that by adding a tilde at the end of
> the term you're searching for, like PRIN3ER~. It uses the edit distance
> algorithm to find similar words. You can also specify the number of edits
> by adding the number after the tilde, for example, PRIN3ER~2 will match
> similar words up to two edits. Hope this helps.
>
>  Regards,
>
>  Mostafa Gomaa.
>
> On Wed, Apr 1, 2015 at 2:37 PM, Mirko Mancin <mirko.mancin@t-frutta.it>
> wrote:
>
>>   Hi,
>>
>>      I have a problem with n-grams. I am trying to find the word
>> "PRINTER".
>>
>>  I have these fields:
>>
>>  <field name="bestExternalDescriptionStandard" type="text_general"
>>         indexed="true" stored="true" multiValued="true"
>>         termVectors="true" termPositions="true" termOffsets="true"/>
>>
>>  <field name="bestExternalDescriptionGram" type="text_ngram"
>>         indexed="true" stored="true" multiValued="true"
>>         termVectors="true" termPositions="true" termOffsets="true"/>
>>
>>  <fieldType name="text_general" class="solr.TextField"
>>             positionIncrementGap="100">
>>    <analyzer>
>>      <tokenizer class="solr.StandardTokenizerFactory"/>
>>      <filter class="solr.LowerCaseFilterFactory"/>
>>      <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
>>    </analyzer>
>>  </fieldType>
>>
>>  <fieldType name="text_ngram" class="solr.TextField"
>>             positionIncrementGap="100">
>>    <analyzer>
>>      <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2"
>>                 maxGramSize="4"/>
>>      <filter class="solr.LowerCaseFilterFactory"/>
>>      <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
>>    </analyzer>
>>  </fieldType>
>>
>>  And it correctly finds:
>>
>>  "BROTHER PRINTER", "SAMSUNG PRINTER", etc.
>>
>>  But if I search for "PRIN3R" (with an error within the string), Solr does
>> not return anything!!
>>
>>  How can I do this? How should I set up my schema.xml to find documents
>> with a certain similarity?
>>
>>  Thanks
>>
>>
>>  Mirko Mancin
>>
>>  Software Developer
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
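
The quoted advice about fuzzy queries relies on edit distance between the search term and the indexed terms. As a rough illustration only, the Python sketch below computes the classic Levenshtein distance and checks the terms mentioned in the thread against a maximum number of edits. It is not Lucene's implementation, which works differently internally; the function names `edit_distance` and `within_edits` are made up for this example, and lowercasing is an assumption that mirrors the LowerCaseFilterFactory in the schema.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


def within_edits(query, term, max_edits=2):
    """True if term is reachable from query in at most max_edits edits."""
    return edit_distance(query.lower(), term.lower()) <= max_edits


print(edit_distance("prin3er", "printer"))  # one substitution: 3 -> t
print(edit_distance("prin3r", "printer"))   # one substitution plus one insertion
print(within_edits("PRIN3R", "PRINTER"))
```

Under these assumptions "PRIN3ER" is one edit away from "PRINTER" and "PRIN3R" is two edits away, which is why the number after the tilde matters for the shorter misspelling.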
