lucene-solr-user mailing list archives

From Carsten L ...@fynskemedier.dk>
Subject Re: Use SOLR like the "MySQL LIKE"
Date Tue, 18 Nov 2008 09:40:24 GMT

Thanks for the quick reply!

It is supposed to work a little like Google Suggest or field
autocompletion.

I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespace in combination with the wildcard.

I looked at solr.WordDelimiterFilterFactory, but its documentation does
not mention anything about whitespace or wildcards.

A quick recap:
I would like to mimic the LIKE functionality from MySQL, using a wildcard
at the end of the search query.
In MySQL, whitespace is treated as an ordinary character, not as a
"splitter".
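
To make the intended behaviour concrete, here is a minimal Python sketch
(plain Python, not Solr code). The names like_prefix and token_prefix are
hypothetical helpers made up for this illustration, and case-insensitive
matching is assumed, as with a case-insensitive MySQL collation. The first
helper treats the whole value as one string, so the space is just another
character; the second splits on whitespace first, which is the behaviour
that gets in the way.

```python
def like_prefix(value: str, term: str) -> bool:
    """Roughly `value LIKE 'term%'`: the whole value is one string.

    Assumes case-insensitive comparison; `%` and `_` inside the term are
    treated literally here, unlike in real MySQL LIKE patterns.
    """
    return value.lower().startswith(term.lower())


def token_prefix(value: str, term: str) -> bool:
    """Whitespace-tokenized matching: the term must prefix a single token."""
    needle = term.lower()
    return any(token.startswith(needle) for token in value.lower().split())


names = ["carsten l", "carsten larsen", "Carsten Larsen", "Carsten", "CARSTEN"]

# Whole-string prefix match keeps the space as part of the prefix.
print([n for n in names if like_prefix(n, "carsten l")])

# After splitting on whitespace, no single token can start with "carsten l".
print([n for n in names if token_prefix(n, "carsten l")])
```

In Solr terms, the first helper roughly corresponds to indexing the name
as a single lowercased token (for example solr.KeywordTokenizerFactory
followed by solr.LowerCaseFilterFactory). That mapping is an assumption
here, not something confirmed in this thread.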


Aleksander M. Stensby wrote:
> 
> Hi there,
> 
> You should use LowerCaseTokenizerFactory as you point out yourself. As far  
> as I know, the StandardTokenizer "recognizes email addresses and internet  
> hostnames as one token". In your case, I guess you want an email, say  
> "average.joe@apache.org" to be split into four tokens: average joe apache  
> org, or something like that, which would indeed allow you to search for  
> "joe" or "average j*" and match. To do so, you could use the  
> WordDelimiterFilterFactory and split on intra-word delimiters (I think the  
> defaults here are non-alphanumeric chars).
> 
> Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
> for more info on tokenizers and filters.
> 
> cheers,
>   Aleks
> 
> On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <cl@fynskemedier.dk> wrote:
> 
>>
>> Hello.
>>
>> The data:
>> I have a dataset containing ~500.000 documents.
>> In each document there is an email, a name and a user ID.
>>
>> The problem:
>> I would like to be able to search in it, but it should be like the "MySQL
>> LIKE".
>>
>> So when a user enters the search term: "carsten", then the query looks  
>> like:
>>         "name:(carsten) OR name:(carsten*) OR email:(carsten) OR
>> email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"
>>
>> Then it should match:
>> carsten l
>> carsten larsen
>> Carsten Larsen
>> Carsten
>> CARSTEN
>> etc.
>>
>> And when the user enters the term: "carsten l" the query looks like:
>>         "name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
>> email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"
>>
>> Then it should match:
>> carsten l
>> carsten larsen
>> Carsten Larsen
>>
>> Or, written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR
>> `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..."
>>
>> I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name
>> and email fields, to ensure case-insensitive behavior.
>> The problem seems to be the wildcards and the whitespace.
> 
> 
> 
> -- 
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
> 
> 


