lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Strategies for effective prefix queries?
Date Thu, 17 Jul 2014 00:34:31 GMT
Your first and last emails seem to contradict each other. Initially you
said you wanted to search for "solr-u" and match on that. Now you are
saying you want to search for "bo sm" and match on that.

Either way, I do have a very similar scenario working in the project I
sent you a link to. I am breaking on full stops and case changes for
Javadoc names. You can try it live for yourself here:
http://www.solr-start.com/javadoc/solr-lucene/index.html (search for
"To Fi" to match TokenFilter).
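
To illustrate the idea of breaking on full stops and case changes, here is
a minimal Python sketch. It is not the analyzer chain used in the linked
project; the function names split_name and matches_all_prefixes are
hypothetical, and lowercasing the tokens is an assumption made so that
matching is case-insensitive.

```python
import re


def split_name(name):
    """Split a dotted, camel-cased name into lowercase tokens."""
    parts = []
    for chunk in name.split("."):
        # Put a space at each lower-to-upper case change, then split on it.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk)
        parts.extend(spaced.split())
    return [p.lower() for p in parts]


def matches_all_prefixes(query, name):
    """True if every whitespace-separated query term prefixes some token."""
    tokens = split_name(name)
    prefixes = query.lower().split()
    return all(any(t.startswith(p) for t in tokens) for p in prefixes)


print(split_name("TokenFilter"))
print(matches_all_prefixes("To Fi", "TokenFilter"))
```

With this splitting, "TokenFilter" yields the tokens "token" and "filter",
so each part of "To Fi" is a prefix of one of them.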

Regards,
    Alex
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Thu, Jul 17, 2014 at 1:00 AM, Hayden Muhl <haydenmuhl@gmail.com> wrote:
> A copy field does not address my problem, and this has nothing to do with
> stored fields. This is a query parsing problem, not an indexing problem.
>
> Here's the use case.
>
> If someone has a username like "bob-smith", I would like it to match
> prefixes of "bo" and "sm". I tokenize the username into the tokens "bob"
> and "smith". Everything is fine so far.
>
> If someone enters "bo sm" as a search string, I would like "bob-smith" to
> be one of the results. The query to do this is straightforward,
> "username:bo* username:sm*". Here's the problem. In order to construct that
> query, I have to tokenize the search string "bo sm" **on the client**. I
> don't want to reimplement tokenization on the client. Is there any way to
> give Solr the string "bo sm", have Solr do the tokenization, then treat
> each token like a prefix?
>
>
> On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> So copyField it to another field and apply alternative processing there.
>> Use eDismax to search both. No need to store the copied field, just index it.
>>
>> Regards,
>>      Alex
>> On 16/07/2014 2:46 am, "Hayden Muhl" <haydenmuhl@gmail.com> wrote:
>>
>> > Both fields? There is only one field here: username.
>> >
>> >
>> > On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>> >
>> > > Search against both fields (one split, one not split)? Keep original
>> > > and tokenized form? I am doing something similar with class name
>> > > autocompletes here:
>> > >
>> > > https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
>> > >
>> > > Regards,
>> > >    Alex.
>> > > Personal: http://www.outerthoughts.com/ and @arafalov
>> > > Solr resources: http://www.solr-start.com/ and @solrstart
>> > > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>> > >
>> > >
>> > > On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl <haydenmuhl@gmail.com> wrote:
>> > > > I'm working on using Solr for autocompleting usernames. I'm running
>> > > > into a problem with the wildcard queries (e.g. username:al*).
>> > > >
>> > > > We are tokenizing usernames so that a username like "solr-user" will
>> > > > be tokenized into "solr" and "user", and will match both "sol" and
>> > > > "use" prefixes. The problem is when we get "solr-u" as a prefix, I'm
>> > > > having to split that up on the client side before I construct a query
>> > > > "username:solr* username:u*". I'm basically using a regex as a poor
>> > > > man's tokenizer.
>> > > >
>> > > > Is there a better way to approach this? Is there a way to tell Solr
>> > > > to tokenize a string and use the parts as prefixes?
>> > > >
>> > > > - Hayden
>> > >
>> >
>>
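
For reference, the client-side workaround described in the thread (a regex
acting as a poor man's tokenizer, then one prefix clause per token) can be
sketched in Python as below. This is only an illustration of the approach
the poster wants to avoid, not a recommended solution. The function name
prefix_query is hypothetical, and both the split pattern (anything that is
not an ASCII letter or digit) and the lowercasing are assumptions about how
the username field is analyzed at index time.

```python
import re


def prefix_query(user_input, field="username"):
    """Build a query such as 'username:bo* username:sm*' from raw input."""
    # Assumption: the index-side tokenizer splits on non-alphanumerics
    # and lowercases; this regex has to be kept in step with it by hand.
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", user_input.lower()) if t]
    return " ".join(f"{field}:{t}*" for t in tokens)


print(prefix_query("bo sm"))
print(prefix_query("solr-u"))
```

The weakness is the one raised in the thread: the regex duplicates the
tokenization rules on the client, so any change to the field's analysis has
to be mirrored there.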
