lucene-general mailing list archives

From Niclas Rothman <>
Subject RE: Wildcard searches????
Date Fri, 05 Feb 2010 20:03:48 GMT
Hi again Ted and many thanks for your efforts.
Ok, just to be sure that we fully understand each other:

In my index I will store partial useragents without any wildcards *, e.g.

Fire 	(for Firefox)
Inte	(Internet Explorer)
Moz	(Mozilla)

When, at runtime, I search my index for Media objects that are compatible with a useragent,



Hopefully Lucene / Solr will return all Media objects that partially match my full user
agent string, and perhaps also some mismatches. To be absolutely sure that I only show Media
objects that are compatible, I will have to loop through the result set in my program, do
a final test, and exclude any mismatches.
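
The final test described above can be sketched as follows. This is only an illustration in Python rather than C#, and it assumes that a stored partial useragent counts as a match when it occurs anywhere in the full useragent string; the names `is_compatible` and `filter_hits` and the shape of the hit records are hypothetical, not part of Solr or Lucene.

```python
def is_compatible(full_user_agent, partial_agents):
    """Return True if any stored partial useragent occurs in the full string.

    Assumption: "partial match" means plain substring containment.
    """
    return any(partial in full_user_agent for partial in partial_agents)


def filter_hits(full_user_agent, hits):
    """Keep only the search hits whose stored partials really match.

    `hits` is a hypothetical list of dicts, one per returned document,
    each carrying the partial useragents stored for that document.
    """
    return [hit for hit in hits
            if is_compatible(full_user_agent, hit["useragents"])]


# Example result set as it might come back from the search engine,
# including one mismatch that the final test has to exclude.
hits = [
    {"id": 1, "useragents": ["Fire", "Moz"]},
    {"id": 2, "useragents": ["Inte"]},
]
user_agent = "Mozilla/5.0 (Windows) Gecko Firefox/3.6"
compatible = filter_hits(user_agent, hits)
```

The search engine narrows the candidates; the loop only has to run over the documents it returned, not over the whole index.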

Is this what you are saying, Ted: that I can't do the whole process in Solr / Lucene, and that I
need to do the final test in my program (C#)?

Also, I'm using Solr 1.4. What field type would you recommend for the useragent (tokenized)?

Okay, let's see what you have to say about this.
Please bear with me, I'm all new to Lucene and Solr!!


-----Original Message-----
From: Ted Dunning [] 
Sent: 05 February 2010 20:43
Subject: Re: Wildcard searches????

Yes.  I think you have it.

To explain in a bit more detail, I think that you should store a tokenized
form of the user agents and should query using a tokenized form of your user
agent.  This will retrieve documents that have partial matches to the user
agent of interest.  Many of these matches, however, may not meet the
requirements of the wildcard expression in the documents.  As such, you will
need to look at each retrieved document to retrieve the wildcard expression from
each one in turn to test if the original (untokenized) query satisfies the

If your wildcards are all of a positive nature as your example is, then this
should work pretty well.
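
A rough sketch of this two-stage idea, written in plain Python rather than against the Solr or Lucene APIs: an in-memory inverted index stands in for the tokenized field, and `fnmatch` stands in for the wildcard test. The tokenizer (lowercased alphanumeric runs) and the names `tokenize`, `build_index`, and `search` are assumptions made for illustration only.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase


def tokenize(text):
    """Split into lowercase alphanumeric tokens (an assumed, simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs):
    """Map each token of each stored wildcard pattern to the owning document ids."""
    index = defaultdict(set)
    for doc_id, patterns in docs.items():
        for pattern in patterns:
            for token in tokenize(pattern):
                index[token].add(doc_id)
    return index


def search(user_agent, docs, index):
    """Stage 1: collect candidates sharing a token with the query.
    Stage 2: apply each candidate's own wildcard to the untokenized query."""
    candidates = set()
    for token in tokenize(user_agent):
        candidates |= index.get(token, set())
    return sorted(
        doc_id for doc_id in candidates
        if any(fnmatchcase(user_agent, pattern) for pattern in docs[doc_id])
    )


docs = {
    "a": ["Firefox*", "Mozilla/4.0*"],
    "b": ["Mozilla/5.0*"],
    "c": ["Opera*"],
}
index = build_index(docs)
user_agent = "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4"
matches = search(user_agent, docs, index)
```

Here document "b" is retrieved as a candidate because it shares the token `mozilla`, but the wildcard check rejects it. Note that whole-token matching would miss a pattern such as `Fire*` against the token `firefox`, so the tokenization has to fit the patterns actually stored.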

On Fri, Feb 5, 2010 at 9:09 AM, Niclas Rothman <> wrote:

> Hi Ted and thanks for all your efforts.
> Listen, I'm a little bit lost here trying to understand what you are trying
> to tell me :-)
> 1. I store my useragents in a field that is tokenized.
> 2. Then when I search, you are saying that I should "scan" down the matches
> via a SOLR function, or what?
> Are you referring to these functions in SOLR?
> Sorry for not grasping immediately!
> Regards Niclas
> -----Original Message-----
> From: Ted Dunning []
> Sent: 05 February 2010 17:44
> To:
> Cc:
> Subject: Re: Wildcard searches????
> Tokenize your user agent strings, then store the tokenized form separately
> from the wild card.  At retrieval time, scan down the matches and apply the
> wildcard from each document to your original query.  The SOLR function query
> might be useful for this, as would a custom hit collector.
> On Fri, Feb 5, 2010 at 7:57 AM, Niclas Rothman <> wrote:
> > Hi there, I am facing a problem and would like to ask the community for some
> > help.
> >
> > In my index I store browser useragent values as "wildcarded" / partial,
> > which means that an indexed document should only be shown to an end user
> > if their browser's useragent matches a wildcarded useragent in my
> > document.
> >
> > So what I have is actually a "reversed" matching: the wildcards are in my
> > document and NOT in my actual query.
> > Does anyone know if this "setup" is possible, e.g. to execute a query in
> > the style of:
> >
> > useragents:
> >
> > "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/"
> >
> > In this example I would have a hit because Mozilla/4.0* matches the
> > useragent.
> >
> > <doc>
> > <useragents>
> >                Firefox*
> >                Mozilla/4.0*
> > </useragents>
> > </doc>
> >
> >
> > Regards
> > Niclas
> >
> --
> Ted Dunning, CTO
> DeepDyve

Ted Dunning, CTO