lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Wildcard searches????
Date Fri, 05 Feb 2010 20:17:58 GMT
This is quite close.  You will have to break the user agent in your query
down into the same kinds of pieces that you used for your index.  Lucene
only does exact matching of terms during searching (wildcard queries are
handled by expanding the wildcard into all of the matching terms in the index).

Regarding the field type, you will probably have to customize it a fair
bit to make +'s act as separators and such.  If you use Solr to index and
query your data, it will make sure that your separation into tokens is
compatible at index and query time, unless you are using shortened forms
like the ones you mention here.
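
The final test discussed in the quoted messages below (apply each retrieved
document's wildcards to the original, untokenized user agent) could look
roughly like this.  It is a sketch in Python, assuming shell-style patterns in
which '*' is the only special character in use; fnmatchcase also treats '?'
and '[...]' as special, so patterns containing those would need escaping.  The
names is_compatible and filter_hits and the shape of hits are assumptions, not
a Solr API.

```python
from fnmatch import fnmatchcase


def is_compatible(user_agent, patterns):
    """True if the full user agent matches at least one stored wildcard."""
    return any(fnmatchcase(user_agent, p.strip()) for p in patterns)


def filter_hits(user_agent, hits):
    """Keep only the retrieved documents whose wildcards really match.

    hits is assumed to be an iterable of (doc_id, patterns) pairs built
    from the search results.
    """
    return [doc_id for doc_id, patterns in hits
            if is_compatible(user_agent, patterns)]


ua = ("Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4"
      "+Profile/MIDP-2.1+Configuration/CLDC-1.1"
      "+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0")
hits = [
    ("doc-a", ["Firefox*", "Mozilla/4.0*"]),  # second pattern matches
    ("doc-b", ["Mozilla/5.0*"]),              # retrieved, but a mismatch
]
print(filter_hits(ua, hits))
```

The same check is easy to port to C# if the final test is done in the calling
program.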

On Fri, Feb 5, 2010 at 12:03 PM, Niclas Rothman <niro@lechill.com> wrote:

> Hi again Ted and many thanks for your efforts.
> Ok, just to be sure that we fully understand each other:
>
> In my index I will store partial useragents without any wildcards *, e.g.
>
> Fire    (for Firefox)
> Inte    (Internet Explorer)
> Moz     (Mozilla)
>
>
> When I search my index at runtime for Media objects that are compatible
> with a useragent,
> e.g.:
>
>
>  "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0"
>
> Hopefully Lucene / Solr will serve me all Media objects that partially
> match my full user agent string, and perhaps also some mismatches. To be
> absolutely sure that I only show Media objects that are compatible, I will
> have to loop through the resultset in my program to do a final test and
> exclude any mismatches.
>
> Is this what you are saying, Ted: that I can't do the whole process in
> Solr / Lucene, and that I need to do the final test in my program (C#)?
>
> Also, I'm using Solr 1.4; what field type would you recommend for the
> useragent (tokenized)?
>
> Okay, let's see what you have to say about this.
> Please bear with me, I'm all new to Lucene and Solr!!
>
> Regards
> Niclas
>
>
>
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: 05 February 2010 20:43
> To: general@lucene.apache.org
> Cc: java-user@lucene.apache.org
> Subject: Re: Wildcard searches????
>
> Yes.  I think you have it.
>
> To explain in a bit more detail, I think that you should store a tokenized
> form of the user agents and should query using a tokenized form of your
> user agent.  This will retrieve documents that have partial matches to the
> user agent of interest.  Many of these matches, however, may not meet the
> requirements of the wildcard expression in the documents.  As such, you
> will need to look at each retrieved document, retrieve its wildcard
> expression, and test whether the original (untokenized) query satisfies
> the wildcard.
>
> If your wildcards are all of a positive nature, as your example is, then
> this should work pretty well.
>
> On Fri, Feb 5, 2010 at 9:09 AM, Niclas Rothman <niro@lechill.com> wrote:
>
> > Hi Ted and thanks for all your efforts.
> > Listen, I'm a little bit lost here trying to understand what you are
> > trying to tell me :-)
> >
> > 1. I store my useragents in a field that is tokenized.
> > 2. Then when I search, you are saying that I should "scan" down the
> > matches via a Solr function, or what?
> > Are you referring to these functions in Solr?
> >
> > http://wiki.apache.org/solr/FunctionQuery
> >
> >
> > Sorry for not grasping this immediately!
> >
> > Regards Niclas
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: 05 February 2010 17:44
> > To: general@lucene.apache.org
> > Cc: java-user@lucene.apache.org
> > Subject: Re: Wildcard searches????
> >
> > Tokenize your user agent strings, then store the tokenized form
> > separately from the wildcard.  At retrieval time, scan down the matches
> > and apply the wildcard from each document to your original query.  The
> > Solr function query might be useful for this, as would a custom hit
> > collector.
> >
> > On Fri, Feb 5, 2010 at 7:57 AM, Niclas Rothman <niro@lechill.com> wrote:
> >
> > > Hi there, I am facing a problem and would like to ask the community
> > > for some help.
> > >
> > > In my index I store browser useragent values as "wildcarded" / partial
> > > values, meaning that an indexed document should only be shown to an
> > > end user if their browser's useragent matches a wildcarded useragent
> > > in my document.
> > >
> > > So what I have is actually a "reversed" matching: the wildcards are in
> > > my document and NOT in my actual query.
> > > Does anyone know if this "setup" is possible, e.g. to execute a query
> > > in the style of:
> > >
> > > useragents:
> > >
> > > "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0"
> > >
> > > In this example I would have a hit because Mozilla/4.0* matches the
> > > useragent.
> > >
> > > <doc>
> > > <useragents>
> > >                Firefox*
> > >                Mozilla/4.0*
> > > </useragents>
> > > </doc>
> > >
> > >
> > > Regards
> > > Niclas
> > >
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>



-- 
Ted Dunning, CTO
DeepDyve
