lucene-general mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: Wildcard searches????
Date Fri, 05 Feb 2010 21:48:45 GMT
Niclas,

I looked at your initial post: you are creating a document with the field
value "abc*", which has nothing to do with a "wildcard query"!

Of course, the query [q=useragents:abcdefghijklm] will return no results, and neither
will [q=useragents:abc], but [q=useragents:abc*] will return something.

text_rev is a SOLR type specifically for _leading_ wildcard queries; you don't need it (you
don't need _leading_ wildcard queries).

At indexing time, instead of
<doc>
<useragents>
                Firefox*
                Mozilla/4.0*
</useragents>
</doc>


You should index
<doc>
<useragents>
	Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0
</useragents>
</doc>

Also, you need to choose the proper SOLR type: for instance, textTight or textgen, or even
a non-tokenized string!


And the query [q=useragents:moz*] will return this document (even if this field is non-tokenized).
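
As a rough illustration of the trailing-wildcard behaviour described above, here is a minimal Python sketch. It is not SOLR or Lucene code: `prefix_query` and the toy `index` are hypothetical names, and the sketch assumes case is folded on both sides, which a real field type may or may not do depending on its analysis chain.

```python
# Toy model of a trailing-wildcard (prefix) query such as useragents:moz*.
# Not SOLR/Lucene code: `prefix_query` and `index` are hypothetical.

def prefix_query(index, pattern):
    """Return ids of documents whose field value starts with the prefix in `pattern`."""
    if not pattern.endswith("*"):
        raise ValueError("expected a trailing-wildcard pattern such as 'moz*'")
    prefix = pattern[:-1].lower()
    # Assumption: case is folded on both sides; a real field type may not do this.
    return [doc_id for doc_id, value in index.items()
            if value.lower().startswith(prefix)]

# The wildcard lives in the query; the documents hold plain values.
index = {
    1: "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4",
    2: "Opera/9.80",
}

print(prefix_query(index, "moz*"))   # [1]
print(prefix_query(index, "abc*"))   # []
```

The point of the sketch is only that the document stores the full value and the `*` belongs to the query string.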


-Fuad


P.S. Don't use * when you create a Lucene document; use it as part of the query.




> -----Original Message-----
> From: Niclas Rothman [mailto:niro@lechill.com]
> Sent: February-05-10 4:44 PM
> To: general@lucene.apache.org
> Cc: java-user@lucene.apache.org
> Subject: RE: Wildcard searches????
> 
> Ted, I'm using SOLR, but I can't figure out what field type I should
> use to get a query like this to work:
> 
> 
> q=useragents: abcdefghijklm
> 
> 
> where I have in my index one document with value "abc" in field
> "useragents"
> 
> That query results in 0 hits.
> 
> If I issue this I get 1 hit, of course (exact match)
> 
> q=useragents: Mozilla
> 
> 
> My document definition in SOLR looks like:
> 
> <fields>
>     <field name="id" type="tint" indexed="true" stored="true" required="true" />
>     <field name="useragents" type="text_rev" indexed="true" stored="true" required="false" multiValued="true" />
> </fields>
> 
> Any clue?
> 
> Nic
> 
> 
> 
> 
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: 05 February 2010 21:18
> To: general@lucene.apache.org
> Cc: java-user@lucene.apache.org
> Subject: Re: Wildcard searches????
> 
> This is quite close.  You will have to break down the user agent that is
> your query into the same kinds of pieces as you did for your index.  Lucene
> will only do exact matching of terms during searching (wildcard queries are
> handled by exploding the term into all possible variants).
> 
> Regarding the field type, you will probably have to customize that a fair
> bit to make +'s be separators and such.  If you use SOLR to index and query
> your data, then it will make sure that your separation into tokens is
> compatible unless you are using shortened forms like you mention here.
> 
> On Fri, Feb 5, 2010 at 12:03 PM, Niclas Rothman <niro@lechill.com> wrote:
> 
> > Hi again Ted and many thanks for your efforts.
> > Ok, just to be sure that we fully understand each other:
> >
> > In my index I will store partial useragents without any wildcards (*), e.g.
> >
> > Fire    (for Firefox)
> > Inte    (Internet Explorer)
> > Moz     (Mozilla)
> >
> >
> > When, at runtime, I search my index for Media objects that are compatible
> > with a useragent, e.g.:
> >
> > "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0"
> >
> > Hopefully Lucene / Solr will serve me all Media objects that partially
> > match my full user agent string, and perhaps also some mismatches. To be
> > absolutely sure that I only show Media objects that are compatible, I will
> > have to loop through the result set in my program to do a final test and
> > exclude any mismatches.
> >
> > Is this what you are saying, Ted: that I can't do the whole process in
> > Solr / Lucene, and that I need to do the final test in my program (C#)?
> >
> > Also, I'm using Solr 1.4; what field type would you recommend for the
> > useragent (tokenized)?
> >
> > Okay, let's see what you have to say about this.
> > Please bear with me, I'm all new to Lucene and Solr!!
> >
> > Regards
> > Niclas
> >
> >
> >
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: 05 February 2010 20:43
> > To: general@lucene.apache.org
> > Cc: java-user@lucene.apache.org
> > Subject: Re: Wildcard searches????
> >
> > Yes.  I think you have it.
> >
> > To explain in a bit more detail, I think that you should store a tokenized
> > form of the user agents and should query using a tokenized form of your
> > user agent.  This will retrieve documents that have partial matches to the
> > user agent of interest.  Many of these matches, however, may not meet the
> > requirements of the wildcard expression in the documents.  As such, you
> > will need to look at each retrieved document and retrieve its wildcard
> > expression to test whether the original (untokenized) query satisfies the
> > wildcard.
> >
> > If your wildcards are all of a positive nature, as your example is, then
> > this should work pretty well.
> >
> > On Fri, Feb 5, 2010 at 9:09 AM, Niclas Rothman <niro@lechill.com> wrote:
> >
> > > Hi Ted and thanks for all your efforts.
> > > Listen, I'm a little bit lost here trying to understand what you are
> > > trying to tell me :-)
> > >
> > > 1. I store my useragents in a field that is tokenized.
> > > 2. Then when I search, you are saying that I should "scan" down the
> > > matches via a SOLR function, or what?
> > > Are you referring to these functions in SOLR?
> > >
> > > http://wiki.apache.org/solr/FunctionQuery
> > >
> > >
> > > Sorry for not grasping immediately!
> > >
> > > Regards Niclas
> > >
> > > -----Original Message-----
> > > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > > Sent: 05 February 2010 17:44
> > > To: general@lucene.apache.org
> > > Cc: java-user@lucene.apache.org
> > > Subject: Re: Wildcard searches????
> > >
> > > Tokenize your user agent strings, then store the tokenized form
> > > separately from the wild card.  At retrieval time, scan down the matches
> > > and apply the wildcard from each document to your original query.  The
> > > SOLR function query might be useful for this, as would a custom hit
> > > collector.
> > >
> > > On Fri, Feb 5, 2010 at 7:57 AM, Niclas Rothman <niro@lechill.com> wrote:
> > >
> > > > Hi there, I am facing a problem and would like to ask the community for
> > > > some help.
> > > >
> > > > In my index I store browser useragent values as "wildcarded" / partial
> > > > values, meaning that an indexed document should only be shown to an end
> > > > user if his browser's useragent matches a wildcarded useragent in my
> > > > document.
> > > >
> > > > So what I have is actually a "reversed" matching: the wildcards are in my
> > > > document and NOT in my actual query.
> > > > Does anyone know if this "setup" is possible, e.g. to execute a query
> > > > in this style:
> > > >
> > > > useragents:
> > > >
> > > > "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0"
> > > >
> > > > In this example I would have a hit because Mozilla/4.0* matches the
> > > > useragent.
> > > >
> > > > <doc>
> > > > <useragents>
> > > >                Firefox*
> > > >                Mozilla/4.0*
> > > > </useragents>
> > > > </doc>
> > > >
> > > >
> > > > Regards
> > > > Niclas
> > > >
> > >
> > >
> > >
> > > --
> > > Ted Dunning, CTO
> > > DeepDyve
> > >
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
> 
> 
> 
> --
> Ted Dunning, CTO
> DeepDyve
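
For readers following the quoted thread, the two-step approach Ted describes (retrieve candidates by token overlap, then apply each document's stored wildcard to the full user agent in the client) might be sketched roughly as below. This is a plain Python illustration under stated assumptions, not SOLR or Lucene code: `tokenize`, `matches_pattern`, `compatible_docs` and the `docs` mapping are hypothetical names, '+' and '/' are assumed to be the token separators, and only trailing-`*` patterns are handled.

```python
import re

def tokenize(user_agent):
    """Split on '+' and '/' and lowercase (assumed separators, see lead-in)."""
    return [t.lower() for t in re.split(r"[+/]", user_agent) if t]

def matches_pattern(pattern, user_agent):
    """Apply a stored pattern with an optional trailing '*' to the full user agent."""
    if pattern.endswith("*"):
        return user_agent.startswith(pattern[:-1])
    return user_agent == pattern

def compatible_docs(docs, user_agent):
    """Return ids of documents with at least one pattern matching `user_agent`."""
    query_tokens = set(tokenize(user_agent))
    hits = []
    for doc_id, patterns in docs.items():
        # Step 1: candidate retrieval by token overlap
        # (this stands in for the index lookup a search engine would do).
        doc_tokens = {t for p in patterns for t in tokenize(p.rstrip("*"))}
        if not (doc_tokens & query_tokens):
            continue
        # Step 2: final test of each stored wildcard against the original,
        # untokenized user agent; this drops partial-match false positives.
        if any(matches_pattern(p, user_agent) for p in patterns):
            hits.append(doc_id)
    return hits

docs = {
    1: ["Firefox*", "Mozilla/4.0*"],
    2: ["Mozilla/5.0*"],   # shares the token "mozilla" but fails the final test
    3: ["Opera*"],         # no token overlap, never a candidate
}
ua = ("Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1"
      "+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0")

print(compatible_docs(docs, ua))   # [1]
```

Document 2 shows why the final test is needed: it is retrieved as a candidate but its wildcard does not match. Note that shortened forms such as "Moz" would not be found by the token-overlap step in this sketch, which is the caveat Ted raises about shortened forms.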


