lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Searching for partial matches
Date Mon, 04 May 2009 16:35:19 GMT
Why are you using MultiPhraseQuery? It appears (warning,
I haven't really used it) to be designed to handle *phrases*.
You're problem statement isn't looking at phrases at all,
just a wildcard single terms. And you're supposed to
call the first MPQ.add with, say, the first word of the
*phrase*, not a term vector. So what happens when your
first add is an array of Terms I have no clue......

So I'd instead use RegexTermEnum (or possibly
WildcardTermEnum, don't know how this latter
works with leading wildcards, you'll have to check)
to enumerate the first 200 matching terms, and
add them clause by clause to a BooleanQuery,
and then use the BooleanQuery to search......

But I'd also have to ask how users will feel about
getting some small partial match of the "real" data
set. By taking the first 200 Terms you find, it looks
to the user either arbitrary on incomplete. Think about
using, say, WildcardTermEnum to construct a Filter
that you then pass to your search. Constructing
Filters is quite fast, although you lose the wildcarded
terms' contributions to the score (don't worry about this
last IMO).


Best
Erick

On Mon, May 4, 2009 at 11:21 AM, Huntsman84 <tpgarcia84@gmail.com> wrote:

>
> Hi
>
> I've tryed this with MultiPhraseQuery, but it always returns me all
> documents of the index, no matter what expression I use.
>
> I've read that adding a set of terms wich their values are all the entered
> query (e.g. "str"), the search works as the symbol "*" (e.g. "str*"), so I
> tryed that.
>
> My code is like this:
>
> // prompt the user
> System.out.println("Enter query: ");
>
> String line = in.readLine(); //if I put "str", I want all matches like
> "string", "strong", "astray"....
>
> line = line.trim();
>
> //I thought this would be ok...
> MultiPhraseQuery mpquery = new MultiPhraseQuery();
> TermEnum te = reader.terms(new Term("contents",line));
>
> /*Home-made conversion from TermEnum to Term[]... I get just 200 matches, I
> don't want my
>   machine busy...*/
> Term[] terms = new Term[200];
> int j=0;
>
> while(te.next() && j<200){
>    terms[j] = te.term();
>    j++;
> }
>
> mpquery.add(terms);
>
> Query query = parser.parse(line);
> System.out.println("Searching for: " + query.toString(field));
>
> Date start = new Date();
>
> Hits hits = searcher.search(mpquery);
> for(int k = 0; k<100; k++){
>    //when I print the results of the search, none of the values match with
> the query...
>    System.out.println(hits.doc(k).getField("contents").stringValue());
> }
>
>
>
> Ian Lea wrote:
> >
> > Hi
> >
> >
> > This is possible.  There is an entry on wildcards in the FAQ.  See
> > also RegexQuery and search the mailing lists for ngrams.
> >
> > Depending on your setup and requirements you may need to be aware of
> > the performance implications of wild card searching, particularly
> > leading wildcards as will be required for the example you give.  See
> > the FAQ and javadocs for WildcardQuery.
> >
> >
> > --
> > Ian.
> >
> > On Thu, Apr 30, 2009 at 11:46 AM, Huntsman84 <tpgarcia84@gmail.com>
> wrote:
> >>
> >> Hello,
> >>
> >>
> >>
> >> I am new to Lucene, and I don't know if it is possible to obtain results
> >> providing part of the keyword.
> >>
> >>
> >>
> >> For example, if I try to search "in", it should return all matches with
> >> "string", "meaning", "trinity"...
> >>
> >>
> >>
> >> Am I expecting too much?
> >>
> >>
> >>
> >> Thank you so much!
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23370618.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message