lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Huntsman84 <tpgarci...@gmail.com>
Subject Re: Searching for partial matches
Date Mon, 04 May 2009 17:19:23 GMT

Ok, I will try that, just one more question.

Do you know why there is a class called "RegexQuery" that appears in the API
documentation but doesn't exist in the lucene-core-2.4.1.jar? I think that
class would be very useful for my problem...

Thank you so much!!


Erick Erickson wrote:
> 
> "the guys" really helped me understand the issues with wildcards,
> it's harder than you think <G>. Try looking over the searchable
> archive for a thread titled "I just don't get wildcards at all" from a
> couple of hears ago. Note: Lucene has advanced significantly
> since then, but the underlying combinatorial complexity of
> wildcards is still there.
> 
> Some people have found joy with n-gram representations for
> this kind of problem, a lot depends upon how big your corpus is
> and what exactly your requirements are. Search the archive for
> ngram (or n-gram) for lots of discussion on that topic.
> 
> Anyway, something to think about.
> 
> Best
> Erick
> 
> On Mon, May 4, 2009 at 12:48 PM, Huntsman84 <tpgarcia84@gmail.com> wrote:
> 
>>
>> My aim is to handle * phrases *, as you say, but I don't know how to
>> build
>> a
>> WildCardQuery for that purpose... I read in the documentation that those
>> kind of queries can't start with '*' (e.g. * phrase *), so I tryed
>> MultiPhraseQuery instead.
>>
>> Forgive me if I am too newbie, 10 days ago I didn't know this tool
>> existed...
>>
>>
>> Erick Erickson wrote:
>> >
>> > Why are you using MultiPhraseQuery? It appears (warning,
>> > I haven't really used it) to be designed to handle *phrases*.
>> > You're problem statement isn't looking at phrases at all,
>> > just a wildcard single terms. And you're supposed to
>> > call the first MPQ.add with, say, the first word of the
>> > *phrase*, not a term vector. So what happens when your
>> > first add is an array of Terms I have no clue......
>> >
>> > So I'd instead use RegexTermEnum (or possibly
>> > WildcardTermEnum, don't know how this latter
>> > works with leading wildcards, you'll have to check)
>> > to enumerate the first 200 matching terms, and
>> > add them clause by clause to a BooleanQuery,
>> > and then use the BooleanQuery to search......
>> >
>> > But I'd also have to ask how users will feel about
>> > getting some small partial match of the "real" data
>> > set. By taking the first 200 Terms you find, it looks
>> > to the user either arbitrary on incomplete. Think about
>> > using, say, WildcardTermEnum to construct a Filter
>> > that you then pass to your search. Constructing
>> > Filters is quite fast, although you lose the wildcarded
>> > terms' contributions to the score (don't worry about this
>> > last IMO).
>> >
>> >
>> > Best
>> > Erick
>> >
>> > On Mon, May 4, 2009 at 11:21 AM, Huntsman84 <tpgarcia84@gmail.com>
>> wrote:
>> >
>> >>
>> >> Hi
>> >>
>> >> I've tryed this with MultiPhraseQuery, but it always returns me all
>> >> documents of the index, no matter what expression I use.
>> >>
>> >> I've read that adding a set of terms wich their values are all the
>> >> entered
>> >> query (e.g. "str"), the search works as the symbol "*" (e.g. "str*"),
>> so
>> >> I
>> >> tryed that.
>> >>
>> >> My code is like this:
>> >>
>> >> // prompt the user
>> >> System.out.println("Enter query: ");
>> >>
>> >> String line = in.readLine(); //if I put "str", I want all matches like
>> >> "string", "strong", "astray"....
>> >>
>> >> line = line.trim();
>> >>
>> >> //I thought this would be ok...
>> >> MultiPhraseQuery mpquery = new MultiPhraseQuery();
>> >> TermEnum te = reader.terms(new Term("contents",line));
>> >>
>> >> /*Home-made conversion from TermEnum to Term[]... I get just 200
>> matches,
>> >> I
>> >> don't want my
>> >>   machine busy...*/
>> >> Term[] terms = new Term[200];
>> >> int j=0;
>> >>
>> >> while(te.next() && j<200){
>> >>    terms[j] = te.term();
>> >>    j++;
>> >> }
>> >>
>> >> mpquery.add(terms);
>> >>
>> >> Query query = parser.parse(line);
>> >> System.out.println("Searching for: " + query.toString(field));
>> >>
>> >> Date start = new Date();
>> >>
>> >> Hits hits = searcher.search(mpquery);
>> >> for(int k = 0; k<100; k++){
>> >>    //when I print the results of the search, none of the values match
>> >> with
>> >> the query...
>> >>    System.out.println(hits.doc(k).getField("contents").stringValue());
>> >> }
>> >>
>> >>
>> >>
>> >> Ian Lea wrote:
>> >> >
>> >> > Hi
>> >> >
>> >> >
>> >> > This is possible.  There is an entry on wildcards in the FAQ.  See
>> >> > also RegexQuery and search the mailing lists for ngrams.
>> >> >
>> >> > Depending on your setup and requirements you may need to be aware of
>> >> > the performance implications of wild card searching, particularly
>> >> > leading wildcards as will be required for the example you give.  See
>> >> > the FAQ and javadocs for WildcardQuery.
>> >> >
>> >> >
>> >> > --
>> >> > Ian.
>> >> >
>> >> > On Thu, Apr 30, 2009 at 11:46 AM, Huntsman84 <tpgarcia84@gmail.com>
>> >> wrote:
>> >> >>
>> >> >> Hello,
>> >> >>
>> >> >>
>> >> >>
>> >> >> I am new to Lucene, and I don't know if it is possible to obtain
>> >> results
>> >> >> providing part of the keyword.
>> >> >>
>> >> >>
>> >> >>
>> >> >> For example, if I try to search "in", it should return all matches
>> >> with
>> >> >> "string", "meaning", "trinity"...
>> >> >>
>> >> >>
>> >> >>
>> >> >> Am I expecting too much?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thank you so much!
>> >> >
>> >> >
>> ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23370618.html
>> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23372180.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Searching-for-partial-matches-tp23313810p23372686.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message