Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 45863 invoked from network); 11 Sep 2008 15:14:37 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 11 Sep 2008 15:14:37 -0000 Received: (qmail 18321 invoked by uid 500); 11 Sep 2008 15:14:27 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 18294 invoked by uid 500); 11 Sep 2008 15:14:27 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 18283 invoked by uid 99); 11 Sep 2008 15:14:27 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Sep 2008 08:14:27 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [192.112.254.51] (HELO smtp2.bedag.ch) (192.112.254.51) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Sep 2008 15:13:29 +0000 Received: from localhost (localhost [127.0.0.1]) by smtp1.bedag.ch (Postfix) with ESMTP id E27C2739E8 for ; Thu, 11 Sep 2008 17:13:27 +0200 (CEST) X-Quarantine-ID: X-Virus-Scanned: Scanned Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: AW: AW: AW: Search with multiple wildcards Date: Thu, 11 Sep 2008 17:13:22 +0200 Message-ID: <1D3D7943573BD14FB024A9654EB43CCB02A4C69D@A99A-EMS-01VS2.ad.bedag.ch> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: AW: AW: Search with multiple wildcards Thread-Index: AckUF34byuwaMa+qTMebC/7XLTbTBwACSoOQ References: <506555.67316.qm@web26005.mail.ukl.yahoo.com> <48C92581.3070208@informatics.jax.org> From: "Sertic Mirko, Bedag" To: X-OriginalArrivalTime: 11 Sep 2008 15:13:23.0518 (UTC) FILETIME=[EED475E0:01C91420] X-Virus-Checked: Checked by ClamAV on apache.org Ok, i see the problems. I will talk to my customer about this requirement. Perhaps he doesn't = need it anymore. Again, thanks a lot to all, you saved my day! Regards Mirko -----Urspr=FCngliche Nachricht----- Von: Matthew Hall [mailto:mhall@informatics.jax.org]=20 Gesendet: Donnerstag, 11. September 2008 16:05 An: java-user@lucene.apache.org Betreff: Re: AW: AW: Search with multiple wildcards Ah.. that's a darn good point.. Though, that second bit of code you have there could be used at display=20 time for him to get the functionality that he wants. You could also=20 modify it somewhat, and apply it against the displayable part of the hit = he's getting back rather than the individual tokens. This if of course only assuming that this functionality would be used=20 after a contains type search was detected. Considering that's the only real usecase for a technique like this, I'm=20 thinking its probably more trouble than its worth for his case. mark harwood wrote: >>> That should give you the functionality you are looking for. >>> =20 > > If I understand your suggestion correctly, It won't. The Highlighter = uses a tokenized version of the document text. > > Simplistically it does the following psuedo code: > > for all tokens in documentTokenStream, > if(queryTermsSet.contains(token)) > output ""+token+" > else > output token > > NOT > > for all tokens in query string > fullDocumentString.replaceAll(queryStringToken, = ""+queryStringToken+"" > > So in the given example while you suggest manipulating "ll" to be in = the query string, you cannot make "ll" appear as a token in = documentTokenStream. > > Actually the Highlighter logic is a fair bit more involved than this = (especially when using SpanQueryScorer) but the basis of it is there in = the above pseudo code. > > > > > > ----- Original Message ---- > From: Matthew Hall > To: java-user@lucene.apache.org > Sent: Thursday, 11 September, 2008 14:40:26 > Subject: Re: AW: AW: Search with multiple wildcards > > Well, you could certainly manipulate your search string, removing the=20 > wildcard punctuations, and then use that for what you pass to the=20 > highlighter. > > That should give you the functionality you are looking for. > > > -Matt > mark harwood wrote: > =20 >>>> Is this possible? >>>> =20 >>>> =20 >> Not currently, the highlighter works with a list of words (or words = AND phrases using the new span support) and highlights those. >> To do anything else would require the higlighter to faithfully = re-implement much of the logic in all of the different query types = (fuzzy, wildcard, regex etc etc) which is much more = challenging/difficult to maintain. >> >> >> >> ----- Original Message ---- >> From: "Sertic Mirko, Bedag" >> To: java-user@lucene.apache.org >> Sent: Thursday, 11 September, 2008 12:07:36 >> Subject: AW: AW: Search with multiple wildcards >> >> Ok, one final question: >> >> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or = ...), so the >> Highligter will highlight the words "hallo" or "alle". But how can i = highlight only >> the original query, so only the "ll"? Is this possible? >> >> Thanks a lot >> Mirko >> >> -----Urspr=FCngliche Nachricht----- >> Von: mark harwood [mailto:markharw00d@yahoo.co.uk]=20 >> Gesendet: Donnerstag, 11. September 2008 11:20 >> An: java-user@lucene.apache.org >> Betreff: Re: AW: Search with multiple wildcards >> >> You need to call rewrite on the query to expand it then give that = version to the highlighter - see the package javadocs. >> = http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highligh= t/package-summary.html#package_description >> >> >> Cheers >> Mark >> >> >> >> >> ----- Original Message ---- >> From: "Sertic Mirko, Bedag" >> To: java-user@lucene.apache.org >> Sent: Thursday, 11 September, 2008 9:34:13 >> Subject: AW: Search with multiple wildcards >> >> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I = see that >> 3ildcard queries are expanded before they are processed, and I see = that i can >> set the clauses count to Integer.MAXVALUE, and queries can consume a = lot of memory,=20 >> but one final thing is still open: does a wildcard query work = together with=20 >> the Lucene Highlighter? I tried it, but I only got an empty result. = Without=20 >> wildcards, the highlighter works pretty smooth! >> >> Regards >> Mirko >> >> -----Urspr=FCngliche Nachricht----- >> Von: Erick Erickson [mailto:erickerickson@gmail.com]=20 >> Gesendet: Mittwoch, 10. September 2008 18:15 >> An: java-user@lucene.apache.org >> Betreff: Re: Search with multiple wildcards >> >> Of course you can construct your own BooleanQuery >> programmatically. >> >> It's relatively easy, just try it. >> >> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag = > =20 >> =20 >>> wrote: >>> =20 >>> =20 >> =20 >> =20 >>> Jep, this is what i have read. >>> >>> do I need to use the query parser, or can I create a query by the = api? >>> Is there an example available? >>> >>> Thanks a lot >>> Mirko >>> >>> -----Urspr=FCngliche Nachricht----- >>> Von: Erick Erickson [mailto:erickerickson@gmail.com] >>> Gesendet: Mittwoch, 10. September 2008 16:45 >>> An: java-user@lucene.apache.org >>> Betreff: Re: Search with multiple wildcards >>> >>> Is this what you're referring to? >>> >>> Lucene supports single and multiple character wildcard searches = within >>> single terms (not within phrase queries). >>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html) >>> >>> I'm pretty sure you can have multiple *terms* with wildcards. Luke = is your >>> friend here, download a copy and try it . Be sure on the search = tab to >>> specify StandardAnalyzer or some such, rather than keywordanalyzer. >>> >>> The phrase is trying to point out that a phrase query does NOT = respect >>> wildcards. That is, submitting >>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm = pretty >>> sure that >>> >>> +field:ab* +field:bc* +field:cd* >>> >>> will work just fine. The key here is "within single terms", which I = think >>> of >>> as >>> "within a single term query". You can add as many TermQuerys as you = want. >>> See the query documentation for how to submit phrase queries. >>> >>> Best >>> Erick >>> >>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag >>> >> =20 >>> =20 >>>> wrote: >>>> =20 >>>> Hi >>>> >>>> Thank you for your quick response:-) >>>> >>>> Of course I need to use the * character :-) But I have read = somewhere in >>>> the documentation that leading wildcards are not supported, and = only one >>>> wildcard term per query. Is this limitation resolved in the current >>>> =20 >>>> =20 >>> version? >>> =20 >>> =20 >>>> Regards >>>> Mirko >>>> >>>> -----Urspr=FCngliche Nachricht----- >>>> Von: Erick Erickson [mailto:erickerickson@gmail.com] >>>> Gesendet: Mittwoch, 10. September 2008 15:47 >>>> An: java-user@lucene.apache.org >>>> Betreff: Re: Search with multiple wildcards >>>> >>>> Sure, but you'll have to set the leading wildcard option, >>>> which I've forgotten the exact call, but it's in the docs. >>>> >>>> And use * rather than % . >>>> >>>> But wildcards are tricky, especially the TooManyClauses >>>> exception. You might want to peruse the archive for wildcard >>>> posts... >>>> >>>> Best >>>> Erick >>>> >>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag >>>> wrote: >>>> >>>> =20 >>>> =20 >>>>> Hi@all >>>>> >>>>> >>>>> >>>>> Is it possible to do a search with multiple wildcards in one = query, for >>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example = available? >>>>> >>>>> >>>>> >>>>> Thanks a lot >>>>> >>>>> Mirko >>>>> >>>>> >>>>> >>>>> >>>>> =20 >>>>> =20 >>>> = --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>> >>>> >>>> =20 >>>> =20 >>> = --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>> For additional commands, e-mail: java-user-help@lucene.apache.org >>> >>> >>> =20 >>> =20 >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> >> =20 >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> >> =20 >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> >> =20 >> =20 > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-user-help@lucene.apache.org > > > =20 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-user-help@lucene.apache.org > > > =20 --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org