From: Sean O'Connor
Date: Fri, 04 Nov 2005 18:32:09 -0500
To: java-user@lucene.apache.org
Subject: Re: SpanQuery parser? Update (ugly hack inside...)
I'm posting this mainly in the hope of giving a tiny bit back to a very helpful community. More likely, however, someone else will open my eyes to an easier approach than the one I outline below...

I've come up with a very ugly approach for converting regular Query objects into SpanQuery objects. I then use the converted SpanQuery to get span positions (currently both the token number and the start/end position). In effect, I have highlighting for simple queries through a very inefficient approach (yay for me!).

The goals I am trying to accomplish are rather specific, I think, so I imagine my hacking is of limited use to anyone else (i.e., just to me). At the moment my code:

  * parses the search text (i.e., the user-entered query)
  * rewrites the resulting query against the index to expand wildcards and such
  * calls a recursive conversion function with a very basic understanding of the conversions:
      o TermQuery -> SpanTermQuery
      o PhraseQuery -> SpanNearQuery
      o others in progress as time permits

Currently, I only process simple query strings like:

  * "blue green yellow" => SpanOrQuery
  * "luce* acti*" => SpanOrQuery with the wildcards expanded (e.g. lucene, lucent, action, acting, ...), all OR'ed together in a braindead fashion
  * "luce* acti* \"book rocks\"" => SpanOrQuery combining SpanTermQuerys and a SpanNearQuery (no slop)

Er, hopefully you get the picture; I'm not up to showing a vector of this one... :-)

I would be happy to discuss my approach if anyone is interested. I assume I am pretty much alone in finding this inefficient approach useful. For me, the functionality outweighs the performance issues: I have something that can take user search strings and highlight exactly the hits found.
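For readers who want to try something similar, here is a minimal Java sketch of the rewrite-then-convert pipeline described above. To stay self-contained it does not use Lucene at all: the records TermQ, PhraseQ, PrefixQ and OrQ, and their span counterparts SpanTermQ, SpanNearQ and SpanOrQ, are hypothetical stand-ins for TermQuery, PhraseQuery, PrefixQuery, an all-OR BooleanQuery, SpanTermQuery, SpanNearQuery and SpanOrQuery. The index term dictionary is assumed to be a sorted set of strings. It illustrates the idea under those assumptions and is not the code from this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-ins for "regular" queries (NOT Lucene's classes).
sealed interface Q permits TermQ, PhraseQ, PrefixQ, OrQ {}
record TermQ(String term) implements Q {}
record PhraseQ(List<String> terms) implements Q {}
record PrefixQ(String prefix) implements Q {}
record OrQ(List<Q> clauses) implements Q {}

// Hypothetical stand-ins for the span queries (NOT Lucene's classes).
sealed interface SpanQ permits SpanTermQ, SpanNearQ, SpanOrQ {}
record SpanTermQ(String term) implements SpanQ {}
record SpanNearQ(List<SpanQ> clauses, int slop, boolean inOrder) implements SpanQ {}
record SpanOrQ(List<SpanQ> clauses) implements SpanQ {}

final class SpanConverter {
    private SpanConverter() {}

    // Step 1 ("rewrite"): expand each prefix against the term dictionary
    // into an OR of the matching terms.
    static Q rewrite(Q q, SortedSet<String> termDictionary) {
        if (q instanceof PrefixQ p) {
            List<Q> expanded = new ArrayList<>();
            // tailSet starts at the first term >= prefix; terms sharing the
            // prefix are contiguous, so stop at the first one that doesn't.
            for (String t : termDictionary.tailSet(p.prefix())) {
                if (!t.startsWith(p.prefix())) break;
                expanded.add(new TermQ(t));
            }
            return new OrQ(expanded);
        }
        if (q instanceof OrQ or) {
            List<Q> out = new ArrayList<>();
            for (Q c : or.clauses()) out.add(rewrite(c, termDictionary));
            return new OrQ(out);
        }
        return q; // terms and phrases need no expansion
    }

    // Step 2: recursively convert a rewritten query into span form.
    static SpanQ toSpan(Q q) {
        if (q instanceof TermQ t) return new SpanTermQ(t.term());
        if (q instanceof PhraseQ ph) {
            List<SpanQ> parts = new ArrayList<>();
            for (String t : ph.terms()) parts.add(new SpanTermQ(t));
            // Exact phrase: adjacent terms, no slop, in order.
            return new SpanNearQ(parts, 0, true);
        }
        if (q instanceof OrQ or) {
            List<SpanQ> parts = new ArrayList<>();
            for (Q c : or.clauses()) parts.add(toSpan(c));
            return new SpanOrQ(parts);
        }
        throw new IllegalArgumentException("unsupported query (rewrite first?): " + q);
    }
}
```

Against a real index, Lucene's own Query.rewrite(IndexReader) would presumably take the place of rewrite() here, and the conversion would build SpanTermQuery, SpanNearQuery (clauses, slop, inOrder) and SpanOrQuery objects instead of the stand-in records; check the exact signatures for your Lucene version.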
This is really only useful for "termA near 'some phrase'" at the moment, but it might become more advanced in the next 2-3 months.

Sean

Paul Elschot wrote:
>On Thursday 20 October 2005 00:40, Sean O'Connor wrote:
>
>>Hello,
>>   I have user-entered search commands which I want to convert to
>>SpanQueries. I have seen in the book "Lucene in Action" that no parser
>>existed at the time of publication, but that someone was working on a
>>SpanQuery parser. Can anyone point me to that code, or provide any
>>suggestions?
>>
>>   I want to use SpanQueries for their detail on the number of hits
>>from a query and, more importantly, the location (start and end
>>position) of each hit. My application requires precise hit
>>highlighting. I also need to perform calculations on the number of hits
>>per document, as well as per query (the sum of document hits).
>>
>
>You may want to use the getSpans() method of SpanQuery and operate
>on the result directly.
>
>>   It is fairly critical that I highlight the hits, and only the hits. From
>>what I've read, SpanQueries (with dumpSpans) are a better approach than
>>'regular' queries. I _think_ regular queries currently use a
>>highlighter which highlights all query terms. This can give more
>>highlighting than actual hits (i.e., false positives).
>>
>>   So, that being said, should I stick with SpanQueries? Is there any
>>current work on a parser to convert a string, or a regular (Token,
>>Boolean, Phrase, Prefix, ...) query, into a SpanQuery?
>>
>>   I have written some very duct-tape-ish code which will convert basic
>>boolean OR and prefix queries into SpanQueries. I just realized I'm in
>>deeper water than I expected when I tried converting my first query
>>string containing several boolean queries AND a phrase query. So now I
>>am looking to either help an existing effort, or just continue with my
>>own hacking.
>>
>
>:)
>
>Have a look at the surround query parser in the svn trunk:
>http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/
>
>There is also some code that does highlighting based on Spans,
>but I don't know where that is. Hopefully someone else can point you at that.
>
>Regards,
>Paul Elschot
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
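Paul's suggestion to call getSpans() and work on the result directly covers both of Sean's needs: highlighting only the actual hits, and counting hits per document and per query. The sketch below assumes each match has already been copied into a hypothetical SpanMatch(doc, start, end) record holding token positions with an exclusive end, so a one-term hit at position p is [p, p+1). SpanHighlighter is likewise a made-up name, not a Lucene class. Because only matched ranges are bracketed, a second, unmatched occurrence of a query term stays unmarked.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record of one span match: document number plus token
// positions, with an exclusive end (a single-term hit at p is [p, p+1)).
record SpanMatch(int doc, int start, int end) {}

final class SpanHighlighter {
    private SpanHighlighter() {}

    // Number of span matches per document, ordered by document number.
    static Map<Integer, Integer> hitsPerDoc(List<SpanMatch> matches) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (SpanMatch m : matches) counts.merge(m.doc(), 1, Integer::sum);
        return counts;
    }

    // Query-level hit count: the sum of the per-document counts.
    static int totalHits(Map<Integer, Integer> perDoc) {
        int sum = 0;
        for (int c : perDoc.values()) sum += c;
        return sum;
    }

    // Wraps exactly the matched token ranges of one document in [ ... ].
    // Counters (not booleans) keep brackets balanced when spans share a
    // start or end token.
    static String highlight(List<String> tokens, int doc, List<SpanMatch> matches) {
        int[] opens = new int[tokens.size()];
        int[] closes = new int[tokens.size()];
        for (SpanMatch m : matches) {
            if (m.doc() != doc) continue;
            opens[m.start()]++;
            closes[m.end() - 1]++; // end is exclusive
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append("[".repeat(opens[i]));
            sb.append(tokens.get(i));
            sb.append("]".repeat(closes[i]));
        }
        return sb.toString();
    }
}
```

For example, with the phrase hit [1, 3) in document 7, highlighting the tokens "the book rocks but this book is old" yields "the [book rocks] but this book is old"; the second "book" is left alone. In the Lucene versions of this period, the SpanMatch list would be filled by iterating the Spans returned by getSpans() and reading doc(), start() and end() at each step. Mapping token positions back to character offsets for display needs extra information (for instance, term vectors stored with offsets) and is outside this sketch.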