lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: SpanQuery parser? Update (ugly hack inside...)
Date Sat, 05 Nov 2005 15:50:21 GMT
On Saturday 05 November 2005 01:29, Erik Hatcher wrote:
> 
> On 4 Nov 2005, at 18:32, Sean O'Connor wrote:
> > I'm posting this primarily hoping to give back a tiny bit to a very  
> > helpful community. More likely however, someone else will open my  
> > eyes to an easier approach than what I outline below...
> >
> > I've come up with a very ugly conversion approach from regular  
> > Query objects into SpanQuery objects. I then use the converted  
> > SpanQuery to get span positions (currently both token #, and start/ 
> > end position). In effect, I have highlighting for simple queries  
> > with a very inefficient approach (yea for me!).
> 
> As you and I have talked about on a couple of face to face occasions,  
> this is the approach I am taking on a current consulting project.  My  
> conversion code is slightly different than yours in that I don't  
> rewrite the query, but translate it as-is into comparable SpanQuery  
> subclasses - and this is because I have a RegexQuery and  
> SpanRegexQuery that are comparable.  But rewriting is a good  
> pragmatic way to go for general query types that don't have a  
> comparable SpanQuery subclass.
> 
> > The goal(s) I am trying to accomplish is rather specific I think,  
> > so I imagine the use of my hacking is rather limited (i.e. just to  
> > me).
> >
> > At the moment my code:
> >
> >    * parses the search text (i.e. user entered query)
> 
> Are you using QueryParser?  If so, you'll also want to account for  
> BooleanQuery, recursively.

The surround parser can create both boolean queries and span queries.

Sean, as you seem to prefer not to use the surround syntax, do you think
this syntax could be improved somehow? I recall trying to make it simpler,
but when I made it I was not able to do so.

Also, PhraseQuery is more efficient than a combination of SpanTermQuery,
SpanOrQuery and SpanNearQuery, so perhaps PhraseQuery should have
a getSpans() method so it could be used as a SpanQuery, too.

Regards,
Paul Elschot

> 
> >    * rewrites the resulting query to expand wildcards and such against
> >      index
> >    * calls a recursive conversion function with very basic conversion
> >      understanding
> >          o TermQuery -> SpanTerm
> >          o PhraseQuery -> SpanNear
> >          o others in progress as time permits
> >
> > Currently, I only process simple query strings like:
> > "blue green yellow" => SpanOrQuery
> > "luce* acti*" => SpanOrQuery with wild cards expanded
> >    e.g.: lucene lucent action acting ... all or'ed together in a  
> > braindead fashion
> > "luce* acti* \"book rocks\"" => SpanOrQuery combining SpanTerms and  
> > SpanNear (no slop)
> >    er, hopefully you get the picture, I'm not up to showing a  
> > vector of this one... :-)
> >
> > I would be happy to discuss my approach if there is anyone  
> > interested. I assume I am pretty much alone in finding this  
> > ineffecient approach useful. For me, it is the functionality that  
> > overrides perfomance issues.
> 
> What is inefficient about it?   The rewrite stuff is the main  
> difference, and perhaps that is the issue you're encountering.  Where  
> do you see the performance issues?
> 
> Converting a query, for me at least, is fast - perhaps because there  
> is no rewriting involved.
> 
> > I have something which can take user search strings and do hit  
> > highlighting for the exact hit found. This is really only useful  
> > for "termA near 'some phrase'" at the moment, but might become more  
> > advanced in the next 2-3 months.
> 
> I'm basically implementing this very thing.  I will likely be  
> enhancing the contrib/highlighter code in the next month to use  
> SpanQuery for highlighting, as well as adding field-aware highlighting.
> 
>      Erik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message