lucy-user mailing list archives

From Andrew S. Townley <...@atownley.org>
Subject Re: [lucy-user] Question about query parsing API
Date Sat, 26 Feb 2011 12:55:35 GMT
Hi Marvin,

On 25 Feb 2011, at 4:05 PM, Marvin Humphrey wrote:

> On Fri, Feb 25, 2011 at 12:41:02PM +0000, Andrew S. Townley wrote:
>> Another issue I just hit with Ferret (actually, the same root problem, but
>> manifested in a different way) made me wonder something else about the
>> design of lucy: the query parsing API.
> 
> I think there's a fundamental challenge with the proposed design
> (Lucy/Ferret/Lucene doesn't preserve metadata during scoring) regardless of
> which engine you choose, and I'll address it in a reply on the earlier thread.

I'll address this in a separate reply.

>> Is lucy's QueryParser API effectively this one?
>> 
>> http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Search/QueryParser.html
> 
> Yes.
> 
>> If so, is there a way to traverse the query parse tree
> 
> It's not public API yet, but it can be done, and perhaps we should consider
> making the API public.
> 
> ANDQuery, ORQuery, RequiredOptionalQuery and NOTQuery are all subclasses of
> PolyQuery.  PolyQuery provides a PolyQuery_Get_Children() method (which would
> be spelled get_children() in the Perl or Ruby-vaporware bindings).  Using
> that, you can traverse the hierarchy.  Indeed, that's what QueryParser does
> internally.

At the very least, I need to be able to walk the query tree for any string input query (or
any query object) with a consistent API.  What you have here is pretty similar to my own
implementation of the Query Object pattern for my system, so that would be a start.  For any
compound query term, I'm also "bubbling" up references to the property names as well as the
query terms themselves.  This means that I can retrieve these easily and do some analysis on
the query before actually executing it.

Anything that will support me doing the same type of thing with Lucy will work.
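
To make the kind of traversal I mean concrete, here is a rough sketch in Python.  It is
not Lucy's API: the TermQuery and PolyQuery classes below are hypothetical stand-ins, and
only the get_children() name is borrowed from the description above.  The walk() and
collect() helpers are likewise made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class TermQuery:
    """Hypothetical leaf node: one term in one field (property)."""
    field_name: str
    term: str


@dataclass
class PolyQuery:
    """Hypothetical compound node (AND, OR, NOT, ...) holding child queries."""
    op: str
    children: List[object] = field(default_factory=list)

    def get_children(self) -> List[object]:
        return self.children


def walk(query) -> Iterator[object]:
    """Yield every node in the tree, depth first, parents before children."""
    yield query
    if isinstance(query, PolyQuery):
        for child in query.get_children():
            yield from walk(child)


def collect(query) -> Tuple[Set[str], Set[str]]:
    """Gather the field names and terms referenced anywhere in the tree."""
    fields, terms = set(), set()
    for node in walk(query):
        if isinstance(node, TermQuery):
            fields.add(node.field_name)
            terms.add(node.term)
    return fields, terms


# Assumed example tree for: title:lucy AND (body:ferret OR body:lucene)
tree = PolyQuery("AND", [
    TermQuery("title", "lucy"),
    PolyQuery("OR", [TermQuery("body", "ferret"), TermQuery("body", "lucene")]),
])
fields, terms = collect(tree)
print(sorted(fields))  # ['body', 'title']
print(sorted(terms))   # ['ferret', 'lucene', 'lucy']
```

The point is only that one method on compound nodes is enough to do this sort of
pre-execution analysis, whatever the real class names turn out to be.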

> 
>> Also, I mentioned SWIG in passing the other day in a previous message.
>> Would it not be possible to just generate the bindings for Ruby with SWIG?
> 
> SWIG won't do:
> 
>    http://lucy.markmail.org/thread/5uxmc655dvzzdpvx

Yeah, I got that and the rationale from Jens' reference.

> There are portions of Lucy that have been intentionally left unimplemented by
> the core.  The Perl implementation code is located in trunk/perl/xs/ and
> trunk/perl/lib/Lucy.pm.  This code will have to be ported for each new host
> language regardless.

Interesting approach.  Are there docs or a rationale somewhere on which parts are left
unimplemented and why?  It sounds worth understanding in more detail.

> Once that's done, it *might* be theoretically possible to generate SWIG
> bindings as a short-term experiment, but there would be a lot of problems.
> Lucy's autogenerated header files won't map well.  There will be some quirks
> that would need to be worked out regarding Lucy's object model.  Lots of
> features will be missing -- subclassing, automated refcount management,
> default parameter values, etc.  It will also be quite unwieldy, because
> hashes, arrays, and strings won't get automatically converted at the binding
> barrier -- you'll have to do crazy stuff like creating Lucy::Object::CharBuf
> objects every time you want to pass a string into the Lucy core.

Yeah, ugh.

> What is planned instead is to adapt the materials under
> trunk/clownfish/lib/Clownfish/Binding/Perl to generate Ruby C API code instead
> of Perl C API code.  There's actually not a lot there:
> 
>    $ wc -l lib/Clownfish/Binding/Perl.pm lib/Clownfish/Binding/Perl/*
>         528 lib/Clownfish/Binding/Perl.pm
>         475 lib/Clownfish/Binding/Perl/Class.pm
>         150 lib/Clownfish/Binding/Perl/Constructor.pm
>         277 lib/Clownfish/Binding/Perl/Method.pm
>         269 lib/Clownfish/Binding/Perl/Subroutine.pm
>         298 lib/Clownfish/Binding/Perl/TypeMap.pm
>        1997 total
>    $ 
> 
> Most of the work the Clownfish compiler does involves parsing the Lucy header
> files and building a model of the Lucy object hierarchy in memory.  That work
> is done.  What's left is to port the code that walks that object hierarchy and
> generates binding code.  We have such code for Perl; we need to adapt it for
> Ruby.
> 
> Once that work is done, it's done.  Changes within Lucy's core don't require
> changes to Clownfish.
> 
> I'm actively working on the Clownfish code now -- adding host languages is a
> force-multiplier for the project, so it's a high priority.  See the roadmap at
> <http://markmail.org/thread/nfqfphjigqcl2svc>.
> 
> I've been vacillating between Python and Ruby as far as which bindings to work
> on next, but I tend to go where there are active collaborators.  If you're
> interested in contributing, you'll have company. :)
> 
> At the least, I intend to finish porting the bulk of the Clownfish compiler
> from Perl to C so that it's easier for non-Perl people to grok.

That sounds good, and the roadmap makes sense.  Have since subscribed to lucy-devel as well
since a lot of the traffic there seems to address the kinds of things I'm interested in.

Thanks for the explanation.

ast
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org