lucene-dev mailing list archives

From Erik Hatcher
Subject Re: "Advanced" query language
Date Mon, 05 Dec 2005 22:36:38 GMT
On Dec 5, 2005, at 3:18 PM, Paul Elschot wrote:
> On Monday 05 December 2005 17:56, John Haxby wrote:
>> Yonik Seeley wrote:
>>> I looked into this a year ago... most scripting languages have an
>>> emphasis on script execution speed, not script parsing speed (which
>>> is what we would need).  The scripting languages I tried were horribly
>>> slow at parsing a small script.  The only one that could parse at a
>>> reasonable speed was rhino (javascript) in interp mode.
>> I've always found the lisp syntax very easy to parse.  In this case,
>> it's just prefix with the name of the operator being first in the
>> list, e.g. (and "eggs" "oranges").   There are wrinkles for named and
>> optional parameters, but the basic syntax is a doddle.
> Lisp syntax is good at nesting, and it also does properties and roles:
> ((phrase 5) "eggs" "oranges")
> (boolean (must "eggs") (mustnot "oranges"))
> I like the simplicity. Now the earlier example:
> (boosting
>   (match
>     ((moreLikeThis (percent "0.25") (docId "44"))
>       (compareField "contents")
>       (compareField "title")
>     )
>   )
>   ((downgrade (demote "0.5"))
>     ((simple "contents")
>       (or "ice hockey" puck rink)
>     )
>   )
> )
> The deep nesting is tricky with only one kind of
> parentheses/brackets.

For data communication, LISP is just XML with parentheses instead of
angle brackets.

Again, we're talking machine-to-machine communication here, not human- 

> Perhaps python like is better. Python
> has nesting by indentation introduced by a colon at the end
> of the previous line.
> To be read with a fixed width font:
> boosting:
>   match:
>     moreLikeThis(percent="0.25", docId="44"):
>       compareField("contents")
>       compareField("title")
>   downgrade(demote="0.5"):
>     simple("contents"):
>       or:
>         "ice hockey"
>         puck
>         rink
> Quite readable, but not so easy to parse.
> (One could even do away with the colons.)

And this is basically YAML:

While there have been several different topics brought up on this  
thread, it seems we're diverging from the original idea.  Let's  
consider the most basic use case example here, and I'm making it  
intentionally as concrete as possible:

A Swing client performs searches by communicating with a Lucene  
search server, which is wrapped by a RESTful servlet.  The client  
wants to issue sophisticated queries that are not supported by  

Breakdown: Lucene itself is not needed on the client, only an HTTP  
connection with bits of XML back and forth to query and send  
results.   Sure, Lucene could be on the client and a Query created,  
serialized, and sent to the server.  But that would probably require
the same version of Lucene on both sides, which seems overconstrained
just to transfer a general query.  Having an XML format representing a Query
and a mechanism to parse it into an actual Query instance makes a lot  
of sense.  The proposed LISP, Python, etc. formats don't add anything
to the equation, and actually complicate it.  The JVM has a built-in  
XML parser, so no additional dependency is needed (don't get me  
wrong, that argument has caused ugly languages to be created, like  
Ant :).  Again, we're talking only machine-to-machine communication  
here, but with Ant humans are involved - so the argument is a bit  
different.  I don't see a need to construct a new language, but
rather to do just what Mark has mentioned: come up with a Spring/Ant-like
(and I'll toss in other ideas to consider, HiveMind/Digester-like)
mechanism to instantiate a Query from parsed XML.  Digester might be
sufficient, actually.  Spring seems pretty heavy for this job.  Just a
tiny bit of custom SAX/DOM coding is all that we really need.  Because
of the variations in how the Query subclass constructors and setters
work, there will need to be a little bit of glue in between as well,
I presume.
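
To make the "tiny bit of DOM coding plus glue" idea concrete, here is a
rough sketch using only the JDK's built-in XML parser.  Everything in it
is an assumption for illustration: the element names (boolean, must,
mustnot, should, term) are a made-up vocabulary, and the SketchQuery
types are stand-ins for Lucene's Query subclasses, not Lucene API.  A
real version would have the glue construct actual Query instances.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: element names and SketchQuery types are illustrative only.
class XmlQuerySketch {

    /** Stand-in for a Lucene Query; real glue code would build Query subclasses. */
    interface SketchQuery { String describe(); }

    static final class TermSketch implements SketchQuery {
        final String field, text;
        TermSketch(String field, String text) { this.field = field; this.text = text; }
        public String describe() { return field + ":" + text; }
    }

    static final class BooleanSketch implements SketchQuery {
        final List<String> clauses = new ArrayList<>();
        void add(String prefix, SketchQuery q) { clauses.add(prefix + q.describe()); }
        public String describe() { return "(" + String.join(" ", clauses) + ")"; }
    }

    /** Parses the XML text with the JDK DOM parser and builds the query tree. */
    static SketchQuery parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return build(root);
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse query XML", e);
        }
    }

    // The "glue": one case per query element, mapping XML to constructor calls.
    private static SketchQuery build(Element e) {
        switch (e.getTagName()) {
            case "term":
                return new TermSketch(e.getAttribute("field"), e.getTextContent().trim());
            case "boolean": {
                BooleanSketch bq = new BooleanSketch();
                for (Element clause : childElements(e)) {
                    String prefix = clausePrefix(clause.getTagName());
                    for (Element inner : childElements(clause)) {
                        bq.add(prefix, build(inner));
                    }
                }
                return bq;
            }
            default:
                throw new IllegalArgumentException("Unknown query element: " + e.getTagName());
        }
    }

    private static String clausePrefix(String tag) {
        switch (tag) {
            case "must":    return "+";
            case "mustnot": return "-";
            case "should":  return "";
            default: throw new IllegalArgumentException("Unknown clause element: " + tag);
        }
    }

    // Collects child elements only, skipping whitespace text nodes and comments.
    private static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) nodes.item(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<boolean>"
                + "<must><term field=\"contents\">eggs</term></must>"
                + "<mustnot><term field=\"contents\">oranges</term></mustnot>"
                + "</boolean>";
        System.out.println(parse(xml).describe());
    }
}
```

The switch in build() is where the per-subclass glue would live; each new
query type is one more case, and the client never needs Lucene on its
classpath to produce the XML.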

