lucene-dev mailing list archives

From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject Re: XML search language...
Date Tue, 10 Dec 2002 15:50:07 GMT


Pier Fumagalli wrote:

>><?xml version="1.0" encoding="UTF-8"?>
>><query>
>> <boolean type="and">
>>   <term field="character">Bird</term>
>>   <group>
>>     <term field="category">Cartoon</term>
>>     <boolean type="not">
>>       <term field="name">Roadrunner</term>
>>     </boolean>
>>   </group>
>> </boolean>
>></query>
> 
> 
> That _is_ a beauty.. Plain, simple, and doing exactly what I need! :-)
> 
> No weird XML-RPC/SOAP/QUERY stuff, just one tiny little thing that does the
> job I require... :-)
> 
> Only thing I don't "like" is how you group up the terms, for example, I
> don't quite get the distinction between "boolean / and" and group...
> 
> In theory, a binary operation can always be reduced to its minimal
> configuration of two terms, depending on what precedence we give to the (for
> instance) "and", "or" and "not" operations... So, I don't see why group is
> actually there! :-)
> 

I was trying to allow for precedence that might be set explicitly with 
parentheses. I think you're right, though: it could be simpler to base 
precedence on nesting alone and handle it with recursion (plus a stack or 
queue) when parsing:

With precedence ordered (AND, OR), i.e. AND binds more tightly than OR.

The following *do* appear the same to me:

field1:foo AND field2:bar OR field3:bim AND field4:bam
(field1:foo AND field2:bar) OR (field3:bim AND field4:bam)

*last one in XML*
<?xml version="1.0" encoding="ISO-8859-1"?>
<query index="Articles">	
    <or>
       <and>
          <term field="field1">foo</term>
          <term field="field2">bar</term>
       </and>
       <and>
          <term field="field3">bim</term>
          <term field="field4">bam</term>
       </and>
    </or>
</query>
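As a rough illustration of the recursion idea, here is a minimal Java sketch that turns the infix form into the nested XML form. The class `InfixToXml` and its small grammar are hypothetical, not part of Lucene or of this proposal. Each precedence level gets its own method, so AND binds more tightly than OR simply because `parseAnd` is called from inside `parseOr`, and parentheses recurse back to the top level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: converts "field:value" operands joined by AND/OR,
// with optional parentheses, into nested <and>/<or> XML. AND binds more
// tightly than OR purely through recursion depth, so no <group> element
// is needed. Values are not XML-escaped (sketch only).
class InfixToXml {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private InfixToXml(String input) {
        // Pad parentheses with spaces so a whitespace split yields clean tokens.
        String padded = input.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : padded.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    static String toXml(String input) {
        InfixToXml parser = new InfixToXml(input);
        String xml = parser.parseOr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return xml;
    }

    // orExpr := andExpr ("OR" andExpr)*   (lowest precedence, parsed outermost)
    private String parseOr() {
        List<String> operands = new ArrayList<>();
        operands.add(parseAnd());
        while (next("OR")) operands.add(parseAnd());
        return wrap("or", operands);
    }

    // andExpr := primary ("AND" primary)*
    private String parseAnd() {
        List<String> operands = new ArrayList<>();
        operands.add(parsePrimary());
        while (next("AND")) operands.add(parsePrimary());
        return wrap("and", operands);
    }

    // primary := "(" orExpr ")" | field:value
    private String parsePrimary() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        String tok = tokens.get(pos++);
        if (tok.equals("(")) {
            String inner = parseOr();
            if (!next(")")) throw new IllegalArgumentException("Missing ')'");
            return inner;
        }
        int colon = tok.indexOf(':');
        if (colon <= 0) throw new IllegalArgumentException("Expected field:value, got " + tok);
        return "<term field=\"" + tok.substring(0, colon) + "\">" + tok.substring(colon + 1) + "</term>";
    }

    // Consumes the next token if it equals the expected one.
    private boolean next(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    // A single operand needs no wrapping element.
    private static String wrap(String op, List<String> operands) {
        if (operands.size() == 1) return operands.get(0);
        return "<" + op + ">" + String.join("", operands) + "</" + op + ">";
    }

    public static void main(String[] args) {
        System.out.println(toXml("field1:foo AND field2:bar OR field3:bim AND field4:bam"));
    }
}
```

Both the flat and the parenthesized versions of the expression above produce the same `<or>` of two `<and>` elements. For `field1:foo AND (field2:bar OR field3:bim) AND field4:bam` this sketch emits a single three-child `<and>` instead of two nested binary `<and>` elements; since AND is associative, the two shapes should mean the same thing.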


The following *don't* appear the same to me.

field1:foo AND field2:bar OR field3:bim AND field4:bam
field1:foo AND (field2:bar OR field3:bim) AND field4:bam

*but the last one can still be captured in XML without a group tag*
<?xml version="1.0" encoding="ISO-8859-1"?>
<query index="Articles">
    <and>
       <term field="field1">foo</term>
       <and>
          <or>
             <term field="field2">bar</term>
             <term field="field3">bim</term>
          </or>
          <term field="field4">bam</term>
       </and>
    </and>
</query>
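To check that nesting alone preserves the meaning, a query can be evaluated recursively against a record. The sketch below assumes a hypothetical `NestedQueryEval` class, exact string matching for `<term>`, and a record represented as a `Map` from field name to value; it uses only the standard JAXP DOM API. With field1=foo, field2=bar and field4 set to anything other than bam, the earlier (AND, OR) query matches while this one does not.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: evaluates a nested <and>/<or>/<not>/<term> query
// against one record (field name -> value). Nesting alone fixes the order
// of evaluation, so no <group> element is required. Exact-match terms only.
class NestedQueryEval {
    static boolean matches(String xml, Map<String, String> record) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed query XML", e);
        }
        // <query> wraps exactly one top-level operator or term.
        return eval(childElements(root).get(0), record);
    }

    private static boolean eval(Element e, Map<String, String> record) {
        switch (e.getTagName()) {
            case "term":
                return e.getTextContent().trim().equals(record.get(e.getAttribute("field")));
            case "and": // every child must match
                for (Element child : childElements(e)) {
                    if (!eval(child, record)) return false;
                }
                return true;
            case "or": // at least one child must match
                for (Element child : childElements(e)) {
                    if (eval(child, record)) return true;
                }
                return false;
            case "not": // negates its single child
                return !eval(childElements(e).get(0), record);
            default:
                throw new IllegalArgumentException("Unknown element: " + e.getTagName());
        }
    }

    // Element children only; skips the whitespace text nodes between tags.
    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }
}
```

The query string is encoded as UTF-8 before parsing, which is harmless for the ASCII-only examples here even when the declaration says ISO-8859-1.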





> And also, one other thing is that since we have the flexibility of XML, why
> not use specific tags, such as <and/>, <or/> and <not/>?
> 

I was trying to make it more extensible and generic, but that may be 
overkill as well. The idea is that services could define their own 
operations without adding new XML tags, because the operation is just an 
attribute. Thus:

  <and> could be <boolean type="AND|and|+|&&|&">
  <or> could be <boolean type="OR|or|'||'|'|'">
  <not> could be <boolean type="NOT|not|-|!">

  Then we know it's a boolean relation; we just don't care which 
characters represent it in the long run. The only risk this brings up 
(and thus the need for <group> tags) is the precedence of the unknown 
character representations.
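One way to keep the attribute form without that precedence risk is to normalize every spelling to a canonical operator as soon as the attribute is read, and attach precedence to the canonical operator rather than to any spelling. A minimal Java sketch with hypothetical names (`BooleanTypes`, `Op`, `parse`); NOT binding tightest is an assumption. In a real XML attribute, `&` and `&&` would have to be written `&amp;` and `&amp;&amp;`; the parser hands them back as plain `&` characters, which is what the table stores.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: maps the interchangeable spellings of
// <boolean type="..."> onto one canonical operator. Precedence then
// attaches to the canonical operator, never to a particular spelling.
class BooleanTypes {
    // AND over OR follows the ordering given earlier; NOT binding
    // tightest is an assumption of this sketch.
    enum Op {
        NOT(3), AND(2), OR(1);

        final int precedence;

        Op(int precedence) {
            this.precedence = precedence;
        }
    }

    private static final Map<String, Op> SPELLINGS = new HashMap<>();

    static {
        for (String s : new String[] {"and", "+", "&&", "&"}) SPELLINGS.put(s, Op.AND);
        for (String s : new String[] {"or", "||", "|"}) SPELLINGS.put(s, Op.OR);
        for (String s : new String[] {"not", "-", "!"}) SPELLINGS.put(s, Op.NOT);
    }

    // Lower-cases first, so "AND" and "and" resolve to the same entry.
    static Op parse(String type) {
        Op op = SPELLINGS.get(type.trim().toLowerCase(Locale.ROOT));
        if (op == null) throw new IllegalArgumentException("Unknown boolean type: " + type);
        return op;
    }
}
```

Unknown spellings fail loudly instead of silently getting some default precedence, which is exactly the case the attribute approach has to guard against.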

Also note that <term field="xxx">test</term> carries another attribute 
that defines the operation being performed on that term. For example:

<term field="date" op="gt">1996</term>
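A sketch of how an evaluator might apply that `op` attribute to a single stored field value. The class `TermOps` is hypothetical, and only `gt` comes from the example above; `eq` (used when `op` is absent), `ge`, `lt` and `le` are assumed names. Values are compared as strings, which orders numbers correctly only when they have the same width (four-digit years, zero-padded numbers).

```java
// Hypothetical sketch: applies the op attribute of
// <term field="..." op="...">value</term> to one stored field value.
// Only "gt" appears in the proposal; the other names are assumptions.
// Comparison is lexicographic on strings.
class TermOps {
    static boolean test(String op, String fieldValue, String termValue) {
        if (fieldValue == null) return false; // record lacks this field
        int cmp = fieldValue.compareTo(termValue);
        String effectiveOp = (op == null || op.isEmpty()) ? "eq" : op;
        switch (effectiveOp) {
            case "eq": return cmp == 0;
            case "gt": return cmp > 0;
            case "ge": return cmp >= 0;
            case "lt": return cmp < 0;
            case "le": return cmp <= 0;
            default: throw new IllegalArgumentException("Unknown op: " + op);
        }
    }
}
```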

> That is because, if you process SAX events, you can easily trigger on those
> names which are unique in your tag, while if you do use attributes, well,
> the whole thing gets a little bit messed up in terms of parsing/checking and
> slower because you have to analyze every single attribute to get the "type"
> of your boolean operation....
> 
> I'm thinking about something like:
> 
> <?xml version="1.0"?>
> <query index="Articles">
>   <and>
>     <term field="subject">Microsoft</term>
>     <or>
>        <term>Lawsuit</term>
>        <term>Court</term>
>     </or>
>   </and>
> </query>
> 
> Does it make sense????
> 
>     Pier
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> 
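Pier's SAX argument can be made concrete: when element names are the operators, a handler only needs to switch on the element name and keep a stack of open nodes, without inspecting a `type` attribute. The sketch below is a hypothetical `SaxQueryReader` built on the standard `DefaultHandler`; terms without a `field` attribute keep a null field, on the assumption that a default field is applied later.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that dispatches on element names alone
// (<and>, <or>, <not>, <term>) and rebuilds the query tree with a stack.
class SaxQueryReader extends DefaultHandler {
    static final class QueryNode {
        final String kind;   // "and", "or", "not" or "term"
        final String field;  // term field, or null (default field assumed)
        final StringBuilder text = new StringBuilder();
        final List<QueryNode> children = new ArrayList<>();

        QueryNode(String kind, String field) {
            this.kind = kind;
            this.field = field;
        }

        @Override
        public String toString() {
            if (kind.equals("term")) {
                String value = text.toString().trim();
                return field == null ? value : field + ":" + value;
            }
            StringBuilder sb = new StringBuilder(kind.toUpperCase(Locale.ROOT)).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final Deque<QueryNode> stack = new ArrayDeque<>();
    private QueryNode root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        switch (qName) {
            case "query":
                return; // container element; its attributes are ignored here
            case "and": case "or": case "not":
                push(new QueryNode(qName, null));
                return;
            case "term":
                push(new QueryNode("term", atts.getValue("field")));
                return;
            default:
                throw new SAXException("Unknown element: " + qName);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!qName.equals("query")) stack.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only term elements carry text; whitespace between tags is dropped.
        if (!stack.isEmpty() && stack.peek().kind.equals("term")) {
            stack.peek().text.append(ch, start, length);
        }
    }

    // Attaches the node to the currently open parent, then opens it.
    private void push(QueryNode node) {
        if (stack.isEmpty()) root = node;
        else stack.peek().children.add(node);
        stack.push(node);
    }

    static String parse(String xml) {
        try {
            SaxQueryReader handler = new SaxQueryReader();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return String.valueOf(handler.root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed query XML", e);
        }
    }
}
```

Parsing Pier's `Microsoft` example with this handler yields `AND(subject:Microsoft, OR(Lawsuit, Court))`.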



