lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mihaela olteanu <mihaela...@yahoo.com>
Subject Re: Solr 4.x union of cross-joins
Date Mon, 08 Jul 2013 12:52:12 GMT
Here is exactly the data that I'm working with and the results for some tests that I performed:

parent       child1       child2       
id KEY_S COMMENT_T TYPE_S id KEY_S COMMENT_T TYPE_S id KEY_S COMMENT_T TYPE_S 
1 1 ventilation test Parent 4 1 comment4 Child1 7 1 comment6 Child2 
2 2 comment2 Parent 5 2 ventilation test
 Child1 8 2 comment7 Child2 
3 3 comment3 Parent 6 3 comment5 Child1 9 3 ventilation test
 Child2 
                10 3 ventilation test
 Child2 
I am using dynamic fields and here is their definition:

<dynamicField name="*_S" type="string" indexed="true" stored="true"/>      
<dynamicField name="*_T" type="textUnicode" indexed="true" stored="true" />        
         
<fieldType name="textUnicode" class="solr.TextField" positionIncrementGap="100" storeOffsetsWithPositions="true">
  
      <analyzer type="index">          
        <tokenizer class="solr.StandardTokenizerFactory"/>        
        <filter class="solr.ICUFoldingFilterFactory"/>        
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> 
  
  <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="300000"/>    
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>     
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>       
   </analyzer>          
      <analyzer type="query">          
        <tokenizer class="solr.StandardTokenizerFactory"/>        
  <filter class="solr.ICUFoldingFilterFactory"/>       
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     
      </analyzer>           
    </fieldType>           
Test scenarios:

1. only one join in the query 

q={!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:ventilation TYPE_S:Child1)&q.op=AND
JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}+COMMENT_T:ventil +TYPE_S:Child1)

q={!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("some text") AND TYPE_S:Child1)&q.op=OR
JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}+COMMENT_T:"ventil test" +TYPE_S:Child1)

- conclusion: I can write any type of query for the joined index

2. Union of two cross join queries

2.1) parent inner join child1 on KEY_S=KEY_S where COMMENT_T = "ventilation test" and TYPE_S="Child1"
OR parent inner join child2 on KEY_S=KEY_S where COMMENT_T = "ventilation" 

a) - returns the expected results 
q={!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("ventilation test") AND TYPE_S:Child1)
{!join from=KEY_S to=KEY_S fromIndex=child2}COMMENT_T:ventilation&q.op=OR

JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}(+COMMENT_T:"ventil test" +TYPE_S:Child1)
{!join from=KEY_S to=KEY_S fromIndex=child2}COMMENT_T:ventil)

Results
{id=2, KEY_S=2, TYPE_S=Parent, COMMENT_T=comment2, _version_=1439995636607877120, timestamp=Mon
Jul 08 08:34:14 EDT 2013}
{id=3, KEY_S=3, TYPE_S=Parent, COMMENT_T=comment3, _version_=1439995636627800064, timestamp=Mon
Jul 08 08:34:14 EDT 2013}

2.2) parent inner join child1 on KEY_S=KEY_S where COMMENT_T = "ventilation test" and TYPE_S="Child1"
OR parent inner join child1 on KEY_S=KEY_S where COMMENT_T = "ventilation" and TYPE_S="Child2"

a) - no results - but it should

q={!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("ventilation test") AND TYPE_S:Child1)
{!join from=KEY_S to=KEY_S fromIndex=child2}COMMENT_T:ventilation AND TYPE_S:Child2&q.op=OR
JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}(+COMMENT_T:"ventil test" +TYPE_S:Child1)
+{!join from=KEY_S to=KEY_S fromIndex=child2}COMMENT_T:ventil +TYPE_S:Child2)

b)  - no results- but it should
q={!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("ventilation test") AND TYPE_S:Child1)
OR {!join from=KEY_S to=KEY_S fromIndex=child2}COMMENT_T:ventilation AND TYPE_S:Child2&q.op=OR
JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}(+COMMENT_T:"ventil test" +TYPE_S:Child1)
+{!join from=KEY_S to=KEY_S fromIndex=child2}COMMENT_T:ventil +TYPE_S:Child2)

c)  - parse error for the bellow example (could not add brackets for grouping the query for
the child2 joined index)
{!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("ventilation test") AND TYPE_S:Child1)
OR {!join from=KEY_S to=KEY_S fromIndex=child2}(COMMENT_T:ventilation AND TYPE_S:Child2)

d)  - returns the expected results
{!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("ventilation test") AND TYPE_S:Child1)
OR _query_:"{!join from=KEY_S to=KEY_S fromIndex=child2}(COMMENT_T:ventilation AND TYPE_S:Child2)"
JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}(+COMMENT_T:"ventil test" +TYPE_S:Child1)
{!join from=KEY_S to=KEY_S fromIndex=child2}+COMMENT_T:ventil +TYPE_S:Child2)

Results
{id=2, KEY_S=2, TYPE_S=Parent, COMMENT_T=comment2, _version_=1439995636607877120, timestamp=Mon
Jul 08 08:34:14 EDT 2013}
{id=3, KEY_S=3, TYPE_S=Parent, COMMENT_T=comment3, _version_=1439995636627800064, timestamp=Mon
Jul 08 08:34:14 EDT 2013}

2.3) select from parent where COMMENT_T="ventilation test" and TYPE_S="Parent" OR parent inner
join child1 on KEY_S=KEY_S where COMMENT_T = "ventilation test" and TYPE_S="Child1" OR parent
inner join child1 on KEY_S=KEY_S where COMMENT_T = "ventilation" and TYPE_S="Child2"

a) - no results from parent althought it should
{!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("ventilation test") AND TYPE_S:Child1)
OR _query_:"{!join from=KEY_S to=KEY_S fromIndex=child2}(COMMENT_T:ventilation AND TYPE_S:Child2)"
OR _query_:"COMMENT_T:(ventilation test) AND TYPE_S:Parent"
JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}(+COMMENT_T:"ventil test" +TYPE_S:Child1)
{!join from=KEY_S to=KEY_S fromIndex=child2}+COMMENT_T:ventil +TYPE_S:Child2 (+(COMMENT_T:ventil
COMMENT_T:test) +TYPE_S:Parent))

b) - parser errors
 _query_:"COMMENT_T:(ventilation test) AND TYPE_S:Parent" OR {!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:("some
text") AND TYPE_S:Child1) OR _query_:"{!join from=KEY_S to=KEY_S fromIndex=child2}(COMMENT_T:ventilation
AND TYPE_S:Child2)"
 
 c) - all expected results (from parent as well)
  _query_:"COMMENT_T:(ventilation test) AND TYPE_S:Parent" OR _query_:"{!join from=KEY_S
to=KEY_S fromIndex=child1}(COMMENT_T:(ventilation test) AND TYPE_S:Child1)" OR _query_:"{!join
from=KEY_S to=KEY_S fromIndex=child2}(COMMENT_T:ventilation AND TYPE_S:Child2)"
  (+(COMMENT_T:ventil COMMENT_T:test) +TYPE_S:Parent) JoinQuery({!join from=KEY_S to=KEY_S
fromIndex=child1}+(COMMENT_T:ventil COMMENT_T:test) +TYPE_S:Child1) JoinQuery({!join from=KEY_S
to=KEY_S fromIndex=child2}+COMMENT_T:ventil +TYPE_S:Child2)
  
  _query_:"{!join from=KEY_S to=KEY_S fromIndex=child1}(COMMENT_T:(ventilation test) AND
TYPE_S:Child1)" OR _query_:"{!join from=KEY_S to=KEY_S fromIndex=child2}(COMMENT_T:ventilation
AND TYPE_S:Child2)" OR  _query_:"COMMENT_T:(ventilation test) AND TYPE_S:Parent"
  JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child1}+(COMMENT_T:ventil COMMENT_T:test)
+TYPE_S:Child1) JoinQuery({!join from=KEY_S to=KEY_S fromIndex=child2}+COMMENT_T:ventil +TYPE_S:Child2)
(+(COMMENT_T:ventil COMMENT_T:test) +TYPE_S:Parent)

Results:
{id=1, KEY_S=1, TYPE_S=Parent, COMMENT_T=ventilation test, _version_=1439995541507276800,
timestamp=Mon Jul 08 08:32:43 EDT 2013}
{id=2, KEY_S=2, TYPE_S=Parent, COMMENT_T=comment2, _version_=1439995541529296896, timestamp=Mon
Jul 08 08:32:43 EDT 2013}
{id=3, KEY_S=3, TYPE_S=Parent, COMMENT_T=comment3, _version_=1439995541534539776, timestamp=Mon
Jul 08 08:32:43 EDT 2013}



As a conclusion, I can write any query I want with union of multiple join and union with documents
taken from the parent index only if I use _query_ for each individual piece from the query
(one _query_ when querying the parent and one _query_ for each different join query). Unfortunately
when using _query_ I cannot write "exact match" queries because I cannot add the query string
in " (quotes) (I already add quotes for the query: _query_:"<actual query>").

I would say that the scenarios 2.2 (a-c) and 2.3 (a-b) should work as well. If those are
not bugs of the query parser what syntax shall I use?

Thanks in advance!
Mihaela

________________________________
 From: Yonik Seeley <yonik@lucidworks.com>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>; mihaela olteanu <mihaela_ol@yahoo.com>

Sent: Saturday, July 6, 2013 10:31 PM
Subject: Re: Solr 4.x union of cross-joins
 

On Sat, Jul 6, 2013 at 2:22 PM, mihaela olteanu <mihaela_ol@yahoo.com> wrote:
> Hello,
>
> I have 3 indices that form a hierarchy. Basically these were constructed from 3 tables:
parent, child1 and child2 and between parent and children there is a one to many relationship.
> parent (id,name)
>
> child1(id,join_key,field1)
>
> child2(id,join_key,field2)
>
>
> "join_key" is the foreign key in each of the child tables
>
> I would like to perform searches like the following: "give me the parents whose name
is x or whose child1.field1 is y or child2.field2 is z".
>
> I tried with a query like the following:
> #first attempt
> /parent/select?q=(name:x) OR ({!join from=join_key to=id fromIndex=child1}field1:y) OR
({!join from=join_key to=id fromIndex=child2}field2:z)
>
> but it didn't work.

Any idea why?

Try adding debugQuery=true to see if the query is parsed correctly.
Try executing a single join alone, like
q={!join from=join_key to=id fromIndex=child1}field1:y
to see if the results are as expected.

-Yonik
http://lucidworks.com
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message