lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2454) Nested Document query support
Date Mon, 06 Jun 2011 13:06:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044828#comment-13044828 ]

Mark Harwood commented on LUCENE-2454:
--------------------------------------

Below are two example tests that search employment resumes. Both use the same mandatory and
optional clauses, but combine them in subtly different ways.
Question 1 is "who has Mahout skills and preferably used them at Lucid?", while question 2
is "who has Mahout skills and preferably has been employed by Lucid?". The questions differ,
and so do the answers. The XML test script below defines the data and queries, states the
expected results, and runs as an executable test.
Hopefully you can make sense of this:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<Test description="NestedQuery tests">
	<Data>
		<Index name="ResumeIndex">
			<Analyzers class="org.apache.lucene.analysis.WhitespaceAnalyzer">
			</Analyzers>
			<Shard name="shard1">
				<!--  =============================================================== -->
				<Document pk="1">
					<Field name="name">grant</Field>
					<Field name="docType">resume</Field>
				</Document>
				<!--  =============================================================== -->
						<Document pk="2">
							<Field name="employer">lucid</Field>
							<Field name="docType">employment</Field>
							<Field name="skills">java lucene</Field>
						</Document>
				<!--  =============================================================== -->
						<Document pk="3">
							<Field name="employer">somewhere else</Field>
							<Field name="docType">employment</Field>
							<Field name="skills">mahout and more mahout</Field>
						</Document>
				<!--  =============================================================== -->
				<Document pk="4">
					<Field name="name">sean</Field>
					<Field name="docType">resume</Field>
				</Document>
				<!--  =============================================================== -->
						<Document pk="5">
							<Field name="employer">foo bar</Field>
							<Field name="docType">employment</Field>
							<Field name="skills">java</Field>
						</Document>
				<!--  =============================================================== -->
						<Document pk="6">
							<Field name="employer">some co</Field>
							<Field name="docType">employment</Field>
							<Field name="skills">mahout mahout and more mahout</Field>
						</Document>
			</Shard>
		</Index>
	</Data>
	<Tests>
		<Test description="Who knows Mahout and preferably used it *while employed at Lucid*?">
			<Query>
	            <NestedQuery> 
	            	<!-- testing properties of individual child employment docs -->
	               <Query>
	                  <BooleanQuery>
	                  		<Clause occurs="must">
	                  			<TermsQuery fieldName="skills">mahout</TermsQuery>
	                  		</Clause>
	                  		<Clause occurs="should">
	                  			<TermsQuery fieldName="employer">lucid</TermsQuery>
	                  		</Clause>
	                  </BooleanQuery>
	               </Query>
	               <ParentsFilter>	
	                    <TermsFilter fieldName="docType">resume</TermsFilter>   
              		 
	               </ParentsFilter>	
	            </NestedQuery>
			</Query>
			<ExpectedResults why="Grant's tenure at Lucid is overlooked for scoring purposes because it did not involve the required Mahout. Sean has more Mahout experience">
							<Result fieldName="pk">4</Result>
							<Result fieldName="pk">1</Result>
			</ExpectedResults>
		</Test>

		<!-- ==================================================================================== -->
		
		<Test description="Different question - who knows Mahout and preferably has been employed by Lucid?">
			<Query>
                <BooleanQuery>
                  		<Clause occurs="must">
				            <NestedQuery> 
				            	<!-- testing properties of one child employment doc -->
				               <Query>
				                  	<TermsQuery fieldName="skills">mahout</TermsQuery>
				               </Query>
				               <ParentsFilter>	
				                    <TermsFilter fieldName="docType">resume</TermsFilter>
                 		 
				               </ParentsFilter>	
				            </NestedQuery>
                  		</Clause>
                  		<Clause occurs="should">
				            	<!-- Another NestedQuery testing properties of *potentially different* child employment docs -->
				            <NestedQuery> 
				               <Query>
		                  			<TermsQuery fieldName="employer">lucid</TermsQuery>
				               </Query>
				               <ParentsFilter>	
				                    <TermsFilter fieldName="docType">resume</TermsFilter>
                 		 
				               </ParentsFilter>	
				            </NestedQuery>
                  		</Clause>
                  	</BooleanQuery>
			</Query>
			<ExpectedResults why="Grant has the required Mahout skills plus the optional Lucid engagement">
							<Result fieldName="pk">1</Result>
							<Result fieldName="pk">4</Result>
			</ExpectedResults>
		</Test>
		<!-- ==================================================================================== -->
	</Tests>
</Test>
{code}	
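
To make the difference between the two questions concrete, here is a minimal Python sketch of the same data and the two query shapes. It is not the patch's implementation: the `nested` helper is a hypothetical stand-in for NestedQuery, the score is a toy (square root of term frequency, with no idf, norms or coord), and it assumes each resume document is immediately followed by its employment children. Under those assumptions it reproduces the two expected orderings.

```python
import math

# Flat document list copied from the test data above. Assumption: each
# "resume" document is immediately followed by its child "employment" docs.
DOCS = [
    {"pk": 1, "docType": "resume", "name": "grant"},
    {"pk": 2, "docType": "employment", "employer": "lucid", "skills": "java lucene"},
    {"pk": 3, "docType": "employment", "employer": "somewhere else", "skills": "mahout and more mahout"},
    {"pk": 4, "docType": "resume", "name": "sean"},
    {"pk": 5, "docType": "employment", "employer": "foo bar", "skills": "java"},
    {"pk": 6, "docType": "employment", "employer": "some co", "skills": "mahout mahout and more mahout"},
]


def is_resume(doc):
    """Plays the role of the ParentsFilter (docType:resume)."""
    return doc["docType"] == "resume"


def term_score(doc, field, term):
    """Toy score: sqrt of the whitespace-token term frequency; 0.0 means no match."""
    return math.sqrt(doc.get(field, "").split().count(term))


def nested(child_scorer):
    """Hypothetical NestedQuery stand-in: each parent gets the sum of the
    scores of its matching children (children scoring 0.0 do not match)."""
    scores, parent = {}, None
    for doc in DOCS:
        if is_resume(doc):
            parent = doc["pk"]
        elif parent is not None:
            score = child_scorer(doc)
            if score > 0:
                scores[parent] = scores.get(parent, 0.0) + score
    return scores


def query1():
    """One NestedQuery: 'must' and 'should' are tested on the SAME child doc."""
    def same_child(doc):
        must = term_score(doc, "skills", "mahout")
        if must == 0:
            return 0.0  # the required clause failed, so 'should' is never counted
        return must + term_score(doc, "employer", "lucid")
    return nested(same_child)


def query2():
    """Two NestedQueries: the clauses may match DIFFERENT children of a parent."""
    must = nested(lambda d: term_score(d, "skills", "mahout"))
    should = nested(lambda d: term_score(d, "employer", "lucid"))
    return {pk: score + should.get(pk, 0.0) for pk, score in must.items()}


def rank(scores):
    """Parent pks ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


print(rank(query1()))
print(rank(query2()))
```

In the first query Grant's Lucid job never contributes, because that child document fails the mandatory Mahout clause. In the second, the optional clause is evaluated by its own nested query, so the Lucid job can boost Grant even though his Mahout experience came from a different employer.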

> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene


