lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2454) Nested Document query support
Date Sun, 27 Feb 2011 17:09:39 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000000#comment-13000000
] 

Paul Elschot commented on LUCENE-2454:
--------------------------------------

How about an implementation for strict hierarchies that uses two fields per document, in the
following way:

Each of the two fields contains a single indexed token naming a node in the nesting
hierarchy. One field says that the document is a child of that node, the other that the
document is the representative of that node. Any number of levels could be allowed, but
of course no cycles.
These fields are then used by a merge policy to keep the documents in postorder, that is,
the children of each node immediately followed by that node's representative.
Scores could then be collected at any node in the hierarchy by using term filters, one per
involved scorer: advancing the filter from the current doc gives that doc's representative.


For example, in index order:

userDocId  nodeMemberField  nodeReprField

doc1       nodeA1           .
doc2       nodeA1           .
doc3       nodeA            nodeA1
doc4       nodeA2           .
doc5       nodeA2           .
doc6       nodeA            nodeA2

The node representatives for scoring (doc3 and doc6 here) could then be obtained by a term
filter for nodeA on nodeMemberField.
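
To illustrate the advancing step, here is a minimal sketch in plain Python rather than
Lucene code. It models the example index as a list of tuples and the term filter as a sorted
list of index positions. The names INDEX, term_filter and representative_for are hypothetical
and only serve this illustration; None stands for the "." placeholder.

```python
from bisect import bisect_left

# Hypothetical model of the example index, in index order:
# (userDocId, nodeMemberField, nodeReprField); None stands for ".".
INDEX = [
    ("doc1", "nodeA1", None),
    ("doc2", "nodeA1", None),
    ("doc3", "nodeA", "nodeA1"),
    ("doc4", "nodeA2", None),
    ("doc5", "nodeA2", None),
    ("doc6", "nodeA", "nodeA2"),
]


def term_filter(index, member_term):
    """Return the index positions whose nodeMemberField equals member_term.

    Positions come out in increasing order, like the docs of a term filter.
    """
    return [pos for pos, (_, member, _) in enumerate(index) if member == member_term]


def representative_for(repr_positions, pos):
    """Advance to the first representative at or after pos, or None if there is none."""
    i = bisect_left(repr_positions, pos)
    return repr_positions[i] if i < len(repr_positions) else None


# The children of nodeA are the representatives of nodeA1 and nodeA2.
reprs = term_filter(INDEX, "nodeA")

# Map every position to the user doc id of its representative.
for pos, (doc_id, _, _) in enumerate(INDEX):
    target = representative_for(reprs, pos)
    print(doc_id, "->", INDEX[target][0] if target is not None else None)
```

Because the documents are kept in postorder, a single forward advance is enough: the first
representative at or after a doc is the one that closes its group.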


I think this could work for the scoring part, basically along the lines of the code already
posted here.

Could someone with more experience in segment merge policies comment on this? This is quite
restrictive for merging, as the only freedom left in the document order is the order of the
children of each node.

For example, adding a leaf document doc7 for nodeA1 could result in the following index order:

doc4       nodeA2           .
doc5       nodeA2           .
doc6       nodeA            nodeA2
doc7       nodeA1           .
doc1       nodeA1           .
doc2       nodeA1           .
doc3       nodeA            nodeA1
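
To make the ordering constraint concrete, the following sketch (again plain Python, not a
Lucene merge policy) checks whether a given index order keeps all children of a node
immediately before that node's representative. The function name is_postorder and the
tuple layout are assumptions for illustration; None stands for the "." placeholder.

```python
from collections import Counter


def is_postorder(index):
    """Check that each node's children immediately precede its representative.

    index: list of (userDocId, nodeMemberField, nodeReprField or None) in index order.
    """
    children_total = Counter(member for _, member, _ in index)
    seen_reprs = set()
    stack = []  # member nodes of completed subtrees not yet claimed by a representative
    for _, member, repr_node in index:
        if repr_node is not None:
            if repr_node in seen_reprs:
                return False  # assume at most one representative per node
            seen_reprs.add(repr_node)
            popped = 0
            # The children of repr_node must sit directly before its representative.
            while stack and stack[-1] == repr_node:
                stack.pop()
                popped += 1
            if popped != children_total[repr_node]:
                return False  # some child is elsewhere in the index
        stack.append(member)
    return True


ORIGINAL = [
    ("doc1", "nodeA1", None), ("doc2", "nodeA1", None), ("doc3", "nodeA", "nodeA1"),
    ("doc4", "nodeA2", None), ("doc5", "nodeA2", None), ("doc6", "nodeA", "nodeA2"),
]

# The order shown above after adding doc7 as a leaf of nodeA1.
AFTER_ADD = [
    ("doc4", "nodeA2", None), ("doc5", "nodeA2", None), ("doc6", "nodeA", "nodeA2"),
    ("doc7", "nodeA1", None), ("doc1", "nodeA1", None), ("doc2", "nodeA1", None),
    ("doc3", "nodeA", "nodeA1"),
]

# Simply appending doc7 at the end would separate it from its representative doc3.
APPENDED = ORIGINAL + [("doc7", "nodeA1", None)]

print(is_postorder(ORIGINAL), is_postorder(AFTER_ADD), is_postorder(APPENDED))
```

Under these assumptions both orders shown in this comment pass the check, while appending
doc7 at the end of the index does not, which is why the merge policy would have to move the
nodeA1 group as a whole.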




> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
