lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2454) Nested Document query support
Date Mon, 14 Jun 2010 15:37:17 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878617#action_12878617 ]

Mark Harwood commented on LUCENE-2454:
--------------------------------------

Yep, I can see an app with a thousand cached filters would have a problem with this impl as
it stands. 

Maintaining parallel indexes always feels a little flaky to me, not least because you lose
the transactional integrity that a single index gives you.

Is another approach to make your cached filters document-type-specific? I.e. they only hold
numbers in the range of zero to number-of-docs-of-this-type.
To use a cached doc ID in such a filter you would need mapping arrays to project
the type-specific doc ID numbers into global doc ID references and back.
Let's imagine an index with a mix of "A", "B" and "C" doc types organised as follows
(doc IDs are zero-based, as in Lucene):
docId    docType
=====    =======
0        A
1        B
2        C
3        A
4        C
5        C

The mapping arrays for docType "C" would look as follows:
{code:title=Bar.java|borderStyle=solid}
int[] globalDocIdToTypeCLookUp = {-1, -1, 0, -1, 1, 2};  // sparse, one entry per doc in the overall index; -1 = not a type C doc
int[] typeCToGlobalDocIdLookUp = {2, 4, 5};              // dense, one entry per type C doc, holding its global doc ID
{code}
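
As a rough illustration of how such mapping arrays might be populated, here is a minimal sketch in plain Java. It assumes the doc type of every document is already available as a String array indexed by global doc ID; that array, the class name TypeDocIdMaps and the build method are hypothetical and not part of the attached patch.

```java
import java.util.Arrays;

/** Hypothetical holder for the two mapping arrays of one doc type. */
final class TypeDocIdMaps {
    /** Sparse: indexed by global doc ID, -1 where the doc is not of the wanted type. */
    final int[] globalToType;
    /** Dense: indexed by type-specific doc ID, holds the global doc ID. */
    final int[] typeToGlobal;

    private TypeDocIdMaps(int[] globalToType, int[] typeToGlobal) {
        this.globalToType = globalToType;
        this.typeToGlobal = typeToGlobal;
    }

    /** docTypes[i] is assumed to be the type of the doc with global ID i. */
    static TypeDocIdMaps build(String[] docTypes, String wantedType) {
        int[] globalToType = new int[docTypes.length];
        int[] typeToGlobal = new int[docTypes.length]; // trimmed to size below
        int count = 0;
        for (int docId = 0; docId < docTypes.length; docId++) {
            if (wantedType.equals(docTypes[docId])) {
                globalToType[docId] = count;   // next free type-specific ID
                typeToGlobal[count] = docId;
                count++;
            } else {
                globalToType[docId] = -1;      // not a doc of the wanted type
            }
        }
        return new TypeDocIdMaps(globalToType, Arrays.copyOf(typeToGlobal, count));
    }
}
```

Both arrays are filled in a single pass, so type-specific IDs come out in the same order as the global doc IDs.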

Your cached filters would be created as follows:
{code:title=Bar.java|borderStyle=solid}
OpenBitSet myTypeCBitSet = new OpenBitSet(numberOfTypeCDocs);  // this line is hopefully where you save RAM!
// for all matching type C docs...
myTypeCBitSet.set(globalDocIdToTypeCLookUp[realDocId]);
{code}

Your filters can then be used by dereferencing the child doc IDs as follows:
{code:title=Bar.java|borderStyle=solid}
int nextTypeCDocId = myTypeCBitSet.nextSetBit(0);  // first set bit, or -1 if none are set
int nextRealDocId = typeCToGlobalDocIdLookUp[nextTypeCDocId];
{code}
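
To show the create-then-dereference round trip end to end, here is a small sketch that uses java.util.BitSet as a stand-in for OpenBitSet so that it runs without Lucene on the classpath. The class and method names are hypothetical, and the two mapping arrays are assumed to have been built already.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of a type-specific cached filter; BitSet stands in for OpenBitSet. */
final class TypeSpecificFilterSketch {

    /** Sets one bit per matching doc, addressed by its type-specific doc ID. */
    static BitSet buildFilter(int[] globalToType, int numTypeDocs, int[] matchingGlobalDocIds) {
        BitSet bits = new BitSet(numTypeDocs); // sized by docs of this type only
        for (int globalDocId : matchingGlobalDocIds) {
            int typeDocId = globalToType[globalDocId];
            if (typeDocId >= 0) {              // skip docs of other types
                bits.set(typeDocId);
            }
        }
        return bits;
    }

    /** Walks the set bits and projects each one back to a global doc ID. */
    static List<Integer> toGlobalDocIds(BitSet bits, int[] typeToGlobal) {
        List<Integer> result = new ArrayList<>();
        for (int t = bits.nextSetBit(0); t >= 0; t = bits.nextSetBit(t + 1)) {
            result.add(typeToGlobal[t]);
        }
        return result;
    }
}
```

The loop in toGlobalDocIds stops as soon as nextSetBit returns -1, which avoids indexing the dense array with a negative value.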
  
Clearly the mapping arrays come at a cost of 4 bytes * num docs, which is non-trivial. The sparse
globalDocIdToTypeCLookUp array shown here could be avoided by reading TermDocs and counting
at cached-filter-create time.
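
One possible reading of the counting idea, sketched in plain Java: walk the type C doc IDs in ascending order (as a TermDocs enumeration over the doc-type term would return them), keep a running count as the type-specific ID, and set a bit whenever the current doc is also a match. The two sorted int arrays stand in for the two doc ID streams; they, the class name and the use of java.util.BitSet in place of OpenBitSet are assumptions made for illustration.

```java
import java.util.BitSet;

/** Hypothetical sketch: builds a type-specific filter without the sparse lookup array. */
final class CountingFilterBuilder {

    /**
     * typeDocIdsAscending: global IDs of all docs of the type, ascending.
     * matchingDocIdsAscending: global IDs of the docs the filter should accept, ascending.
     */
    static BitSet build(int[] typeDocIdsAscending, int[] matchingDocIdsAscending) {
        BitSet bits = new BitSet(typeDocIdsAscending.length);
        int m = 0; // cursor into the matching doc IDs
        for (int ordinal = 0; ordinal < typeDocIdsAscending.length; ordinal++) {
            int docId = typeDocIdsAscending[ordinal];
            // Skip matches that lie before the current doc of this type.
            while (m < matchingDocIdsAscending.length && matchingDocIdsAscending[m] < docId) {
                m++;
            }
            if (m < matchingDocIdsAscending.length && matchingDocIdsAscending[m] == docId) {
                bits.set(ordinal); // the running count is the type-specific doc ID
            }
        }
        return bits;
    }
}
```

This trades the per-document sparse array for a linear merge at filter-creation time; the dense array is still needed to project set bits back to global doc IDs.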


> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LuceneNestedDocumentSupport-1.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

