lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <>
Subject [jira] Commented: (LUCENE-2454) Nested Document query support
Date Mon, 14 Jun 2010 15:37:17 GMT


Mark Harwood commented on LUCENE-2454:

Yep, I can see that an app with a thousand cached filters would have a problem with this impl
as it stands.

Maintaining parallel indexes always feels a little flaky to me, not least because you lose
the transactional integrity you get from using a single index.

Could another approach be to make your cached filters document-type-specific? I.e. they only
hold numbers in the range zero to number-of-docs-of-this-type minus one.
To use a doc ID cached in such a filter, you would need mapping arrays that project the
type-specific doc ID numbers into global doc IDs and back.
Let's imagine an index with a mix of "A", "B" and "C" doc types, organised as follows:
docId    docType
=====    =======
0        A
1        B
2        C
3        A
4        C
5        C

The mapping arrays for docType "C" would look as follows:
int[] globalDocIdToTypeCLookUp = {-1, -1, 0, -1, 1, 2};  // sparse: one entry per doc in the overall index
int[] typeCToGlobalDocIdLookUp = {2, 4, 5};              // dense: one entry per type C doc in the overall index
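A minimal Java sketch of how the two mapping arrays could be built, assuming the doc type of every (0-based) global doc ID is already available as a `String[]`; in a real index you would read it from the docType field. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: builds the per-type mapping arrays described above.
// Assumption: docTypes[globalDocId] holds the type of each doc (0-based IDs).
final class TypeDocIdMaps {

  /** Sparse map: one slot per global doc, -1 where the doc is not of the given type. */
  static int[] globalToType(String[] docTypes, String type) {
    int[] map = new int[docTypes.length];
    int nextTypeId = 0;
    for (int globalId = 0; globalId < docTypes.length; globalId++) {
      map[globalId] = type.equals(docTypes[globalId]) ? nextTypeId++ : -1;
    }
    return map;
  }

  /** Dense map: one slot per doc of the given type, holding its global doc ID. */
  static int[] typeToGlobal(String[] docTypes, String type) {
    int count = 0;
    for (String t : docTypes) {
      if (type.equals(t)) count++;
    }
    int[] map = new int[count];
    int nextTypeId = 0;
    for (int globalId = 0; globalId < docTypes.length; globalId++) {
      if (type.equals(docTypes[globalId])) map[nextTypeId++] = globalId;
    }
    return map;
  }

  public static void main(String[] args) {
    String[] docTypes = {"A", "B", "C", "A", "C", "C"}; // global doc IDs 0..5
    System.out.println(Arrays.toString(globalToType(docTypes, "C"))); // [-1, -1, 0, -1, 1, 2]
    System.out.println(Arrays.toString(typeToGlobal(docTypes, "C"))); // [2, 4, 5]
  }
}
```

Both arrays come from one scan of the doc types; the dense one is what the filters need at search time.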

Your cached filters would be created as follows:
myTypeCBitSet = new OpenBitSet(numberOfTypeCDocs);  // this line is hopefully where you save memory
// for all matching type C docs...
    myTypeCBitSet.set(globalDocIdToTypeCLookUp[matchingGlobalDocId]);
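The filter-creation step could be sketched like this. It is an illustration under stated assumptions, not the patch's code: `java.util.BitSet` stands in for Lucene's `OpenBitSet` so the example runs without Lucene, and the matching global doc IDs are passed in as an `int[]` rather than coming from a query. Names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical builder for a type-specific cached filter.
// java.util.BitSet stands in for OpenBitSet; matches come in as global doc IDs.
final class TypeCFilterBuilder {

  static BitSet build(int[] matchingGlobalDocIds, int[] globalToType, int numTypeDocs) {
    // Allocated for the number of docs of this type, not for every doc in the index.
    BitSet bits = new BitSet(numTypeDocs);
    for (int globalId : matchingGlobalDocIds) {
      int typeId = globalToType[globalId]; // project global ID -> type-specific ID
      if (typeId >= 0) {                   // ignore matches that are not of this type
        bits.set(typeId);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    int[] globalToTypeC = {-1, -1, 0, -1, 1, 2};
    // Suppose the filter's query matched global docs 2 and 5 (both type C).
    BitSet filter = build(new int[] {2, 5}, globalToTypeC, 3);
    System.out.println(filter); // {0, 2}
  }
}
```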

Your filters can then be used by dereferencing the child doc IDs as follows:
int typeCDocId = myTypeCBitSet.nextSetBit(0);  // then nextSetBit(typeCDocId + 1) until it returns -1
int nextRealDocId = typeCToGlobalDocIdLookUp[typeCDocId];
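A sketch of walking a type-specific filter and projecting every set bit back to a global doc ID. Again `java.util.BitSet` stands in for `OpenBitSet`; both offer `nextSetBit(int)`, and `java.util.BitSet` returns -1 once no set bit remains. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical iterator: type-specific filter bits -> global doc IDs.
final class TypeCFilterIterator {

  static List<Integer> globalDocIds(BitSet typeBits, int[] typeToGlobal) {
    List<Integer> out = new ArrayList<>();
    for (int typeId = typeBits.nextSetBit(0); typeId >= 0;
         typeId = typeBits.nextSetBit(typeId + 1)) {
      out.add(typeToGlobal[typeId]); // dereference via the dense lookup array
    }
    return out;
  }

  public static void main(String[] args) {
    BitSet bits = new BitSet(3);
    bits.set(0);
    bits.set(2);
    System.out.println(globalDocIds(bits, new int[] {2, 4, 5})); // [2, 5]
  }
}
```

Because the dense array is sorted, the global IDs come out in increasing order, which is what a DocIdSetIterator-style consumer would expect.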
Clearly the mapping arrays come at a cost of 4 bytes * num docs, which is non-trivial. The
sparse globalDocIdToTypeCLookUp array shown here could be avoided by reading the TermDocs for
the doc type and counting at cached-filter creation time.
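One way to read the TermDocs-and-counting idea: because postings come back in increasing doc ID order, a type C doc's type-specific ID is simply its position in the postings for the doc type term, so a running counter can replace the sparse lookup. In this hypothetical sketch, sorted `int[]` arrays stand in for the TermDocs enumeration and for the filter's matches, and `java.util.BitSet` stands in for `OpenBitSet`.

```java
import java.util.BitSet;

// Hypothetical builder that needs no sparse global->type array.
// typeDocs: global IDs of the type's docs, increasing (as a TermDocs walk yields them).
// sortedMatches: global IDs matched by the filter's query, also increasing.
final class CountingFilterBuilder {

  static BitSet build(int[] typeDocs, int[] sortedMatches) {
    BitSet bits = new BitSet(typeDocs.length);
    int m = 0;
    // typeId is just a running count of type docs seen so far.
    for (int typeId = 0; typeId < typeDocs.length && m < sortedMatches.length; typeId++) {
      int globalId = typeDocs[typeId];
      while (m < sortedMatches.length && sortedMatches[m] < globalId) {
        m++; // skip matches that are not of this type
      }
      if (m < sortedMatches.length && sortedMatches[m] == globalId) {
        bits.set(typeId);
        m++;
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    int[] typeCDocs = {2, 4, 5};  // postings for docType "C"
    int[] matches = {2, 3, 5};    // doc 3 is type A, so it is skipped
    System.out.println(build(typeCDocs, matches)); // {0, 2}
  }
}
```

This is a merge of two sorted lists, so it costs one pass over each; only the dense typeCToGlobalDocIdLookUp array (4 bytes per type C doc) still has to be kept for search time.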

> Nested Document query support
> -----------------------------
>                 Key: LUCENE-2454
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments:
> A facility for querying nested documents in a Lucene index as outlined in

