jackrabbit-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ross.Dy...@ipaustralia.gov.au
Subject Observations of SQL-2 queries [SEC=UNCLASSIFIED]
Date Thu, 01 Apr 2010 03:53:07 GMT
I have an archive of PDF documents, with metadata.  Mostly I will find the 
documents by traversing the tree, but sometimes I will need to get them by 
their unique identifier (from the source system) which I use for the 
file's node name and as a property on the jcr_content subnode.

Here is my data structure:

<folder>/<folder>/<nt:file name = 282675>/<ed:docreftype 
name=jcr:content><property name=jcr:data value=binary>
 <property name=ed:document_id value=282675>
 <..other properties..>

about 100K records, no folder contains > 99 nodes

select * from [nt:file] as doc where doc.name = '282675'
takes 30 seconds finds nothing

select * from [nt:file] as doc inner join [ed:docreftype] as content on 
ischildnode(content, doc)
where content.[ed:document_id] = '282675'
takes 50 seconds finds the right record

select * from [nt:file] as doc inner join [ed:docreftype] as content on 
ischildnode(content, doc) 
where contains(content.[ed:document_id], '282675') and 
content.[ed:document_id] = '282675'>
takes 2 seconds finds the correct record.

So getting the Lucene index to do a first cut of the results helps the 
whole process.

Prophecy:
He who pulls the mighty sword ExQueryString from the stone JSR-283 shall 
be the rightful king of all Content.

Mime
View raw message