lucene-java-user mailing list archives

From mark harwood <>
Subject Re: Mapping Lucene search results with a relational database
Date Tue, 03 Jul 2012 10:34:56 GMT
Many considerations here - I find the technical concerns you present typically open a can of
worms for any businesses worried about security.
It gets political quickly.
In environments where security is paramount, software must be formally accredited, which is
a costly exercise.

Often the chosen database (e.g. Oracle) has been formally accredited but Lucene has not;
consequently, all search results have to be run by the database (as in your example) for a
trusted judgement call.
Even in less demanding situations, it may just be that the Database team still want to cling
to some control over who sees what once they have delegated search responsibilities to an
external search engine.
In these scenarios it is often a mistake to have a naive Lucene implementation which returns
many results, regardless of security rules, only to have many of them filtered out by the
database.
In the worst-case scenario the top million Lucene results may all be filtered out by the database
and the search has to be repeated for the top 2 million and so on until the desired number
of results is returned.
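
To make the cost of that worst case concrete, here is a minimal sketch of the repeat-and-double loop in plain Java. Everything in it is hypothetical: `topHits` stands in for a Lucene search returning the IDs of the best n hits in rank order, `dbAllows` stands in for the database's security check, and `maxFetch` is an assumed safety cap. It is not a Lucene or JPA API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

final class PostFilterSearch {

    /**
     * Returns up to 'wanted' IDs that pass the database check, in rank order.
     * topHits.apply(n) is assumed to return the best n hits (or fewer if the
     * index has no more matches).
     */
    static List<String> search(IntFunction<List<String>> topHits,
                               Predicate<String> dbAllows,
                               int wanted,
                               int maxFetch) {
        if (wanted <= 0) {
            return new ArrayList<>();
        }
        int fetch = Math.min(wanted, maxFetch);
        while (true) {
            List<String> candidates = topHits.apply(fetch);
            List<String> visible = new ArrayList<>();
            for (String id : candidates) {
                if (dbAllows.test(id)) {          // one security decision per candidate
                    visible.add(id);
                    if (visible.size() == wanted) {
                        return visible;
                    }
                }
            }
            boolean exhausted = candidates.size() < fetch;
            if (exhausted || fetch >= maxFetch) {
                return visible;                   // nothing more to fetch
            }
            // Not enough survivors: repeat the whole search with twice as many hits.
            fetch = (fetch > maxFetch / 2) ? maxFetch : fetch * 2;
        }
    }
}
```

Each pass re-runs the search and re-checks every candidate from the start, so a user who can see only a small fraction of the matches pays for several full rounds before getting a page of results.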

For these reasons it is advisable that Lucene searches should attempt to mirror the security
logic implemented by the database (e.g. rules like "doc must be from same dept as current
user" etc).
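
As an illustration of mirroring a rule such as "doc must be from same dept as current user", the sketch below uses a small in-memory stand-in for the index rather than Lucene itself. The `IndexedDoc` class, its `dept` field and the `search` method are all hypothetical names; in a real Lucene index the same idea would be an extra indexed field plus a filter or required clause on it at query time.

```java
import java.util.ArrayList;
import java.util.List;

final class MirroredSecuritySearch {

    /** Hypothetical indexed document: 'dept' is a copy of the database column. */
    static final class IndexedDoc {
        final String id;
        final String dept;
        final String text;

        IndexedDoc(String id, String dept, String text) {
            this.id = id;
            this.dept = dept;
            this.text = text;
        }
    }

    /**
     * Applies the keyword match and the mirrored security rule together,
     * so every returned ID is already expected to be visible to the user.
     */
    static List<String> search(List<IndexedDoc> index, String keyword,
                               String userDept, int wanted) {
        List<String> hits = new ArrayList<>();
        for (IndexedDoc doc : index) {
            if (hits.size() >= wanted) {
                break;
            }
            boolean matchesKeyword = doc.text.contains(keyword);
            boolean passesMirroredRule = doc.dept.equals(userDept);
            if (matchesKeyword && passesMirroredRule) {
                hits.add(doc.id);
            }
        }
        return hits;
    }
}
```

Because the rule is applied before the result list is truncated, the database's final check should reject few or none of the returned IDs, and a single pass is normally enough.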
Businesses don't tend to like duplicating rules in this way or the latency involved in seeing
upstream security changes in database or security domain reflected in the search index.
Another consequence of duplicating the rules in Lucene is that while the solution as a whole
is accredited to yield no false positives (returning a document as safe when it should be
secured) there is a danger that Lucene could yield a false negative (an inconsistent Lucene
filter or doc denies access to a document that the database would have permitted). This may
be seen as equally bad.
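
The asymmetry can be shown with a small comparison over a sample of document IDs. This is only a sketch: `indexAllows` and `dbAllows` are hypothetical stand-ins for the mirrored Lucene rule and the database rule. When the database still makes the final call, an index-side false positive is caught downstream, but a false negative never reaches the database at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SecurityRuleAudit {

    /** Disagreements between the mirrored rule and the database rule. */
    static final class Report {
        // Index allows, database denies: still caught by the database's final check.
        final List<String> falsePositives = new ArrayList<>();
        // Index denies, database allows: the document silently disappears from results.
        final List<String> falseNegatives = new ArrayList<>();
    }

    static Report compare(List<String> ids,
                          Predicate<String> indexAllows,
                          Predicate<String> dbAllows) {
        Report report = new Report();
        for (String id : ids) {
            boolean inIndex = indexAllows.test(id);
            boolean inDb = dbAllows.test(id);
            if (inIndex && !inDb) {
                report.falsePositives.add(id);
            }
            if (!inIndex && inDb) {
                report.falseNegatives.add(id);
            }
        }
        return report;
    }
}
```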

The reality, however, is that for performance reasons this is the way things have to be, and
various business stakeholders have to be convinced of this.
Not sure if this describes your scenario but it is one that I've encountered many times.


----- Original Message -----
From: Jochen Hebbrecht <>
Sent: Tuesday, 3 July 2012, 8:56
Subject: Mapping Lucene search results with a relational database

Hi all,

I have an application which holds a list of documents. These documents are
indexed using Lucene.
I can search on keywords of the documents. I loop the TopDocs and get the
ID field (of each Lucene doc) which is related to the ID column in my
relational database. From all these ID's, I create a list.
After building the list of ID's, I make a database query which is executing
the following SELECT statement (JPA):

SELECT d FROM Document d WHERE d.id IN (##list of ID's retrieved from Lucene##)

This list of documents is sent to the view (GUI).

But, some documents are private and should not be in the list. Therefore,
we have some extra statements in the SELECT query to do some security

SELECT d FROM Document d WHERE d.id IN (##list of ID's retrieved from Lucene##)
AND rule1 = foo
AND rule2 = bar

But now I'm wondering: I'm using the speed of Lucene to quickly search
documents, but I still have to do the SELECT query. So I'm losing
performance on this one :-( ...
Does Lucene have some component which does this mapping for you? Or are
there any best practices on this issue? How do big projects map the Lucene
results to the relational database? Because the view should be rendering the

Many thanks!
