lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Combining hits from multiple documents into a single hit
Date Thu, 17 Sep 2009 22:47:59 GMT

Assuming i understand you correctly, then...
  1. properties only exist as part of a single article (no articles share 
a complex property) 
  2. you don't ever need to return search results on 
properties; they exist just to aid in searching for articles.

IF that's correct, then the idea i would try is to index only 1 document 
per article, with all of the text included, and use payloads to annotate 
which text is secured by which property.  Then use SpanQueries to search 
for your docs, and in a custom HitCollector check the matching spans for 
each doc to get the corresponding property, and test that 
<user/doc/property> triple against your security mechanism -- if any 
fail, skip that doc.
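
To make the control flow concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: the Token class stands in for a term plus its payload, the loop in search() stands in for walking the matching spans inside a HitCollector, and SecurityCheck is a hypothetical placeholder for the proprietary check. All of the names here are made up for illustration, and scoring is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of "one document per article, property recorded per token".
// Nothing here is a Lucene class; every name is an assumption for illustration.
class PayloadSecuritySketch {

    /** One indexed token plus the property it came from (null = article body).
     *  The property field plays the role a payload would play in Lucene. */
    static final class Token {
        final String text;
        final String property;
        Token(String text, String property) {
            this.text = text;
            this.property = property;
        }
    }

    /** Hypothetical stand-in for the proprietary per-user security check. */
    interface SecurityCheck {
        boolean canSee(String user, String articleId, String property);
    }

    /** Returns the ids of articles that contain every term and whose matching
     *  tokens all pass the <user/doc/property> check. If any match fails the
     *  check, the whole article is skipped. */
    static List<String> search(Map<String, List<Token>> index, String user,
                               SecurityCheck security, String... terms) {
        Set<String> wanted = new HashSet<>(Arrays.asList(terms));
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<Token>> doc : index.entrySet()) {
            Set<String> found = new HashSet<>();
            boolean allowed = true;
            for (Token t : doc.getValue()) {
                if (!wanted.contains(t.text)) {
                    continue; // not a matching position, so no check is needed
                }
                found.add(t.text);
                // Matches in the article body (property == null) are always visible.
                if (t.property != null
                        && !security.canSee(user, doc.getKey(), t.property)) {
                    allowed = false;
                    break;
                }
            }
            if (allowed && found.containsAll(wanted)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    /** Builds the three-article example table from the quoted message. */
    static Map<String, List<Token>> exampleIndex() {
        Map<String, List<Token>> index = new LinkedHashMap<>();
        index.put("A", Arrays.asList(new Token("A", null),
                new Token("X", "Property 1"), new Token("J", "Property 2")));
        index.put("B", Arrays.asList(new Token("B", null),
                new Token("Y", "Property 1"), new Token("K", "Property 2")));
        index.put("C", Arrays.asList(new Token("C", null),
                new Token("Z", "Property 1"), new Token("L", "Property 2")));
        return index;
    }
}
```

With a check that allows everything, search(exampleIndex(), "admin", check, "B", "Y") returns only article B; with a check that hides Property 1 of article B, the same search returns nothing. Because the article text and the property text live in the same document, a "foo AND bar" search that matches one term in the body and the other in a property still reaches the security check instead of being filtered out earlier.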

It's not something i've ever tried (or thought through very hard) but 
based on other comments i've seen from people about payloads it sounds 
like it should work pretty well and give you decent scores.

: [I originally posted this to the mailing list, but it was suggested
: that I might have more luck here]
: I am trying to get a particular search to work and it is proving problematic.
: The actual source data is quite complex but can be summarised by the following
: example:
: I have articles that are indexed so that they can be searched. Each article
: also has multiple properties associated with it which are also indexed and
: searchable. When users search, they can get hits in either the main article or
: the associated properties. Regardless of where a hit is achieved, the article
: is returned as a search hit (ie. the properties are never a hit in their own
: right).
: Now for the complexity:
: Each property has security on it, which means that for any given user, they
: may or may not be able to see the property. If a user cannot see a property,
: they obviously do not get a search hit in it. This security check is
: proprietary and cannot be done using the typical mechanism of storing a role
: in the index alongside the other fields in the document.
: I currently have an index that contains the articles and properties indexed
: separately (ie. an article is indexed as a document, and each property has its
: own document). When a search happens, a hit in article A or a hit in any of
: the properties of article A should be classed as a hit for article A alone, with
: the scores combined.
: Whether or not a user can see a property is not based on the property itself,
: but on the value of the property. I cannot therefore put the extra security
: conditions into the query upfront as I don't know the value to filter by.
: As an example:
: +---------+------------+------------+
: | Article | Property 1 | Property 2 |
: +---------+------------+------------+
: |    A    |     X      |     J      |
: |    B    |     Y      |     K      |
: |    C    |     Z      |     L      |
: +---------+------------+------------+
: If a user can see everything, then searching for "B and Y" will return a
: single search result for article B.
: If another user cannot see a property if its value contains Y, then searching
: for "B and Y" will return no hits.
: I have no way of knowing what values a user can and cannot see upfront. The
: only way to tell is to perform the security check (currently done at the time
: of filtering a hit from a field in the document), which I obviously cannot do
: for every possible data value for each user.
: To achieve this originally, Lucene v1.3 was modified to allow this to happen
: by changing BooleanQuery to have a custom Scorer that could apply the logic of
: the security check and the combination of two hits in different documents
: being classed as a hit in a single document. I am trying to upgrade this
: version to the latest (v2.3.2 - I am using Lucene.Net), but ideally without
: having to modify Lucene in any way.
: An additional problem occurs if I do an AND search. If an article contains the
: word foo and one of its properties contains the word bar, then searching for
: "foo AND bar" will return the article as a hit. My current code deals with
: this inside the custom Scorer.
: Any ideas how/if this can be done?
: I am thinking along the lines of using a custom HitCollector and passing that
: into the search, but when doing the boolean search "foo AND bar", execution
: never reaches my HitCollector as the ConjunctionScorer filters out all of the
: results from the sub-queries before getting there.
: Thanks,
: Adrian
: ---------------------------------------------------------------------
: To unsubscribe, e-mail:
: For additional commands, e-mail:

