lucenenet-user mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: Lucene.net Nested Documents support (lucene version 3.4)
Date Wed, 22 Aug 2012 07:12:02 GMT

Hi,

First off, storing this data in the index means that the permissions
are fixed at index time, not evaluated at query time. Any change to the
permissions would require reindexing the affected documents.

You can accomplish this using payloads. I'm not sure about the technical
details of how they are read into memory, cached and such. I'm
using payloads in a small index (a few thousand documents) to put a
timestamp on indexed values (a valid-until date) so that a document no
longer matches a specific token after a set date. You could do something
similar where type and permission information is encoded as a payload,
a byte array, and verified at query time.
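
As a rough sketch of what such an encoding could look like, using only
the base class library: the TagPayload class, its five-byte layout and
the one-bit-per-user permission mask are all assumptions made up for
illustration and are not part of Lucene.Net. A custom similarity could
call something like IsVisible from ScorePayload and return 0 when it
yields false.

```csharp
using System;

// Hypothetical payload layout (an assumption, not a Lucene.Net format):
//   byte 0     = tag type
//   bytes 1..4 = permission bitmask, one bit per user or group
public static class TagPayload {
    public const Int32 Length = 5;

    public static Byte[] Encode(Byte tagType, UInt32 allowedMask) {
        var bytes = new Byte[Length];
        bytes[0] = tagType;
        // BitConverter uses the machine's byte order, so the index should be
        // written and read on machines with the same endianness.
        Buffer.BlockCopy(BitConverter.GetBytes(allowedMask), 0, bytes, 1, 4);
        return bytes;
    }

    // Takes the same (payload, offset, length) triple that ScorePayload receives.
    // requiredType == null means "any type".
    public static Boolean IsVisible(Byte[] payload, Int32 offset, Int32 length,
            Byte? requiredType, UInt32 userMask) {
        if (payload == null || length < Length)
            return false;

        var tagType = payload[offset];
        var allowedMask = BitConverter.ToUInt32(payload, offset + 1);

        if (requiredType.HasValue && requiredType.Value != tagType)
            return false;

        // Visible if the user shares at least one permission bit with the tag.
        return (allowedMask & userMask) != 0;
    }
}

public static class TagPayloadDemo {
    public static void Main() {
        const UInt32 Everyone = 1, User1 = 2, User2 = 4;
        var user2 = Everyone | User2;

        var x = TagPayload.Encode(1, Everyone); // tag X on Doc A
        var z = TagPayload.Encode(2, User1);    // tag Z on Doc A

        Console.WriteLine(TagPayload.IsVisible(x, 0, x.Length, null, user2));    // True
        Console.WriteLine(TagPayload.IsVisible(x, 0, x.Length, (Byte)2, user2)); // False
        Console.WriteLine(TagPayload.IsVisible(z, 0, z.Length, null, user2));    // False
    }
}
```

A bitmask caps the number of distinct users or groups at 32 in this
sketch; a longer byte array or a list of ids would be needed beyond that.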

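The score is calculated using a custom similarity, specified with
indexSearcher.SetSimilarity(new ValiditySimilarity());
ScorePayload is only consulted by payload-aware queries such as
PayloadTermQuery, not by a plain TermQuery.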

     public class ValiditySimilarity : DefaultSimilarity {
         public override Single ScorePayload(Int32 docId, String fieldName,
             Int32 start, Int32 end, Byte[] payload, Int32 offset, Int32 length) {
             // The payload holds the valid-until date as ticks (Int64).
             var validTo = BitConverter.ToInt64(payload, offset);
             if (DateTime.Now.Ticks < validTo)
                 return 1;

             // Expired: the token no longer contributes to the score.
             return 0;
         }
     }

The actual payload is generated by a custom token stream when indexing
the document.

     document.Add(new Field("FieldName",
         GetTokenStream("value1 value2", DateTime.Now.AddDays(1))));

     private static TokenStream GetTokenStream(String value, DateTime validTo) {
         var valueReader = new StringReader(value);
         // V is presumably an alias for Lucene.Net.Util.Version.
         // Declared as TokenStream (not var) so the filters can be assigned to it.
         TokenStream stream = new StandardTokenizer(V.LUCENE_29, valueReader);
         stream = new LowerCaseFilter(stream);
         stream = new ValidityPayloadFilter(stream, validTo);
         return stream;
     }

     public class ValidityPayloadFilter : TokenFilter {
         private readonly DateTime _validTo;
         private readonly PayloadAttribute _payloadAttribute;

         public ValidityPayloadFilter(TokenStream stream, DateTime validTo)
             : base(stream) {
             _validTo = validTo;
             _payloadAttribute =
                 (PayloadAttribute)AddAttribute(typeof(PayloadAttribute));
         }

         public override Boolean IncrementToken() {
             if (!input.IncrementToken())
                 return false;

             // Attach the valid-until date (as ticks) to every token.
             var bytes = BitConverter.GetBytes(_validTo.Ticks);

             var payload = new Payload(bytes);
             _payloadAttribute.SetPayload(payload);
             return true;
         }
     }
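
To check that the bytes written by the filter are the ones the
similarity reads back, here is a small standalone round trip that needs
no Lucene.Net reference. ValidityPayloadDemo and its methods are
hypothetical names; the clock is passed in as a parameter (an assumption,
the code above uses DateTime.Now directly) so the result is deterministic.

```csharp
using System;

public static class ValidityPayloadDemo {
    // Same encoding as ValidityPayloadFilter: the ticks of the valid-until date.
    public static Byte[] Encode(DateTime validTo) {
        return BitConverter.GetBytes(validTo.Ticks);
    }

    // Same comparison as ValiditySimilarity.ScorePayload, with "now" injected.
    public static Single Score(Byte[] payload, Int32 offset, DateTime now) {
        var validTo = BitConverter.ToInt64(payload, offset);
        return now.Ticks < validTo ? 1f : 0f;
    }

    public static void Main() {
        var encoded = Encode(new DateTime(2012, 8, 23));

        // Place the payload at offset 3 of a larger buffer to show why the
        // offset argument has to be honoured when decoding.
        var buffer = new Byte[16];
        Buffer.BlockCopy(encoded, 0, buffer, 3, encoded.Length);

        Console.WriteLine(Score(buffer, 3, new DateTime(2012, 8, 22))); // 1
        Console.WriteLine(Score(buffer, 3, new DateTime(2012, 8, 24))); // 0
    }
}
```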

// Simon


On 2012-08-22 08:13, Omri Suissa wrote:
> Hi Simon,
> Thanks for the help.
> This is my scenario:
> My search application allows users to add manual tags to each document; each
> tag has a name, type and permissions.
> When searching I would like to have the following options:
> 1) get all the documents that contain a specific tag (with any type) that I
> have permission to view
> 2) get all the documents that contain a specific tag with a specific type that
> I have permission to view
>
> For example if I have 2 documents:
> Doc A with tags:
>           X (type 1, permissions: everyone)
>           Y (type 1, permissions: User1, User2)
>           Z (type 2, permissions: User1)
>
> Doc B with tags:
>           X (type 2, permissions: everyone)
>           Y (type 4, permissions: everyone)
>           Z (type 2, permissions: User1)
>
> I'll be able to find A and B when searching for all documents with tag X,
> only A if X with type 1, and none of them if tag Z and I'm User2 (and so
> on...).
>
> So nested documents could really help me where each tag is a sub document
> (like sql JOIN operation).
>
> What can I do using the current capabilities?
>
> Thank you for the help,
> Omri
>
> On Tue, Aug 21, 2012 at 8:02 PM, Simon Svensson <sisve@devhost.se> wrote:
>
>> Hi,
>>
>> I do not have an answer to your explicit question, but this mail group
>> could perhaps help you with workarounds using the current functionality.
>> Are you after the search functionality (field1:a and field2:b) with child
>> documents? Or grouping of the results (the sql equivalent of group by)?
>> Return the first 5 entries of every group (like a Google search does per
>> site)?
>>
>> // Simon
>>
>>
>> On 2012-08-21 16:00, Omri Suissa wrote:
>>
>>> Hi everyone,
>>> We are currently implementing Lucene .net in our solution and we need to
>>> use the Lucene Nested Documents support that was introduced in Lucene
>>> version 3.4
>>> If I understand correctly, the current version of Lucene .net does not
>>> support this feature (and other 3.4 features). Is there a timeline for the
>>> 3.4 porting to .net?
>>>
>>> Thank you,
>>> Omri
>>>
>>>




