lucenenet-user mailing list archives

From Omri Suissa <omri.sui...@diffdoof.com>
Subject Re: Lucene.net Nested Documents support (lucene version 3.4)
Date Wed, 22 Aug 2012 08:23:20 GMT
Thank you,
I can understand how this can work on a small number of documents, but what
if I have millions of documents?
Then a lot of documents could be returned by the query, and only afterwards
would we set their score to 0.
I would like to find a way to keep those documents from being returned by the
query in the first place (as far as I understand, that would be much more
efficient).

Omri

On Wed, Aug 22, 2012 at 10:12 AM, Simon Svensson <sisve@devhost.se> wrote:

>
> Hi,
>
> First off, storing this data in the index would mean that you would
> store the permissions at index time, not query time. Any changed
> permissions would require reindexing the affected documents.
>
> You can accomplish this using payloads. I'm not sure about the technical
> details regarding how they are read into memory, caching and such. I'm
> using payloads for a small index (a few thousand documents) to put a
> timestamp on indexed values (a valid-until date) so that documents no
> longer match a specific token after a set date. You could do something
> similar where type and permission information is encoded as a payload,
> a byte array, and verified at query time.
>
> The score is calculated using a custom similarity, specified with
> indexSearcher.SetSimilarity(new ValiditySimilarity());
>
>     public class ValiditySimilarity : DefaultSimilarity {
>         public override Single ScorePayload(Int32 docId, String fieldName, Int32 start, Int32 end, Byte[] payload, Int32 offset, Int32 length) {
>             var validTo = BitConverter.ToInt64(payload, offset);
>             if (DateTime.Now.Ticks < validTo)
>                 return 1;
>
>             return 0;
>         }
>     }
>
> The actual payload is generated by a custom token stream when indexing
> the document.
>
>     document.Add(new Field("FieldName", GetTokenStream("value1 value2", DateTime.Now.AddDays(1))));
>
>     private static TokenStream GetTokenStream(String value, DateTime validTo) {
>         var valueReader = new StringReader(value);
>         // V is presumably an alias for Lucene.Net.Util.Version.
>         // Declared as TokenStream so the filters below can be assigned to it.
>         TokenStream stream = new StandardTokenizer(V.LUCENE_29, valueReader);
>         stream = new LowerCaseFilter(stream);
>         stream = new ValidityPayloadFilter(stream, validTo);
>         return stream;
>     }
>
>     public class ValidityPayloadFilter : TokenFilter {
>         private readonly DateTime _validTo;
>         private readonly PayloadAttribute _payloadAttribute;
>
>         public ValidityPayloadFilter(TokenStream stream, DateTime validTo)
>             : base(stream) {
>             _validTo = validTo;
>             _payloadAttribute = (PayloadAttribute)AddAttribute(typeof(PayloadAttribute));
>         }
>
>         public override Boolean IncrementToken() {
>             if (!input.IncrementToken())
>                 return false;
>
>             var bytes = BitConverter.GetBytes(_validTo.Ticks);
>
>             var payload = new Payload(bytes);
>             _payloadAttribute.SetPayload(payload);
>             return true;
>         }
>     }
>
> // Simon
>
>
>
> On 2012-08-22 08:13, Omri Suissa wrote:
>
>> Hi Simon,
>> Thanks for the help.
>> This is my scenario:
>> My search application allows users to add manual tags to each document;
>> each tag has a name, a type and permissions.
>> When searching I would like to have the following options:
>> 1) get all the documents that contain a specific tag (with any type) that
>> I have permission to view
>> 2) get all the documents that contain a specific tag with a specific type
>> that I have permission to view
>>
>> For example if I have 2 documents:
>> Doc A with tags:
>>           X (type 1, permissions: everyone)
>>           Y (type 1, permissions: User1, User2)
>>           Z (type 2, permissions: User1)
>>
>> Doc B with tags:
>>           X (type 2, permissions: everyone)
>>           Y (type 4, permissions: everyone)
>>           Z (type 2, permissions: User1)
>>
>> I'll be able to find A and B when searching for all documents with tag X,
>> only A when searching for X with type 1, and none of them when searching
>> for tag Z as User2 (and so on...).
>>
>> So nested documents could really help me, with each tag as a sub-document
>> (like a SQL JOIN operation).
>>
>> What can I do using the current capabilities?
>>
>> Thank you for the help,
>> Omri
>>
>> On Tue, Aug 21, 2012 at 8:02 PM, Simon Svensson <sisve@devhost.se> wrote:
>>
>>> Hi,
>>>
>>> I do not have an answer to your explicit question, but this mail group
>>> could perhaps help you with workarounds using the current functionality.
>>> Are you after the search functionality (field1:a and field2:b) with child
>>> documents? Or grouping of the results (the SQL equivalent of GROUP BY)?
>>> Or returning the first 5 entries of every group (like a Google search
>>> does per site)?
>>>
>>> // Simon
>>>
>>>
>>> On 2012-08-21 16:00, Omri Suissa wrote:
>>>
>>>> Hi everyone,
>>>> We are currently implementing Lucene.Net in our solution and we need to
>>>> use the Lucene nested documents support that was introduced in Lucene
>>>> version 3.4.
>>>> If I understand correctly, the current version of Lucene.Net does not
>>>> support this feature (or other 3.4 features). Is there a timeline for
>>>> porting 3.4 to .NET?
>>>>
>>>> Thank you,
>>>> Omri
>>>>
>>>>
>>>>
>
>
>
>
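
The suggestion quoted above, encoding type and permission information as a payload byte array and verifying it at query time, could look roughly like the sketch below for the tag example in this thread. Everything here is an assumption for illustration: the `TagPayload` class, the 12-byte layout (4 bytes of tag type followed by an 8-byte permission bitmask) and the bit assigned to each user are hypothetical, and the code deliberately uses only `System`, so it shows the encode and check steps rather than any Lucene.Net API.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class TagPayload
{
    // Hypothetical bit assignments; a real system would map users or groups to bits.
    public const long Everyone = 1L << 0;
    public const long User1 = 1L << 1;
    public const long User2 = 1L << 2;

    // Assumed layout: bytes 0-3 hold the tag type, bytes 4-11 the permission bitmask.
    public static byte[] Encode(int tagType, long allowedMask)
    {
        var bytes = new byte[12];
        BitConverter.GetBytes(tagType).CopyTo(bytes, 0);
        BitConverter.GetBytes(allowedMask).CopyTo(bytes, 4);
        return bytes;
    }

    // Same idea as the quoted ScorePayload override: 1 keeps the hit, 0 zeroes its score.
    // Pass null as requiredType to accept a tag of any type.
    public static float Score(byte[] payload, int offset, int? requiredType, long userMask)
    {
        var tagType = BitConverter.ToInt32(payload, offset);
        var allowedMask = BitConverter.ToInt64(payload, offset + 4);

        if (requiredType.HasValue && requiredType.Value != tagType)
            return 0f;

        return (allowedMask & userMask) != 0 ? 1f : 0f;
    }

    public static void Main()
    {
        // The searching user's mask always includes the Everyone bit.
        long user2 = Everyone | User2;

        var docA_Y = Encode(1, User1 | User2); // Doc A, tag Y
        var docA_Z = Encode(2, User1);         // Doc A, tag Z
        var docB_X = Encode(2, Everyone);      // Doc B, tag X

        Console.WriteLine(Score(docA_Y, 0, null, user2)); // prints 1: User2 may see Y
        Console.WriteLine(Score(docA_Z, 0, null, user2)); // prints 0: Z is User1-only
        Console.WriteLine(Score(docB_X, 0, 1, user2));    // prints 0: X has type 2 in Doc B
    }
}
```

In the quoted setup, `Encode` would take the place of `BitConverter.GetBytes(_validTo.Ticks)` in the token filter, and the body of `Score` would go into the `ScorePayload` override, with the user's mask supplied when the similarity is created. `BitConverter` uses the machine's byte order, so the sketch assumes indexing and searching run on machines with the same endianness. As noted in the thread, this still only sets the score to 0 for matching documents, and any permission change requires reindexing.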
