lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Best practice to map Lucene docids to real ids
Date Fri, 16 May 2014 10:24:55 GMT
On Tue, May 13, 2014 at 1:34 AM, Sven Teichmann <s.teichmann@s4ip.de> wrote:
> Hi,
>
> I also found this response very useful and right now I am playing around
> with DocValues.
>
>> If the default DocValuesFormat isn't fast enough, you can always
>> switch to e.g. DirectDocValuesFormat (uses lots of RAM but it is just an
>> array lookup).
>
> How do I switch to DirectDocValuesFormat? And how do I retrieve the
> DocValues then?

Create your own subclass of Lucene46Codec and override the
"getDocValuesFormatForField(String field)" method to return
DocValuesFormat.forName("Direct") for the fields that should use
DirectDocValuesFormat.
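
A minimal sketch of such a codec, assuming Lucene 4.6 through 4.8 (where
Lucene46Codec is the default codec) and assuming the lucene-codecs jar,
which as far as I know is where DirectDocValuesFormat lives, is on the
classpath. The class name DirectIdCodec and the field name "id" are
hypothetical:

```java
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;

// Hypothetical codec: uses the "Direct" doc values format for one field
// and the default format for everything else.
public class DirectIdCodec extends Lucene46Codec {

  // Resolved by name through Lucene's SPI lookup.
  private final DocValuesFormat direct = DocValuesFormat.forName("Direct");

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    if ("id".equals(field)) {   // "id" is an assumed field name
      return direct;
    }
    return super.getDocValuesFormatForField(field);
  }
}

// Wiring it in at indexing time (iwc is your IndexWriterConfig):
//   iwc.setCodec(new DirectIdCodec());
```

Only segments written after the codec is set pick up the new format;
existing segments keep whatever format they were written with until they
are merged or reindexed.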

Regardless of which DocValuesFormat you use, you always retrieve the
values through the same API (AtomicReader.getXXXDocValues(), e.g.
getNumericDocValues).  If you really want to retrieve DocValues against
a top-level reader, use MultiDocValues.getXXXValues, but note that each
lookup then pays for a binary search to find the segment that holds the
document, so it's better to work directly with each SegmentReader if
that performance cost matters to you (which, given that you want to cut
over to DirectDocValuesFormat, it must).
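
To illustrate where that per-lookup cost comes from, here is a small
stand-alone sketch. It is not Lucene code: the class DocBaseLookup, its
methods and the sample ids are all hypothetical, and it assumes every
segment holds at least one document. It only models the idea that a
top-level docid must first be mapped to a segment by searching the
segments' starting docids, whereas a per-segment lookup is a plain array
access:

```java
import java.util.Arrays;

// Hypothetical model of top-level vs. per-segment doc values lookups.
final class DocBaseLookup {

    // starts[i] is the first top-level docid of segment i (sorted, distinct).
    // Returns the index of the segment that contains docId.
    static int subIndex(int docId, int[] starts) {
        int pos = Arrays.binarySearch(starts, docId);
        // Exact hit: docId is the first doc of segment pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // containing segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Top-level lookup: binary search for the segment, then array access.
    static long topLevelGet(int docId, int[] starts, long[][] perSegmentIds) {
        int seg = subIndex(docId, starts);
        return perSegmentIds[seg][docId - starts[seg]];
    }

    // Per-segment lookup: the docid is already segment-relative.
    static long segmentGet(int segmentDocId, long[] segmentIds) {
        return segmentIds[segmentDocId];
    }

    public static void main(String[] args) {
        long[][] ids = { {100, 101, 102}, {200, 201}, {300} };
        int[] starts = {0, 3, 5};
        System.out.println(topLevelGet(4, starts, ids)); // top-level docid 4
        System.out.println(segmentGet(1, ids[1]));       // same doc, per segment
    }
}
```

In a Collector you already receive segment-relative docids, so fetching
the doc values once per segment and indexing into them directly avoids
the search step entirely.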

Mike McCandless

http://blog.mikemccandless.com



> Sven
>
> On 07.05.2014 16:09, Wouter Heijke wrote:
>
>> Hey Mike,
>>
>> That was a very useful response, also for long-time Lucene users like
>> myself who were stuck in legacy ways of doing things!
>> I managed to easily change indexing of keys to DocValues and found myself
>> wondering why I did not get anything returned. It appears indexing works
>> transparently for any field, but to get your DocValue key out of the
>> 'index' you need to use the AtomicReader... all now works like it did
>> before, only faster (I hope ;-))
>>
>> Wouter
>>
>>> Doc values is far faster than a stored field.
>>>
>>> If the default DocValuesFormat isn't fast enough, you can always
>>> switch to e.g. DirectDocValuesFormat (uses lots of RAM but it is just an
>>> array lookup).
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Tue, May 6, 2014 at 4:33 AM, Sven Teichmann <s.teichmann@s4ip.de>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> what is the best way to retrieve our "real" ids (as they are in our
>>>> database) while searching?
>>>>
>>>> Right now we generate a file after indexing which contains all Lucene
>>>> docids and the matching id in our database. Our own Collector converts
>>>> the docids to our ids while collecting. This works unless a document
>>>> is deleted and the index is optimized afterwards.
>>>>
>>>> Is this a good solution or should we use Fields or DocValues for this?
>>>> What is the fastest solution?
>>>>
>>>> Regards,
>>>>
>>>> Sven Teichmann
>>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>

