lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream
Date Mon, 19 Dec 2011 20:06:50 GMT
On Mon, Dec 19, 2011 at 9:04 PM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:
> On Mon, Dec 19, 2011 at 5:03 PM, Peter Karich <peathal@yahoo.de> wrote:
>> Hi Uwe,
>>
>> thanks for the talk suggestion(s)*.
>>
>> I was using it for faster term lookups of a long 'id'. How would this be
>> done with 4.0? Previously I did it via Term:
>>
>> new Term(fieldName, NumericUtils.longToPrefixCoded(longValue));
>>
>> How should I generally do "term lookup" in 4.0, since you said in the video
>> that 'Term' will be removed at some point :)? What is the recommended way,
>> and what is the fastest? Or where in the Lucene tests can I find up-to-date
>> code to use as an example?
>>
>> I also heard the suggestion to use the pulsing codec for id retrieval**.
>> Is this the correct way to achieve this nowadays:
>>
>> indexWriterCfg.setCodec(new Lucene40Codec() {
>>   @Override public PostingsFormat getPostingsFormatForField(String field) {
>>       if("_id".equals(field)) return new Pulsing40PostingsFormat();
>>       else ?
>>   }});
>
> do something like this:
>
>  // Per-field codec: the "id" field uses the pulsing postings format,
>  // every other field falls back to the default Lucene40 format.
>  public static final class CustomPerFieldCodec extends Lucene40Codec {
>    private final PostingsFormat pulsing = PostingsFormat.forName("Pulsing40");
>    private final PostingsFormat defaultFormat = PostingsFormat.forName("Lucene40");
>
>    @Override
>    public PostingsFormat getPostingsFormatForField(String field) {
>      if ("id".equals(field)) {
>        return pulsing;
>      } else {
>        return defaultFormat;
>      }
>    }
>  }
>
> simon

Actually, if you are looking for fast ID lookups, you could consider using
the Memory PostingsFormat. It keeps everything in memory and should be
the fastest alternative, but it is costly in terms of RAM.

private final PostingsFormat memory = PostingsFormat.forName("Memory");

simon

>>
>> Regards,
>> Peter.
>>
>> *
>> http://vimeo.com/32065505
>>
>> **
>> http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html
>>
>>
>>> Hi,
>>>
>>> NumericUtils is an internal implementation class; you should not use it.
>>> What do you want to do? There is no need to call any of its methods during
>>> indexing or searching. Anything else is advanced usage. In the latter case
>>> you should RTFM on BytesRef and related classes (and possibly watch the
>>> flexible indexing talk I gave in Berlin, Barcelona or San Francisco). Lucene
>>> moved to binary terms in 4.0 and no longer uses character-based terms, so
>>> the code is different. BytesRef is just a wrapper around a byte[].
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
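
To illustrate the quoted point about binary terms, here is a rough, hypothetical sketch in plain Java. It uses no Lucene classes, and the class and method names are made up for this example. It shows how a long can be turned into bytes whose unsigned, byte-by-byte order matches numeric order, which is the property that lets numeric values be stored as binary terms. It is NOT the format NumericUtils actually writes (that internal format also stores a shift value and packs 7 bits per byte), so it should not be used to build terms for a real index.

```java
import java.util.Arrays;

// Hypothetical illustration only, not Lucene's NumericUtils format.
public class SortableLongBytes {

    // Flip the sign bit, then write the 64 bits most-significant byte first.
    static byte[] encode(long value) {
        long sortable = value ^ 0x8000000000000000L;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) sortable;  // keep the lowest 8 bits
            sortable >>>= 8;
        }
        return out;
    }

    // Inverse of encode: rebuild the 64 bits, then flip the sign bit back.
    static long decode(byte[] bytes) {
        long sortable = 0;
        for (byte b : bytes) {
            sortable = (sortable << 8) | (b & 0xFFL);
        }
        return sortable ^ 0x8000000000000000L;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] ids = {42L, -7L, 0L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long id : ids) {
            byte[] term = encode(id);
            System.out.println(id + " -> " + Arrays.toString(term) + " -> " + decode(term));
        }
        // Byte order agrees with numeric order: -7 < 42, so its bytes sort first.
        System.out.println(compareUnsigned(encode(-7L), encode(42L)) < 0);
    }
}
```

Flipping the sign bit maps Long.MIN_VALUE to all zero bits and Long.MAX_VALUE to all one bits, so negative values sort before positive ones when the bytes are compared as unsigned values.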


