lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject New Token API was Re: Payloads and TrieRangeQuery
Date Sun, 14 Jun 2009 12:17:10 GMT
Agreed.  I've been bringing it up for a while now and made the same  
comments when it was first introduced, but felt like the lone voice in  
the wilderness on it and gave way [1], [2], [3].  Now that others are  
writing/converting, I think it is worth revisiting.

That being said, I did just write my first TokenFilter with it, and  
didn't think it was that hard.  There are some gains in it and the API  
can be simpler if you just need one or two attributes (see  
DelimitedPayloadTokenFilter), although, just like the move to using  
char [] in Token, as soon as you do something like store a Token, you  
lose most of the benefit, I think (for the char [] case, as soon as  
you need a String in one of your filters, you lose the perf. gain).   
The annoying part is that you still have to implement the deprecated  
next() method as well; otherwise, chances are the filter is unusable by  
everyone at this point anyway.
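
For readers who have not used the attribute-based style, here is a rough
sketch of the idea. It is a toy model in plain Java, not Lucene's actual
classes: every name in it (TermAttr, PayloadAttr, AttrSource, SketchStream,
ListTokenizer, DelimitedPayloadFilter) is hypothetical, and the filter only
loosely imitates what a delimited-payload filter might do. It assumes a
chain shares one instance of each attribute and that a filter touches only
the attributes it cares about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical attribute types: plain mutable holders, reused for every token.
final class TermAttr { String term = ""; }
final class PayloadAttr { String payload = null; }

// Holds at most one instance per attribute type; a filter shares its input's map.
class AttrSource {
    private final Map<Class<?>, Object> attrs;

    AttrSource() { this.attrs = new HashMap<>(); }
    AttrSource(AttrSource shared) { this.attrs = shared.attrs; }

    // Returns the shared attribute instance, creating it on first request.
    <T> T attr(Class<T> type, Supplier<T> factory) {
        return type.cast(attrs.computeIfAbsent(type, k -> factory.get()));
    }
}

abstract class SketchStream extends AttrSource {
    SketchStream() { super(); }
    SketchStream(AttrSource shared) { super(shared); }

    // Advances to the next token by overwriting the shared attributes.
    abstract boolean incrementToken();
}

// Emits pre-split strings as tokens; it only knows about TermAttr.
class ListTokenizer extends SketchStream {
    private final Iterator<String> it;
    private final TermAttr termAttr = attr(TermAttr.class, TermAttr::new);

    ListTokenizer(List<String> tokens) { this.it = tokens.iterator(); }

    @Override
    boolean incrementToken() {
        if (!it.hasNext()) return false;
        termAttr.term = it.next();
        return true;
    }
}

// Splits "term|payload" and needs just two attributes to do it.
class DelimitedPayloadFilter extends SketchStream {
    private final SketchStream input;
    private final char delimiter;
    private final TermAttr termAttr;
    private final PayloadAttr payloadAttr;

    DelimitedPayloadFilter(SketchStream input, char delimiter) {
        super(input); // share the input's attribute instances
        this.input = input;
        this.delimiter = delimiter;
        this.termAttr = attr(TermAttr.class, TermAttr::new);
        this.payloadAttr = attr(PayloadAttr.class, PayloadAttr::new);
    }

    @Override
    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        int i = termAttr.term.indexOf(delimiter);
        if (i >= 0) {
            payloadAttr.payload = termAttr.term.substring(i + 1);
            termAttr.term = termAttr.term.substring(0, i);
        } else {
            payloadAttr.payload = null; // clear state left by the previous token
        }
        return true;
    }
}

class AttributeStreamSketch {
    static List<String> run(List<String> raw) {
        SketchStream ts = new DelimitedPayloadFilter(new ListTokenizer(raw), '|');
        TermAttr term = ts.attr(TermAttr.class, TermAttr::new);
        PayloadAttr payload = ts.attr(PayloadAttr.class, PayloadAttr::new);
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) {
            // The attributes are overwritten on the next call, so anything
            // kept beyond this iteration has to be copied out here.
            out.add(term.term + "/" + payload.payload);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("quick|JJ", "fox|NN", "jumps")));
    }
}
```

The copy inside the loop is the cost referred to above: once a consumer
needs to hold on to token state, it pays for a copy per token, and most of
the benefit of reusing the attribute instances goes away.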

On top of that, the whole point of customizing the chain is to use it  
in search and, frankly speaking, I think that part of the patch was  
held back.

I personally would vote for reverting until a complete patch that  
addresses both sides of the problem is submitted and a better solution  
to cloning is put forth.

-Grant

[1] http://issues.apache.org/jira/browse/LUCENE-1422
[2] http://www.lucidimagination.com/search/document/5daf6d7b8027b4d3/tokenstream_and_token_apis#9e2d0d2b5dc118d4
    and the rest of the discussion on that thread.
[3] http://www.lucidimagination.com/search/document/4274335abcf31926/new_tokenstream_api_usage

On Jun 13, 2009, at 10:32 PM, Mark Miller wrote:

> Yonik Seeley wrote:
>> Even non-API changes have tradeoffs... the indexing improvements
>> (err, total rewrite) made that code *much* harder to understand and
>> debug. It's a net win since the indexing performance improvements
>> were so fantastic.
>>
> I agree - very hard to follow, worth the improvements.
>
> Just to throw something out, the new Token API is not very  
> consumable in my experience. The old one was very intuitive and very  
> easy to follow the code.
>
> I've had to refigure out what the heck was going on with the new one  
> more than once now. Writing some example code with it is hard to  
> follow or justify to a new user.
>
> What was the big improvement with it again? Advanced, expert custom  
> indexing chains require less casting or something right?
>
> I dunno - anyone else have any thoughts now that the new API has  
> been in circulation for some time?
>
> -- 
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>



