lucene-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <>
Subject [jira] Commented: (LUCENE-1227) NGramTokenizer to handle more than 1024 chars
Date Wed, 14 May 2008 06:07:55 GMT


Otis Gospodnetic commented on LUCENE-1227:

Thanks for the test and for addressing this!

Could you add some examples for NO_OPTIMIZE and QUERY_OPTIMIZE?  I can't tell from looking
at the patch what those are for.  Also, note how the existing variables use camelCaseLikeThis.
It would be good to stick to that pattern (instead of bufflen, buffpos, etc.), as well
as to the existing style (e.g. a space between if and the open paren, spaces around == and =, etc.)

I'll commit as soon as you make these changes, assuming you can make them.  Thank you.

> NGramTokenizer to handle more than 1024 chars
> ---------------------------------------------
>                 Key: LUCENE-1227
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Hiroaki Kawai
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1227.patch, NGramTokenizer.patch, NGramTokenizer.patch
> The current NGramTokenizer can't handle a character stream longer than 1024 characters. This is
too short for non-whitespace-separated languages.
> I created a patch for this issue.
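For context on the limitation being discussed: a tokenizer that reads its input into a single fixed-size buffer silently truncates anything past that buffer. A minimal sketch of the general fix, refilling the buffer in a loop and carrying the last n-1 characters across refills so grams spanning a buffer boundary are not lost, might look like the following. This is an illustrative sketch only, not the attached patch; the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the LUCENE-1227 patch): emit character n-grams
// from a Reader of arbitrary length by refilling a 1024-char buffer,
// instead of reading at most 1024 chars once.
public class NGramSketch {
    public static List<String> ngrams(Reader reader, int n) throws IOException {
        List<String> out = new ArrayList<>();
        char[] buffer = new char[1024];
        // Carries the trailing n-1 chars across refills so boundary grams survive.
        StringBuilder window = new StringBuilder();
        int read;
        while ((read = reader.read(buffer)) != -1) {
            window.append(buffer, 0, read);
            for (int i = 0; i + n <= window.length(); i++) {
                out.add(window.substring(i, i + n));
            }
            // Keep only the tail needed to form grams spanning the next refill.
            if (window.length() > n - 1) {
                window.delete(0, window.length() - (n - 1));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // A 2000-char stream, well past the old 1024 limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append((char) ('a' + (i % 26)));
        List<String> grams = ngrams(new StringReader(sb.toString()), 2);
        System.out.println(grams.size()); // 1999 bigrams, none dropped at the 1024 boundary
    }
}
```

The key point is the carry-over window: without it, the bigram straddling positions 1023 and 1024 would be lost at every buffer refill.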

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

