lucene-dev mailing list archives

From "Robert Muir (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation
Date Sat, 24 Mar 2012 15:30:25 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3911:
--------------------------------

    Attachment: LUCENE-3911_more.patch

Trivial patch: forces us to pass minLength to randomRealistic as well, so that in that case
we get whole words in the same Unicode block (good for stemmers). It also sometimes uses
randomRegexpIshString, so we get lots of punctuation (good for tokenizers, filters, etc.).
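
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from the patch and does not use the Lucene test framework: the class RandomAnalysisStrings, its methods randomWord, randomPunctuated and randomText, the four letter ranges, and the 1-in-4 punctuation ratio are all hypothetical choices made for this example. It only shows the two behaviours described above: words of at least minLength drawn from a single Unicode block, and occasional punctuation-heavy tokens.

```java
import java.util.Random;

/** Hypothetical sketch; none of these names come from Lucene. */
final class RandomAnalysisStrings {

    // Inclusive letter ranges, each lying inside one Unicode block (illustrative subset).
    private static final int[][] LETTER_RANGES = {
        {0x0061, 0x007A}, // a-z, Basic Latin
        {0x03B1, 0x03C9}, // Greek lowercase, Greek and Coptic
        {0x0430, 0x044F}, // Cyrillic lowercase, Cyrillic
        {0x3041, 0x3096}, // Hiragana
    };

    private static final String PUNCTUATION = ".,;:!?-()[]{}*+|'\"/\\";

    /** A word of minLength..maxLength code points, all taken from one range. */
    static String randomWord(Random r, int minLength, int maxLength) {
        int[] range = LETTER_RANGES[r.nextInt(LETTER_RANGES.length)];
        int length = minLength + r.nextInt(maxLength - minLength + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.appendCodePoint(range[0] + r.nextInt(range[1] - range[0] + 1));
        }
        return sb.toString();
    }

    /** A non-empty token of up to maxLength chars mixing ASCII letters and punctuation. */
    static String randomPunctuated(Random r, int maxLength) {
        int length = 1 + r.nextInt(maxLength);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (r.nextBoolean()) {
                sb.append(PUNCTUATION.charAt(r.nextInt(PUNCTUATION.length())));
            } else {
                sb.append((char) ('a' + r.nextInt(26)));
            }
        }
        return sb.toString();
    }

    /** Joins the given number of tokens with single spaces; about 1 in 4 is punctuation-heavy. */
    static String randomText(Random r, int tokens, int minLength, int maxLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(r.nextInt(4) == 0
                ? randomPunctuated(r, maxLength)
                : randomWord(r, minLength, maxLength));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fixed seed so that a failing string can be reproduced.
        System.out.println(randomText(new Random(42L), 8, 3, 10));
    }
}
```

Because no token contains whitespace, a whitespace-splitting tokenizer sees exactly the requested number of tokens, which is the 'many tokens' case the issue is after.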
                
> improve BaseTokenStreamTestCase random string generation
> --------------------------------------------------------
>
>                 Key: LUCENE-3911
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3911
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: general/test
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3911.patch, LUCENE-3911.patch, LUCENE-3911_more.patch
>
>
> Most analysis tests use MockTokenizer (which splits on whitespace), but
> it's rare that we generate a string with 'many tokens'. So I think we should
> try to generate more realistic test strings.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

