commons-issues mailing list archives

From "Henri Yandell (JIRA)" <>
Subject [jira] Commented: (LANG-288) StrTokenizer needs to support access to the token separators
Date Mon, 08 Feb 2010 06:49:28 GMT


Henri Yandell commented on LANG-288:

Both :) A multiple-delimiter tokenizer is supported by using a CharSetMatcher, if I understand correctly.

I think the initial issue is that the Matcher API will need to return the item that was matched
against instead of the number of characters matched. Essentially the same API, until you put
a RegexpMatcher in there or some other ruleset.

Once that is done, then StrTokenizer would have access to the delimiter that actually matched.

Big question is whether that API change to StrMatcher is 'good'. 
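
To make the proposed change concrete, here is a minimal, self-contained sketch. It does not use the real StrMatcher or StrTokenizer classes: TextMatcher, CharSetTextMatcher and SeparatorAwareTokenizer are hypothetical names invented for illustration, and the only assumption carried over from the discussion is that a matcher returns the text it matched (or null) rather than a character count.

```java
import java.util.NoSuchElementException;

/** Hypothetical matcher: returns the text matched at pos, or null if nothing matches. */
interface TextMatcher {
    String match(char[] buffer, int pos);
}

/** Hypothetical matcher that accepts any single character from a fixed set. */
final class CharSetTextMatcher implements TextMatcher {
    private final String chars;

    CharSetTextMatcher(String chars) {
        this.chars = chars;
    }

    @Override
    public String match(char[] buffer, int pos) {
        return chars.indexOf(buffer[pos]) >= 0 ? String.valueOf(buffer[pos]) : null;
    }
}

/** Toy tokenizer (not StrTokenizer) that remembers the separator preceding each token. */
final class SeparatorAwareTokenizer {
    private final char[] buffer;
    private final TextMatcher delimiter;
    private int pos = 0;
    private String pendingSeparator = ""; // separator that ended the previous token
    private String lastSeparator = "";    // separator between the last two tokens returned
    private boolean done = false;

    SeparatorAwareTokenizer(String input, TextMatcher delimiter) {
        this.buffer = input.toCharArray();
        this.delimiter = delimiter;
    }

    boolean hasNext() {
        return !done;
    }

    /** Returns the next token; empty tokens between adjacent separators are kept. */
    String next() {
        if (done) {
            throw new NoSuchElementException();
        }
        lastSeparator = pendingSeparator;
        StringBuilder token = new StringBuilder();
        while (pos < buffer.length) {
            String matched = delimiter.match(buffer, pos);
            if (matched != null) {
                // Because the matcher hands back the matched text, we can keep it.
                pendingSeparator = matched;
                pos += matched.length();
                return token.toString();
            }
            token.append(buffer[pos++]);
        }
        done = true;
        return token.toString();
    }

    /** Separator between the token just returned and the one before it ("" for the first). */
    String lastSeparator() {
        return lastSeparator;
    }
}
```

With the input "a.b@c.d" and the character set ".@", calling next() three times yields a, b, c, and lastSeparator() after the third call reports "@". A regex-based matcher could implement the same interface and return separators longer than one character, which is where returning the matched text starts to differ from returning a count.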

> StrTokenizer needs to support access to the token separators
> ------------------------------------------------------------
>                 Key: LANG-288
>                 URL:
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.*
>            Reporter: Stephen Colebourne
>            Priority: Minor
>             Fix For: 3.1
> With StrTokenizer at present you cannot extract the separators between the tokens, a
> feature which is possible with StringTokenizer.
> Thus tokenizing "a.b@c.d" using ".@" would return a,b,c,d but you wouldn't know where
> the @ was.
> This could probably best be part of the API as a lastSeparator() method that can only
> be called after next(), returning the separator(s) between that token and the previous token.
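
For comparison, the StringTokenizer behaviour the issue refers to can be seen with the JDK's three-argument constructor, where passing true for returnDelims makes each delimiter character come back as its own token. The class name ReturnDelimsDemo and the split helper are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class ReturnDelimsDemo {
    /** Collects tokens and delimiters in the order StringTokenizer returns them. */
    static List<String> split(String input, String delims) {
        // Third argument (returnDelims) = true: delimiters are returned as tokens too.
        StringTokenizer st = new StringTokenizer(input, delims, true);
        List<String> parts = new ArrayList<>();
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a.b@c.d", ".@")); // prints [a, ., b, @, c, ., d]
    }
}
```

Here the caller has to tell tokens and delimiters apart itself, which is the bookkeeping a lastSeparator() method on StrTokenizer would avoid.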
