commons-issues mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LANG-1380) FastDateParser too strict on abbreviated short month symbols
Date Mon, 26 Feb 2018 11:03:00 GMT

    [ https://issues.apache.org/jira/browse/LANG-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376671#comment-16376671 ]

Markus Jelsma edited comment on LANG-1380 at 2/26/18 11:02 AM:
---------------------------------------------------------------

Hello Gary, Gilles, 

I was thinking of being more lenient not only about missing dots in some month forms, but
also about other punctuation that is optional or mandatory depending on the Locale.

To give an example, we receive dates from all over the web in the strangest formats.
It is easy for us to preprocess AM and PM (stripping punctuation or whitespace), or timezone
abbreviations, so that they fit the locale. But it is not possible (or at least very hard)
to preprocess how some locales treat their abbreviated literals: we can't add or strip dots
without knowing which month (or weekday) we are dealing with.
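
To illustrate why this kind of preprocessing needs per-locale knowledge, here is a minimal
sketch of a normalizer that adds the missing dot for French input only. It is not part of
Commons Lang; the class name FrenchMonthDotNormalizer and the method addMissingDots are
hypothetical, and the month stems are hard-coded from the list in the issue description
rather than read from the JDK's locale data.

```java
import java.util.regex.Pattern;

public class FrenchMonthDotNormalizer {

    // Stems of the French short months that carry a dot in the issue's list.
    // The escape \u00e9 is e-acute, written this way to avoid source-encoding problems.
    private static final Pattern DOTLESS_MONTH = Pattern.compile(
        "(?<!\\p{L})(janv|f\u00e9vr|avr|juil|sept|oct|nov|d\u00e9c)(?![\\p{L}.])",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    /**
     * Adds the dot that a strict parser expects after an abbreviated month.
     * Full month names and already dotted forms are left unchanged.
     */
    static String addMissingDots(String input) {
        // $1 re-inserts the matched stem, so the original casing is preserved.
        return DOTLESS_MONTH.matcher(input).replaceAll("$1.");
    }

    public static void main(String[] args) {
        System.out.println(addMissingDots("14 avr 2014"));
        System.out.println(addMissingDots("14 avr. 2014"));
        System.out.println(addMissingDots("14 avril 2014"));
    }
}
```

The sketch only works because the locale is already known to be French and the table of
dotted abbreviations is fixed in advance; every other locale would need its own table,
which is the difficulty described above.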

I did some work on FastDateParser.appendDisplayNames() to strip punctuation or append the
regex question mark to punctuation, but that broke things elsewhere, so that was clearly not
a good idea.
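
For reference, the "optional dot" idea can be sketched in isolation as follows. This is a
standalone illustration using java.util.regex, not the code of
FastDateParser.appendDisplayNames() and not the attached patch; the class name
LenientMonthRegex and its methods are hypothetical, and the month names are hard-coded
from the issue description.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LenientMonthRegex {

    // French short month names as listed in the issue description.
    // \u00e9 is e-acute and \u00fb is u-circumflex, escaped to avoid encoding problems.
    static final String[] FRENCH_SHORT_MONTHS = {
        "janv.", "f\u00e9vr.", "mars", "avr.", "mai", "juin",
        "juil.", "ao\u00fbt", "sept.", "oct.", "nov.", "d\u00e9c."
    };

    /** Builds one alternation in which a trailing dot on a name becomes optional. */
    static Pattern lenientAlternation(String[] names) {
        String alternation = Arrays.stream(names)
            // Longer names first, so a shorter alternative cannot shadow a longer one.
            .sorted((a, b) -> Integer.compare(b.length(), a.length()))
            .map(LenientMonthRegex::optionalTrailingDot)
            .collect(Collectors.joining("|", "(", ")"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Quotes the name literally; a trailing dot is replaced by an optional escaped dot. */
    static String optionalTrailingDot(String name) {
        if (name.endsWith(".")) {
            String stem = name.substring(0, name.length() - 1);
            return Pattern.quote(stem) + "\\.?";
        }
        return Pattern.quote(name);
    }

    public static void main(String[] args) {
        Pattern months = lenientAlternation(FRENCH_SHORT_MONTHS);
        for (String candidate : new String[] {"avr.", "avr", "mars", "mars."}) {
            System.out.println(candidate + " -> " + months.matcher(candidate).matches());
        }
    }
}
```

Names that have no dot in the locale data (such as "mars") stay strict in this sketch, so
"mars." is still rejected. Whether a change of this kind is safe inside FastDateParser is
a separate question, since the attempt described above broke other things.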



> FastDateParser too strict on abbreviated short month symbols
> ------------------------------------------------------------
>
>                 Key: LANG-1380
>                 URL: https://issues.apache.org/jira/browse/LANG-1380
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.time.*
>    Affects Versions: 3.7
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: 3.8
>
>         Attachments: LANG-1380.patch
>
>
> The date format symbols of the French locale add a . (dot) when short month names are
> really abbreviated.
> {code}
> janv.
> févr.
> mars
> avr.
> mai
> juin
> juil.
> août
> sept.
> oct.
> nov.
> déc.
> {code}
> But in real world examples, the dot is frequently omitted.
> FastDateParser should be lenient in the case where the dot isn't there, e.g. "14 avr 2014".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
