commons-issues mailing list archives

From "Bruno P. Kinoshita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SANDBOX-487) Human name parser
Date Sun, 04 Jan 2015 23:27:34 GMT

    [ https://issues.apache.org/jira/browse/SANDBOX-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264083#comment-14264083 ]

Bruno P. Kinoshita commented on SANDBOX-487:
--------------------------------------------

Hi Sebb

> There are some people who have only a single name. This seems to be true of at least
some Buddhists.

Good point. IIRC the library throws an Exception if no last name is found.

About a year ago I attended a talk on data management in Brazil, and the speaker told us
about a limitation in their systems that prevented a user from renting a car. Apparently someone
was legally registered as "Felipe", IIRC; that is his full name on his documents. No idea why
(orphans usually receive "Silva" or "da Silva" in Brazil).

Also, Teller from Penn & Teller [is mononymous|http://en.wikipedia.org/wiki/Teller_(magician)]
(he changed his name). So we probably need to remove that restriction from the code.
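
A minimal sketch of what relaxing that restriction might look like. The {{MononymSketch}} class, its {{Parts}} holder and the {{split}} method are hypothetical names made up for this comment, not part of HumanNameParser.java, and the whitespace split is far simpler than the real parser. The point is only that a single-token name yields an empty last name instead of an exception.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is NOT the HumanNameParser.java API.
final class MononymSketch {

    /** Holds the parts of a name; 'middle' and 'last' are empty for a mononym. */
    static final class Parts {
        final String first;
        final String middle;
        final String last;

        Parts(String first, String middle, String last) {
            this.first = first;
            this.middle = middle;
            this.last = last;
        }
    }

    /** Naive whitespace split that accepts single-token names. */
    static Parts split(String fullName) {
        if (fullName == null || fullName.trim().isEmpty()) {
            throw new IllegalArgumentException("Name must not be empty");
        }
        String[] tokens = fullName.trim().split("\\s+");
        if (tokens.length == 1) {
            // Mononym: report no last name rather than throwing.
            return new Parts(tokens[0], "", "");
        }
        String first = tokens[0];
        String last = tokens[tokens.length - 1];
        // Everything between the first and last token; empty for two-token names.
        String middle = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length - 1));
        return new Parts(first, middle, last);
    }

    public static void main(String[] args) {
        Parts p = split("Teller");
        System.out.println("First: " + p.first);
        System.out.println("Last: " + p.last);
    }
}
```

Whether an empty string, {{null}} or an {{Optional}} is the right way to signal "no last name" is an open design question; the empty string is used here only to keep the sketch short.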

> "Given name(s)" and "Family name" are perhaps better than "first" and "last", but may
not be entirely universal.

That's another good point. I wonder if something like [Wiki on personal name structure|http://en.wikipedia.org/wiki/Personal_name#Structure]
or maybe some standard could be used as reference for deciding which option to use.

Using the existing code on GitHub and Pedro II of Brazil's full name, as in the example below

{noformat}
// A Java string literal cannot span lines, so the name is concatenated.
String name = "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier "
        + "de Paula Leocádio Miguel Gabriel Rafael Gonzaga";
HumanNameParserParser parser = new HumanNameParserParser(name);
String first = parser.getFirst();
String middle = parser.getMiddle();
String last = parser.getLast();

System.out.println("First: " + first);
System.out.println("Middle: " + middle);
System.out.println("Last: " + last);
{noformat}

we get

{noformat}
First: Pedro
Middle: de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael
Last: Gonzaga
{noformat}

If we changed to Given name and Family name, would the output be similar to the one below?

{noformat}
Given Names: Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael
Last: Gonzaga
{noformat}

If so, perhaps we could keep the existing behaviour and simply add given names?
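
As a rough sketch of that idea, given names could be derived from the values the parser already returns. The {{GivenNamesSketch}} class and its {{givenNames}} method are hypothetical and do not exist in the GitHub code; they assume only that the first and middle parts are available as strings, as in the example above.

```java
// Hypothetical helper; not part of HumanNameParser.java.
final class GivenNamesSketch {

    /** Joins the first and middle parts into a single "given names" string. */
    static String givenNames(String first, String middle) {
        String f = first == null ? "" : first.trim();
        String m = middle == null ? "" : middle.trim();
        if (f.isEmpty()) {
            return m;
        }
        if (m.isEmpty()) {
            return f;
        }
        return f + " " + m;
    }
}
```

Called with the First and Middle values printed earlier, this would produce the string shown on the "Given Names" line, while {{getFirst()}}, {{getMiddle()}} and {{getLast()}} keep working as they do today.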

> Human name parser
> -----------------
>
>                 Key: SANDBOX-487
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-487
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: Commons Text
>            Reporter: Bruno P. Kinoshita
>            Priority: Minor
>              Labels: name, parser, text
>
> The project [HumanNameParser.java|http://tupilabs.github.io/HumanNameParser.java/] is
> a port to Java of [HumanNameParser.php|http://jasonpriem.org/human-name-parse/]; both
> are licensed under the MIT License.
> This issue was created to discuss adding a similar parser, based on the Java version, to the
> [text] component.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
