commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedikt Ritter <brit...@apache.org>
Subject Re: [TEXT] Review of the code base
Date Sun, 19 Apr 2015 15:33:47 GMT
2015-04-19 17:29 GMT+02:00 Benedikt Ritter <britter@apache.org>:

>
>
> 2015-04-19 17:24 GMT+02:00 Benedikt Ritter <britter@apache.org>:
>
>> Hello Bruno,
>>
>> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br
>> >:
>>
>>> Thanks for the thorough code review Benedikt! Comments inline.
>>>
>>> > 1. IP clearance! There are several comments talking about code adapted
>>> from
>>> >PHP libraries. This needs to be listed in NOTICE.txt. For an example see
>>> >the NOTICE.txt in [lang].
>>> Filed https://issues.apache.org/jira/browse/SANDBOX-497
>>> And first try to fix it:
>>>
>>>
>>> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497
>>> Does that look correct? I also checked with the author before porting it
>>> to the initial Java version, and told him about the [text] component -
>>> https://github.com/jasonpriem/HumanNameParser.php/issues/11
>>
>>
>> I've adjusted the changes a bit and merged it into master.
>>
>>
>>>
>>> > 2. The names package needs work.
>>> Completely agree. We can try to make it thread safe too. This will take
>>> a little longer.
>>> File new issue for tracking it
>>> https://issues.apache.org/jira/browse/SANDBOX-498
>>
>>
>> I've pushed a new branch containing a proposal how to fix this. Please
>> review.
>>
>>
>>>
>>>
>>> > 3. There are a some useless JavaDocs, which can be dropped. For
>>> example:
>>> > (...)> Better rename the variable to middleName and drop the comment.
>>> What about renaming the variables but leaving the comments?
>>>
>>
>> For I've corrected this for the names package in the SANDBOX-498 branch.
>>
>>
>>>
>>> > 4. Several classes have no tests. The overall test coverage looks
>>> good, so
>>> >this may not be a problem.
>>> Before the initial release I can start a development cycle just to add
>>> new tests for parts that seem important - like code with somewhat high [+5]
>>> cyclomatic complexity and uncovered branches. Just to make sure that the
>>> initial release will have an acceptable quality and less hidden bugs :-)
>>>
>>
>> Nice!
>>
>>
>>> > 5. I don't think the shade configuration is correct. To me it looks
>>> like
>>> >commons-text depends on [lang] and also shades some classes. Do we
>>> really
>>> >need the explicit dependency? Furthermore, why do we need to shade so
>>> many
>>> >classes? Why do we need anything beside StringUtils?
>>> My mistake. I copied the [functor] configuration without much thinking.
>>> IIRC, this dependency was added because of the names package. But it
>>> shouldn't be too complicate to replace it. It is mainly used for checking
>>> empty/blank values.
>>>
>>> So let's drop the [lang] dependency and use something like `$VAR != null
>>> && ! "".equals($VAR)` instead?
>>>
>>
>> I think we also need StringUtils in the similarity package. We should
>> handle this after we have merged the changes made to the names package.
>>
>>
>>> Thanks again for reviewing the code!
>>>
>>
>> No problem, thanks for pushing this forward.
>>
>> Here are some more things we need to take care of before 1.0:
>> - JIRA project
>>
>
> Requested a JIRA project:
> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477
>
>
>> - Rename similarity package to distance package? We have five Distance
>> implementations but only one Similarity. Furthermore we have FuzzyScore. Is
>> it a distance or a similarity? It should be renamed accordingly.
>> - ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>.
>> Does it make sense to introduce a new class called Vector?
>>
>
We also need a logo! [1] Anybody creative enough to do that? :-)

Benedikt

[1] https://issues.apache.org/jira/browse/SANDBOX-499


>
>> Benedikt
>>
>>
>>>
>>> All the best,Bruno
>>>
>>>
>>>       From: Benedikt Ritter <britter@apache.org>
>>>  To: Commons Developers List <dev@commons.apache.org>
>>>  Sent: Sunday, April 19, 2015 9:01 PM
>>>  Subject: [TEXT] Review of the code base
>>>
>>> Hi all,
>>>
>>> I've looked through the code base of [text] and I already did some clean
>>> up. However there are a few things, that need more work IMHO:
>>>
>>> 1. IP clearance! There are several comments talking about code adapted
>>> from
>>> PHP libraries. This needs to be listed in NOTICE.txt. For an example see
>>> the NOTICE.txt in [lang].
>>>
>>> 2. The names package needs work. Currently the HumanNameParser takes a
>>> Name
>>> object (which is just a wrapper around a String) as parameter. The parse
>>> result is then saved in fields (e.g. firstName). This makes the parser
>>> unusable after it has parsed a name. I think the API should be reworked
>>> such as:
>>> - The constructor of the parser takes configuration options which can be
>>> reused for several names to parse
>>> - the parse method takes a string as parameter, containing a name
>>> - the parse method returns an immutable Name objects which has getters
>>> for
>>> firstName, lastName etc.
>>>
>>> 3. There are a some useless JavaDocs, which can be dropped. For example:
>>> /**
>>>  * Middle name.
>>>  */
>>> private String middle;
>>>
>>> Better rename the variable to middleName and drop the comment.
>>>
>>> 4. Several classes have no tests. The overall test coverage looks good,
>>> so
>>> this may not be a problem.
>>>
>>> 5. I don't think the shade configuration is correct. To me it looks like
>>> commons-text depends on [lang] and also shades some classes. Do we really
>>> need the explicit dependency? Furthermore, why do we need to shade so
>>> many
>>> classes? Why do we need anything beside StringUtils?
>>>
>>> Nevertheless, this library looks very good. Kudos to Bruno for pushing
>>> this
>>> forward!
>>>
>>> Regards,
>>> Benedikt
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> http://people.apache.org/~britter/
>>> http://www.systemoutprintln.de/
>>> http://twitter.com/BenediktRitter
>>> http://github.com/britter
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> http://people.apache.org/~britter/
>> http://www.systemoutprintln.de/
>> http://twitter.com/BenediktRitter
>> http://github.com/britter
>>
>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message