any23-dev mailing list archives

From "Timothy Potter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-238) Fix generation of BNode name for microdata when 'itemid' is given without a value.
Date Mon, 15 Sep 2014 10:27:34 GMT

    [ https://issues.apache.org/jira/browse/ANY23-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133771#comment-14133771 ]

Timothy Potter commented on ANY23-238:
--------------------------------------

Hi Lewis,
  The BNode ids will keep the same format. The problem was that the MD5 of the string "0"
was being used in some cases where the source contained 'itemid=""'. For example, in one
of our extractions this led to over 140,000 type relations to the BNode _:nodecfcd208495d565ef66e7dff9f98764da,
as 'cfcd208495d565ef66e7dff9f98764da' is the MD5 of "0". I'm not objecting to the use of
an MD5 hash as the BNode id, as long as it has an extremely low probability of collisions.
In Any23 the MD5 is often generated directly from the Java hashcode, which can lead to
collisions when extracting billions of tuples, especially if there is a problem with the
hashcode implementation.
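
To illustrate the point, here is a minimal sketch of the failure mode. It is not the Any23
code: the class BNodeIdSketch and the helpers md5Hex and bnodeFromHashCode are hypothetical
names, and the "hash the decimal hashcode" step is an assumption about how such an id might
be built. It shows that the MD5 of "0" is the value quoted above, and that any two inputs
with the same 32-bit Java hashcode end up with the same id.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BNodeIdSketch {

    // Hex-encoded MD5 of a string, zero-padded to 32 characters.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical id scheme (assumption): MD5 over the decimal hashCode.
    // The MD5 input can take at most 2^32 distinct values, so the id
    // inherits every hashCode collision.
    static String bnodeFromHashCode(Object o) {
        return "node" + md5Hex(Integer.toString(o.hashCode()));
    }

    public static void main(String[] args) {
        // prints cfcd208495d565ef66e7dff9f98764da
        System.out.println(md5Hex("0"));

        // "".hashCode() is 0, so an empty value maps to the "0" digest.
        // prints nodecfcd208495d565ef66e7dff9f98764da
        System.out.println(bnodeFromHashCode(""));

        // "Aa" and "BB" both have hashCode 2112, so the ids collide.
        // prints true
        System.out.println(bnodeFromHashCode("Aa").equals(bnodeFromHashCode("BB")));
    }
}
```

Hashing the full source string instead of its hashcode would avoid inheriting the 32-bit
collisions, although an empty 'itemid' would still need separate handling.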

> Fix generation of BNode name for microdata when 'itemid' is given without a value.
> ----------------------------------------------------------------------------------
>
>                 Key: ANY23-238
>                 URL: https://issues.apache.org/jira/browse/ANY23-238
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: microdata
>    Affects Versions: 1.0
>            Reporter: Lewis John McGibbney
>             Fix For: 1.1
>
>
> Linking this issue to the relevant Github issue
> https://github.com/apache/any23/pull/9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
