any23-user mailing list archives

From Bianca Pereira <>
Subject Extracting Blank Nodes instead of IRIs
Date Thu, 10 Jul 2014 11:50:33 GMT
Hi all,

I started using any23 recently and ran into an issue extracting
information from a website (IMDb). I want to extract triples from its
web pages, and I am facing the following problem:

Even when an IRI is available that could serve as the identifier for a
concept, it is not used, and a blank node is used instead. In the following
example, the actor Marco Nanini is represented by a blank node
(*_:nodec984d7c9ee5436ea92571ccd94b946*) even though he has an IRI that
could be used as his identifier (*file:/name/nm0620847/?ref_=tt_cl_t1*).
That blank node is then used to link him to a Movie, which is also
identified by a blank node.

It seems that, in this specific case, I could use the value of the
property */Person/url* as the unique identifier (*IRI*) for the entity. I
suppose this is not a problem with the extractor but with how the page was
built. Still, since many people use this site, I was wondering whether
there is any solution for this case. I would be very glad if someone has
an idea.

Here is an excerpt of the extracted triples:

<file:index.html%3Fref_=fn_al_tt_4> <> "Copacabana (2001) - IMDb" .
_:nodee59ff091c1fa911a94a42244c38ab99a <> <> .

_:nodec984d7c9ee5436ea92571ccd94b946 <> <> .
_:nodec984d7c9ee5436ea92571ccd94b946 <> "Marco Nanini" .
_:nodec984d7c9ee5436ea92571ccd94b946 <> <file:/name/nm0620847/?ref_=tt_cl_t1> .
<> _:nodec984d7c9ee5436ea92571ccd94b946 .

_:nodebf90e351418e786432aede35cceb807 <> <> .
_:nodebf90e351418e786432aede35cceb807 <> "Walderez de Barros" .
_:nodebf90e351418e786432aede35cceb807 <> <file:/name/nm0207281/?ref_=tt_cl_t2> .
_:nodee59ff091c1fa911a94a42244c38ab99a <> _:nodebf90e351418e786432aede35cceb807 .
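
As a stopgap, the idea of using the */Person/url* value as the identifier could be applied as a post-processing step on the extracted triples. The Python sketch below is hypothetical and is not an any23 feature. It renames every blank node that has an IRI value for the url property to that IRI, wherever the node appears as subject or object. The full predicate IRI `http://schema.org/Person/url` is an assumption, because only its */Person/url* suffix is shown above. The `name` and `actors` predicates in the sample are placeholders for IRIs missing from the excerpt.

```python
from typing import Dict, List, Tuple

# A triple is (subject, predicate, object), each written as an N-Triples term:
# "<iri>", "_:label" (blank node) or '"literal"'.
Triple = Tuple[str, str, str]

# ASSUMPTION: only the "/Person/url" suffix is known; the schema.org
# namespace is a guess.
PERSON_URL = "<http://schema.org/Person/url>"


def is_bnode(term: str) -> bool:
    """Blank nodes are written as _:label in N-Triples."""
    return term.startswith("_:")


def is_iri(term: str) -> bool:
    """IRIs are written between angle brackets in N-Triples."""
    return term.startswith("<") and term.endswith(">")


def replace_bnodes_with_url(triples: List[Triple],
                            url_predicate: str = PERSON_URL) -> List[Triple]:
    """Hypothetical post-processing step: rename each blank node that has an
    IRI value for url_predicate to that IRI, in subject and object position."""
    # Pass 1: map each blank node to the first IRI it declares as its url.
    mapping: Dict[str, str] = {}
    for s, p, o in triples:
        if p == url_predicate and is_bnode(s) and is_iri(o):
            mapping.setdefault(s, o)
    # Pass 2: rewrite subjects and objects; all other terms pass through.
    return [(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples]


if __name__ == "__main__":
    actor = "_:nodec984d7c9ee5436ea92571ccd94b946"
    movie = "_:nodee59ff091c1fa911a94a42244c38ab99a"
    # Placeholder predicates (hypothetical): the real ones are not shown above.
    sample = [
        (actor, "<http://schema.org/Person/name>", '"Marco Nanini"'),
        (actor, PERSON_URL, "<file:/name/nm0620847/?ref_=tt_cl_t1>"),
        (movie, "<http://schema.org/Movie/actors>", actor),
    ]
    for s, p, o in replace_bnodes_with_url(sample):
        print(s, p, o, ".")
```

In this sample, the movie stays a blank node because it has no url triple. The rewritten actor IRIs still carry the `file:` base and the `?ref_=` query string, so they may need normalising before they are used as identifiers.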

Best Regards,

Bianca Pereira
