stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Entityhub : Can't retrieve entity with a #
Date Mon, 13 Jun 2011 16:23:20 GMT
Hi

> But in fact with this configuration, in Felix configuration "Apache Stanbol
> Entityhub Referenced Site Configuration", entity prefixes are set by default
> to :
> - http://dbpedia.org/resource/
> - http://dbpedia.org/ontology/
>
> So IMO, there may be a bug in the code, or the comment may need to be changed.

Actually, while debugging the '#' issue I discovered that Felix
uses the value defined in the "value" attribute of the "@Property"
annotation as the default, even if someone directly uses the
ConfigurationAdmin service to create a component instance. Previously I
thought these values were only used by the Apache Felix Web
Console; however, such annotation values are also used if a property is
not defined in a configuration passed directly to the Configuration Admin.

Because of this I will delete all the default values currently used in
the source code and add the current values as examples to the
descriptions of the fields.


> 1) There is a typo in mapping.txt:
> ==> change
> # copy dc:titel to rdfs:label
> dc:titel > rdfs:label
> ==> to
> # copy dc:title to rdfs:label
> dc:title > rdfs:label

Thx. I also added a "dc-elements:title > rdfs:label" mapping.
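The corrected lines use a `source > target` form, where, as the comment in the file says, the values of the source field are copied to the target field and `#` starts a comment. The sketch below is a deliberately simplified, hypothetical reader for just that form; it does not cover any other syntax the real Entityhub mapping files may support.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified reader for "source > target" lines as in mapping.txt.
// It skips blank lines and '#' comments and supports nothing else.
final class MappingLines {

    // One rule: copy the values of `source` to `target`.
    static final class Mapping {
        final String source;
        final String target;

        Mapping(String source, String target) {
            this.source = source;
            this.target = target;
        }

        @Override
        public String toString() {
            return source + " > " + target;
        }
    }

    static List<Mapping> parse(List<String> lines) {
        List<Mapping> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            int sep = line.indexOf('>');
            if (sep < 0) {
                continue; // not a "source > target" rule; out of scope for this sketch
            }
            rules.add(new Mapping(line.substring(0, sep).trim(), line.substring(sep + 1).trim()));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> mappingTxt = Arrays.asList(
                "# copy dc:title to rdfs:label",
                "dc:title > rdfs:label",
                "dc-elements:title > rdfs:label");
        for (Mapping m : parse(mappingTxt)) {
            System.out.println(m);
        }
    }
}
```

Note that a misspelled field such as `dc:titel` is syntactically valid for a reader like this; it simply names a field the data does not contain, so the typo produces no error.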

> 2) In the Felix console, when trying to modify the "Apache Stanbol Entityhub
> Referenced Site Configuration" of an imported index,
> there is an AJAX error on save:
> The request failed:
> [object XMLDocument]

Is there also an exception in the log?

best
Rupert Westenthaler

On Mon, Jun 13, 2011 at 5:42 PM, Florent André <florent@apache.org> wrote:
> Hi Rupert,
>
> Thanks for testing it on your side.
>
> I investigated, compared the IPTC configuration with mine, and found the problem!
>
> This comes from these lines in indexing.properties:
> # the entity prefixes are used to determine if an entity needs to be searched
> # on a referenced site. If not specified requests for any entity will be
> # forwarded to this referenced site.
> # use ';' to seperate multiple values
> #org.apache.stanbol.entityhub.site.entityPrefix=http://example.org/resource;urn:mycompany:
>
> After reading this comment, I left it commented out (i.e. I did not specify an
> entityPrefix), because I understood from it that in that case all requests
> go to this site... and that's fine! :)
>
> But in fact with this configuration, in Felix configuration "Apache Stanbol
> Entityhub Referenced Site Configuration", entity prefixes are set by default
> to :
> - http://dbpedia.org/resource/
> - http://dbpedia.org/ontology/
>
> So IMO, there may be a bug in the code, or the comment may need to be changed.
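The mismatch Florent describes can be illustrated with a small filter that follows the comment in indexing.properties literally: prefixes are separated by ';', and if none are given every entity is forwarded. This is a hypothetical sketch of the documented semantics, not Stanbol's actual code; the entity ID and the DBpedia prefixes are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter mirroring the comment in indexing.properties for
// org.apache.stanbol.entityhub.site.entityPrefix (not Stanbol's actual code):
// ';'-separated prefixes; if none are given, every entity is forwarded.
final class EntityPrefixFilter {

    private final List<String> prefixes = new ArrayList<>();

    // rawValue: the property value, or null if the property is not specified.
    EntityPrefixFilter(String rawValue) {
        if (rawValue != null) {
            for (String p : rawValue.split(";")) {
                if (!p.trim().isEmpty()) {
                    prefixes.add(p.trim());
                }
            }
        }
    }

    // Whether a request for this entity is forwarded to the referenced site.
    boolean accepts(String entityId) {
        if (prefixes.isEmpty()) {
            return true; // no prefixes configured: forward everything
        }
        for (String prefix : prefixes) {
            if (entityId.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";
        // What the comment promises when the property is left commented out:
        System.out.println(new EntityPrefixFilter(null).accepts(id)); // true
        // What happened instead, because the DBpedia defaults were applied:
        System.out.println(new EntityPrefixFilter(
                "http://dbpedia.org/resource/;http://dbpedia.org/ontology/").accepts(id)); // false
    }
}
```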
>
> During this investigation I also "discovered" these (not closely related to
> this problem):
>
> 1) There is a typo in mapping.txt:
> ==> change
> # copy dc:titel to rdfs:label
> dc:titel > rdfs:label
> ==> to
> # copy dc:title to rdfs:label
> dc:title > rdfs:label
>
> 2) In the Felix console, when trying to modify the "Apache Stanbol Entityhub
> Referenced Site Configuration" of an imported index,
> there is an AJAX error on save:
> The request failed:
> [object XMLDocument]
>
> (I see this when trying to modify the entity prefixes of my imported index.)
>
> Please let me know if you would prefer Jira tickets for these issues (if they
> really are issues).
>
> Thanks for your help.
> ++
>
>
> On 06/13/2011 03:54 PM, Rupert Westenthaler wrote:
>>
>> Hi Florent
>>
>> Using a '#' in the URI has the disadvantage that browsers will not
>> send the part after the hash to the server, because they assume that
>> they need to download the whole document and navigate to the anchor
>> within it.
>>
>> With curl (or JavaScript) I think the full URL should be sent to the
>> server (I was not able to find good information about this, but at
>> least "curl -v" says that it sends the whole URL to the server).
>> However, on the server side Jersey also does not provide the #{anchor}
>> part of the URL.
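Why the ID arrives truncated can be shown with the JDK's `java.net.URI` class, used here only as an illustration (the thread concerns browsers, curl and Jersey): under the generic URI syntax the first '#' starts the fragment, so the `id` query parameter ends right before it.

```java
import java.net.URI;

// Illustration with the JDK's java.net.URI (not Jersey): the first '#' starts
// the fragment, so the "id" query parameter ends right before it.
final class FragmentDemo {

    static final String URL = "http://localhost:8080/entityhub/sites/entity"
            + "?id=http://www.test.fr/terminology#entity_gradient_1306341921902";

    public static void main(String[] args) {
        URI uri = URI.create(URL);
        System.out.println("query:    " + uri.getQuery());    // query:    id=http://www.test.fr/terminology
        System.out.println("fragment: " + uri.getFragment()); // fragment: entity_gradient_1306341921902
    }
}
```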
>> Sending
>>
>>     "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
>>
>> will pass only "http://www.test.fr/terminology" to a method annotated with
>>
>>     @GET
>>     @Path("/entity")
>>     public Response getEntity(@QueryParam(value = "id") String id) {
>>         // get the Entity
>>         ...
>>     }
>>
>> URL-encoding the '#' as '%23' causes Jersey to pass the full ID
>> "http://www.test.fr/terminology#entity_gradient_1306341921902".
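A sketch of building the lookup URL with the entity ID fully URL-encoded, assuming Java 10+ for the `Charset` overloads of `URLEncoder`/`URLDecoder`. The decoding step stands in for what a JAX-RS runtime does to `@QueryParam` values by default; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper: encode the whole entity ID so that '#'
// becomes %23 and nothing is treated as a fragment.
final class EncodedLookup {

    static final String BASE = "http://localhost:8080/entityhub/sites/entity";

    static URI lookupUri(String entityId) {
        return URI.create(BASE + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";
        URI uri = lookupUri(id);
        System.out.println(uri);
        System.out.println(uri.getFragment()); // null: no fragment, nothing is cut off
        // Simulates the decoding applied to the "id" parameter on the server side.
        String decoded = URLDecoder.decode(
                uri.getRawQuery().substring("id=".length()), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(id)); // true
    }
}
```

As noted in the thread, this only helps if the site answers from its local cache; a remote service would again have to receive the ID with the '#' properly encoded.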
>>
>> In this case the query for an entity with this ID is correctly passed
>> to the ReferencedSite (with '#', not '%23'). So if you send '%23' and the
>> indexed entity uses '#', it should work as long as entities are cached
>> locally. If a remote service is used, then the same '#' problem
>> reappears for the remote service.
>>
>> To test this on my side I did the following:
>> * renamed the entities of the IPTC world regions from
>> "http://cv.iptc.org/newscodes/worldregion/r001" to
>> "http://cv.iptc.org/newscodes/worldregion#r001"
>> * indexed the IPTC data using the indexing tools
>> * installed the index in the Entityhub
>> * curl -v "http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
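The first and last test steps can be sketched as two small string helpers. Both names are hypothetical; the first only rewrites a single URI string (the actual renaming has to be applied to the data before indexing), and the second escapes just the '#', exactly as in the curl command.

```java
// Hypothetical helpers reproducing the test steps: move the local name of an
// IPTC world-region URI behind a '#', then build the lookup URL the way the
// curl command does (only the '#' is escaped as %23).
final class HashUriTest {

    static String toHashUri(String slashUri) {
        int i = slashUri.lastIndexOf('/');
        if (i < 0) {
            throw new IllegalArgumentException("no '/' in " + slashUri);
        }
        return slashUri.substring(0, i) + "#" + slashUri.substring(i + 1);
    }

    static String curlUrl(String entityId) {
        return "http://localhost:8080/entityhub/sites/entity?id=" + entityId.replace("#", "%23");
    }

    public static void main(String[] args) {
        String renamed = toHashUri("http://cv.iptc.org/newscodes/worldregion/r001");
        System.out.println(renamed);          // http://cv.iptc.org/newscodes/worldregion#r001
        System.out.println(curlUrl(renamed)); // http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001
    }
}
```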
>>
>> Assuming that
>>>
>>> curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>>> ==>  answer is
>>> Entity with ID 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found an any referenced site
>>>
>> happened on a referenced site with a full cache (e.g. as created by the
>> Indexing Utility), I was not able to reproduce the error. If the
>> referenced site uses a remote service to dereference entity IDs (e.g.
>> via Cool URIs), this might happen. In this case I suggest testing the
>> remote service directly.
>>
>> best
>> Rupert Westenthaler
>>
>>
>> On Mon, Jun 13, 2011 at 1:06 PM, florent andré
>> <florent.andre-dev@4sengines.com>  wrote:
>>>
>>> Hi Rupert,
>>> Hope you are fine.
>>>
>>> I have another problem...
>>> In my SKOS, entities are identified by a '#', like this:
>>>
>>>  <rdf:Description rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
>>>    <skos:broader rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
>>>    <skos:prefLabel>GRADIENT</skos:prefLabel>
>>>    <skos:inScheme rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
>>>    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
>>>  </rdf:Description>
>>>
>>> And I can't manage to retrieve the entity via the entity endpoint.
>>>
>>> * With the '#' character:
>>> curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
>>> ==>  answer is
>>> Entity with ID 'http://www.test.fr/terminology' not found an any referenced site
>>>
>>> ==>  the part after the '#' is removed
>>>
>>> * Replacing the '#' with %23 (its URL-encoded equivalent):
>>> curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>>> ==>  answer is
>>> Entity with ID 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found an any referenced site
>>>
>>> ==>  the whole ID is kept, but it is still not found...
>>> The result is the same if I URL-encode the entire entity ID.
>>>
>>> Is this related to a bug, or am I doing something wrong?
>>>
>>> Thanks.
>>> ++
>>>
>>>
>>
>>
>>
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
