stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Unexpected results for FieldQuery (was: Fwd: [jira] [Commented] (STANBOL-187) Extendable indexing infrastructure for the Entityhub)
Date Fri, 03 Jun 2011 13:55:31 GMT
Hi Florent

First of all: as I expected, it was a bug in the FieldQuery
implementation of the SolrYard. The bug has been fixed in the meantime [1].
However, the bug itself was not the reason why you were getting
unexpected results. It only explains why the second query
("South America") returned any results at all.

The reason why you do not get any results is that you search for
"xsd:string" values, but "skos:prefLabel" values are mapped to
"entityhub:text". The Entityhub distinguishes between natural language
text and "string" values. Strings are typically used for IDs
(e.g. ISBN numbers); "skos:notation" would also be a good example. The
SolrYard does not use any tokenizer for string values.

Your query used a ValueConstraint to search for the skos:prefLabel:
>   {
>       "type": "value",
>       "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel",
>       "value": "Africa",
>   }
If no data type is defined for such a constraint, then the data type is
detected based on the Java type of the value. That would be:

* String for "value": "Africa"
* Integer for "value": 123
* Float for "value": 1.23
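
The detection rule above can be sketched in a few lines of Python. This is
only an illustration: the function name infer_java_type is hypothetical, and
it maps a decoded JSON value to the Java type names listed above rather than
reproducing the actual Entityhub code.

```python
import json


def infer_java_type(value):
    """Return the Java type name listed above for a decoded JSON value.

    Hypothetical helper for illustration; not part of the Entityhub API.
    """
    # bool is a subclass of int in Python, so rule it out first.
    if isinstance(value, bool):
        raise TypeError("boolean values are not covered by the list above")
    if isinstance(value, str):
        return "String"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# The value of the constraint from the query above.
constraint = json.loads('{"type": "value", "value": "Africa"}')
print(infer_java_type(constraint["value"]))
```

For "Africa" this yields String, so the constraint is evaluated against
string values and not against the text values used for skos:prefLabel.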

You can also explicitly pass a data type by using "dataTypes":

    {
        "type": "value",
        "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel",
        "dataTypes": ["http:\/\/www.iks-project.eu\/ontology\/rick\/model\/text"],
        "value": "South America"
    }

"http://www.iks-project.eu/ontology/rick/model/text" is the data type
used for natural text values.
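
If you generate such queries from a script, a small helper keeps the optional
"dataTypes" key out of the way. The following is a minimal sketch, assuming
you build the JSON with Python; value_constraint is a hypothetical name and
not part of any Entityhub API.

```python
import json

# Data type URI for natural language text, as given above.
TEXT_DATATYPE = "http://www.iks-project.eu/ontology/rick/model/text"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def value_constraint(field, value, data_types=None):
    """Build a ValueConstraint as a dict (hypothetical helper)."""
    constraint = {"type": "value", "field": field, "value": value}
    # Add "dataTypes" only when the caller wants to override the
    # detection that is based on the type of the value.
    if data_types:
        constraint["dataTypes"] = list(data_types)
    return constraint


explicit = value_constraint(SKOS_PREF_LABEL, "South America", [TEXT_DATATYPE])
print(json.dumps(explicit, indent=4))
```

json.dumps writes "/" without the "\/" escape; both spellings are valid JSON
and decode to the same string.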

However, the preferred way to query for natural language text values is to
use a TextConstraint instead of a ValueConstraint.
The TextConstraint equivalent to the above ValueConstraint is:

    {
        "type": "text",
        "text": "South America",
        "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
    }

Text constraints also let you define the languages to search and
use wildcards, e.g.:

    {
        "selected": [
            "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel",
            "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"],
        "offset": "0",
        "limit": "30",
        "constraints": [
            {
                "type": "text",
                "languages": ["en-GB"],
                "patternType": "wildcard",
                "text": "Photo*",
                "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
            }
        ]
    }
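
A complete query like this one can also be assembled programmatically, which
avoids hand-editing mistakes such as stray commas. The sketch below assumes
Python; text_constraint and field_query are hypothetical helper names, and
only the JSON keys are taken from the example above.

```python
import json

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def text_constraint(field, text, languages=None, pattern_type=None):
    """Build a TextConstraint as a dict (hypothetical helper)."""
    constraint = {"type": "text", "field": field, "text": text}
    # Both keys are optional, so add them only when requested.
    if languages:
        constraint["languages"] = list(languages)
    if pattern_type:
        constraint["patternType"] = pattern_type
    return constraint


def field_query(constraints, selected=(), offset=0, limit=30):
    """Wrap constraints into a FieldQuery dict (hypothetical helper)."""
    return {
        "selected": list(selected),
        # Written as strings to mirror the example above.
        "offset": str(offset),
        "limit": str(limit),
        "constraints": list(constraints),
    }


query = field_query(
    [text_constraint(SKOS_PREF_LABEL, "Photo*", ["en-GB"], "wildcard")],
    selected=[SKOS_PREF_LABEL, RDF_TYPE],
)
print(json.dumps(query, indent=4))
```

The printed JSON can be saved as fieldQuery.json and sent with the same curl
command you used in your mail.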

Documentation of the FieldQuery syntax is provided at the end of the
Entityhub README.TXT [2].

best
Rupert Westenthaler

[1] http://svn.apache.org/viewvc?rev=1131027&view=rev
[2] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/README.TXT


> 1) When indexing a skos file, only terms with multi-words are indexed, and not term with
> one word. I observe this first on my particular thesaurus then also in the iptc one. I try
> this request
> $ curl -X POST -F "query=@fieldQuery.json" http://localhost:8080/entityhub/site/iptc/query
> with queries :

> 1.A) @fieldQuery.json =
> {
>    "offset": "0",
>    "limit": "30",
>    "constraints": [
>        {
>          "type": "value",
>          "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel",
>          "value": "Africa",
>        }
>    ]
> }
>
> ==> output no results
>
> 1.B) @fieldQuery.json =
> {
>    "offset": "0",
>    "limit": "30",
>    "constraints": [
>        {
>          "type": "value",
>          "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel",
>          "value": "South America",
>        }
>    ]
> }
