jena-users mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Extending TDB/Fuseki with a fuzzy match primitive/custom index
Date Tue, 08 Feb 2011 13:08:41 GMT
So, this is TDB-independent. Is the idea here that I'd use, say,
Fuseki and concoct some sort of assembly to glue it together?

On Sun, Jan 30, 2011 at 12:44 PM, Andy Seaborne
<andy.seaborne@epimorphics.com> wrote:
>
>
> On 28/01/11 15:50, Benson Margulies wrote:
>>
>> At the day job, one of our lead technologies is a device that can
>> decide that 'Barak Obama' and 'Barack Obama' are probably the same
>> thing, or even that 歐巴馬 is another spelling. Is there an extension
>> model for SPARQL queries? In this case, it wouldn't really work to
>> just live in the FILTER, since the fundamental selection would be
>> something like:
>>
>>
>> ?s something:hasName "Barak Obama"
>>
>> and we want to tamper with how the literal string gets compared. We
>> have one API that says "how similar are these strings" and another
>> more complex model in which we build an index that rapidly returns all
>> the strings that are within some distance of a query. We could, of
> >> course, build our own index by mining TDB, make our own query, and
> >> then get busy SPARQL-ing starting from a set of URIs thus derived,
>> but I just wondered about a more integrated approach.
>
> Benson,
>
> ARQ provides "property functions", where a property is matched by calling
> custom code rather than by storage-level matching:
>
> http://openjena.org/ARQ/extension.html#propertyFunctions
>
> One example is free-text matching, using Lucene:
>
> http://openjena.org/ARQ/lucene-arq.html
>
> A property function can provide access to another index, such as your
> example of similar literals. You could index either literal to literal by
> similarity, or literal to the resource it relates to. The similarity lookup
> can return multiple possible matches (one of the reasons for extending via
> properties is that it gives a framework for multiple matches, unlike FILTERs).
>
> (Property functions do not currently work in all property path situations:
> it is not clear what they mean in {0} and *, nor how they interact with the
> backtracking search.)
>
>        Andy
>
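
The "literal to literal by similarity" index Andy describes could be sketched roughly as below. This is only an illustration and not ARQ code: the class name FuzzyLiteralIndex and its methods are hypothetical, plain Levenshtein distance stands in for the real similarity API (it would not catch cross-script spellings such as 歐巴馬), and the lookup is a linear scan rather than a real index. The point is the shape of the call: one query literal in, several candidate literals out, which is what a property function could then bind to a variable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an external similarity index (not part of ARQ or TDB).
final class FuzzyLiteralIndex {
    private final List<String> literals = new ArrayList<>();

    // Register a literal, e.g. one mined from the store.
    void add(String literal) {
        literals.add(literal);
    }

    // Levenshtein distance via two-row dynamic programming.
    // Assumption: this replaces the proprietary "how similar are these strings" API.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Return every stored literal within maxDistance of the query, in insertion order.
    // A linear scan for clarity; a real index would avoid comparing against everything.
    List<String> lookup(String query, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String candidate : literals) {
            if (editDistance(query, candidate) <= maxDistance) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

A property function wrapping something like this would call lookup with the object literal from the query pattern and emit one solution per hit, which a FILTER cannot do because it can only accept or reject an existing solution.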
