manifoldcf-user mailing list archives

From Joshua Dunham <joshua.dun...@gmail.com>
Subject Re: Output Connector - Apache Marmotta
Date Wed, 09 Sep 2015 16:55:13 GMT
Hi Karl, Rafa,

  I finally had some time to work on this and I have a scheme which (largely) works very well,
but I have one question, one stumbling block, and one comment.

First, my environment consists of ManifoldCF v2.1, MariaDB (into which I imported a small CSV
for testing), and Marmotta 3.3.

The really interesting bits are in specifying the Task. I have MySQL input -> metadata
adjuster -> filesystem output. MySQL is set up, the connection shows as OK, and on starting
the job it does write files to the output folder.

Getting the list of IDs works well, no issue there, and I'm not using versioning or access
tokens yet. The stumbling block has to do with setting up the Data Query and the best use
of the $URL and $DATA variables. First: I've hijacked the $URL into ~ CONCAT("addresses/",
id) AS $(URLCOLUMN), which has the effect of creating a folder called addresses in the root
of the output folder. Inside the addresses folder it makes numbered files corresponding
to the row ID. I can point the root folder path at the Marmotta import directory and even use
the context templating feature (setting 'addresses' to the real context name). That's really
slick for an out-of-the-box hack at integration.

What is not apparent is how to use the metadata adjuster to interact with the variables in
the Data Query. I've followed the guide and made a simple hello, False, ${city} statement,
but the only bits that are written into the file are the contents of the $DATACOLUMN variable.
So, given a simple address book in a database with columns id, street, city, region, country,
post code, latitude, longitude ... how should I approach making such a data query? My real
use cases will be much, much more complicated, so I'm wondering if you have some explanation
of how I should use that field, and maybe a small SQL snippet example with those columns?
:) My end goal is to have a column called out and then use the metadata adjuster to simply
prepend each column's value with a string. So if the city is 'New York' it would write out
city:New_York or the like.
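
The data column described above can be sketched with an in-memory SQLite database standing
in for MariaDB. This is a minimal illustration under assumptions, not the connector's
documented behaviour: the table name addresses, the column name post_code, the sample row,
and the helper fetch_documents are all hypothetical. SQLite concatenates with ||, where
MySQL/MariaDB would use CONCAT(); in the JDBC connector's data query the result aliases
would be $(IDCOLUMN), $(URLCOLUMN) and $(DATACOLUMN), and the WHERE clause would use
$(IDLIST) instead of a bound parameter.

```python
import sqlite3

# Hypothetical address-book table; column names follow the list in the message.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, city TEXT, "
    "region TEXT, country TEXT, post_code TEXT, latitude REAL, longitude REAL)"
)
conn.execute(
    "INSERT INTO addresses VALUES "
    "(1, '350 Fifth Avenue', 'New York', 'NY', 'US', '10118', 40.7484, -73.9857)"
)

# Plain aliases (doc_id, doc_url, doc_data) are used here so the query runs in
# SQLite; they are stand-ins for the connector's substitution variables.
# Each column value is prefixed with its name and spaces become underscores.
DATA_QUERY = """
SELECT id AS doc_id,
       'addresses/' || id AS doc_url,
       'street:'  || REPLACE(street, ' ', '_')  || char(10) ||
       'city:'    || REPLACE(city, ' ', '_')    || char(10) ||
       'country:' || REPLACE(country, ' ', '_') AS doc_data
FROM addresses
WHERE id IN (?)
"""


def fetch_documents(connection, row_id):
    """Return (id, url, data) rows for one id, mimicking a per-id data query."""
    return connection.execute(DATA_QUERY, (row_id,)).fetchall()


rows = fetch_documents(conn, 1)
for doc_id, doc_url, doc_data in rows:
    print(doc_url)
    print(doc_data)
```

Doing the prefixing in the SQL itself means the text in the data column already has the
city:New_York shape before any transformation connector runs.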

=====

The comment is about a bit of sample data which could ship with the source. It would
be very educational if there were a complex but real configuration of ManifoldCF that links
to a sqlite3 file as input and maybe the same db, but a different table, as output.

=====

My question is: why would I need to set up different transform modules? Since there is no real
config to do in the transform connector (all the good stuff seems to be under Task config),
I'm not sure why I would need to make more than one, rather than keep reusing a single one
and changing the transform parameters under the task.


Thank you!

J


> On 5 July 2015 at 17:27, Karl Wright <daddywri@gmail.com> wrote:
> Hi Joshua,
> 
> My take:
> 
> --> (A) How I define the data to grab, whether some SQL statement or the
> like. <--
> 
> Have a look at the user documentation here:
> https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
> 
> It should be pretty clear how you define what you are looking for.
> 
> --> (B) How to use this data as individual variables which I can arrange
> into a linked data relationship (ManifoldCF mapping module?) <--
> 
> Rafa's previous reply about the RepositoryDocument is appropriate. 
> Basically, an output connector will be handed one of those objects for every
> MCF "document".  The javadoc for it is here:
> 
> https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
> 
> --> (C) How difficult would it be to connect to Marmotta's webservice(s).
> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
> support for elasticsearch so maybe I could put something together that
> talks to Marmotta..<--
> 
> You can readily write your own output connector.  There's a book, in fact,
> describing how to do that.  See:
> 
> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
> 
> ... and read Chapter 9.
> 
> Thanks,
> Karl
> 
> 
> On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <joshua.dunham@gmail.com>
> wrote:
>> 
>> That sounds promising. Would you recommend ManifoldCF for this? If so,
>> do you know of any resources which I can use to get up to speed with
>> using it in this way?
>> 
>> -J
>> 
>>> On 4 July 2015 at 21:48,  <rharoapache@gmail.com> wrote:
>>> Hi Joshua,
>>> 
>>> The ManifoldCF unit logic in terms of indexing is the Repository Document
>>> which, simplifying a lot, models a document composed of content plus
>>> metadata (key-value). It should be relatively easy to triplify that
>>> structure and push it to Marmotta using SPARQL update queries or
>>> Marmotta’s Java client for adding resources.
>>> The Generic Database connector uses a set of queries for crawling the
>>> database. You would have to use those queries to get your data. I’m not
>>> completely sure if each record result is converted directly to a
>>> Repository Document; that is something that I would need to check.
>>> 
>>> Hope that helps,
>>> Cheers, Rafa
>>> 
>>> 
>>> 
>>> 
>>> On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <joshua.dunham@gmail.com>
>>> wrote:
>>>> 
>>>> Hi ManifoldCF Users (and Devs)
>>>> 
>>>> I'm wondering if ManifoldCF can work in my use case. I have some
>>>> random MySQL and Oracle DBs that I would like to connect to and
>>>> extract certain known bits of info, format them each a certain way and
>>>> then store the info in Apache Marmotta [1]. Marmotta is an RDF triple
>>>> store for linked data, so I would need to parse the MySQL and Oracle
>>>> data and store it in a linked format. Creating the relationships etc.
>>>> is no problem for me; I just need something that would let me
>>>> specifically do this.
>>>> 
>>>> From what I've read, ManifoldCF can connect to MySQL and Oracle
>>>> (via non-distributed libraries), and store the results out in several
>>>> target data stores. What isn't clear is
>>>> (A) How I define the data to grab, whether some SQL statement or the
>>>> like.
>>>> (B) How to use this data as individual variables which I can arrange
>>>> into a linked data relationship (ManifoldCF mapping module?)
>>>> (C) How difficult would it be to connect to Marmotta's webservice(s).
>>>> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>>>> support for elasticsearch so maybe I could put something together that
>>>> talks to Marmotta..
>>>> 
>>>> Would this be possible? If so, could someone point me in the right
>>>> direction?
>>>> 
>>>> Thanks!
>>>> -Joshua
>>>> 
>>>> 
>>>> [1] - http://marmotta.apache.org/index.html
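
Rafa's suggestion earlier in the thread (triplify the Repository Document's key-value
metadata and push it with a SPARQL update) could be sketched roughly as follows. This only
builds the update text and does not talk to Marmotta; the function names, the example.org
URIs, and the choice to emit every value as a plain string literal are assumptions made for
illustration.

```python
def escape_literal(value):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def triplify(subject_uri, metadata, predicate_base):
    """Build a SPARQL INSERT DATA statement from multi-valued key-value metadata."""
    triples = []
    for key in sorted(metadata):
        for value in metadata[key]:
            triples.append(
                f'<{subject_uri}> <{predicate_base}{key}> "{escape_literal(value)}" .'
            )
    return "INSERT DATA {\n  " + "\n  ".join(triples) + "\n}"


# Hypothetical subject and predicate namespace for one address record.
update = triplify(
    "http://example.org/addresses/1",
    {"city": ["New York"], "country": ["US"]},
    "http://example.org/schema/",
)
print(update)
```

The resulting string is what an output connector would send to a SPARQL update endpoint;
how that request is made depends on the Marmotta setup and is not shown here.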

