stanbol-dev mailing list archives

From David Riccitelli <da...@insideout.io>
Subject Re: can we pass json as input
Date Thu, 11 Apr 2013 06:03:26 GMT
Hello Harish,

We're using the Stanbol Façade contributed bundle [1] to pass a URL with a
JSON payload. With this bundle installed, you can post an analysis task:
 curl -ik \
  -X POST \
  -H "Content-Type: application/json" \
  http://localhost:8080/api/tasks \
  -d @*a json file*

The JSON may specify the URL to analyze:
 {
    "url": "http://www.corriere.it/politica/12_dicembre_11/Berlusconi-che-ci-importa-dello-spread_0f328ec8-4368-11e2-b89b-3cf6075586fe.shtml",
    "mimeType": "application/rdf+xml"
 }
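
For posting a bunch of URLs, the same request can be built in a short
script. This is only a minimal sketch in Python, assuming the endpoint and
the two payload fields shown above; the helper names build_task_request and
post_tasks are hypothetical, and the response format is not described here,
so the bodies are returned as raw bytes:

```python
import json
import urllib.request

# Assumption: endpoint copied from the curl example; adjust host and port.
TASKS_ENDPOINT = "http://localhost:8080/api/tasks"


def build_task_request(url, mime_type="application/rdf+xml",
                       endpoint=TASKS_ENDPOINT):
    """Build (but do not send) a POST request carrying one analysis task."""
    payload = json.dumps({"url": url, "mimeType": mime_type}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_tasks(urls, opener=urllib.request.urlopen):
    """Post one task per URL and return the raw response bodies, in order."""
    results = []
    for url in urls:
        with opener(build_task_request(url)) as response:
            results.append(response.read())
    return results
```

Calling post_tasks(["http://example.org/a", "http://example.org/b"]) sends
one request per URL; the opener parameter is there only so the network call
can be swapped out when trying the script without a running server.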

The page at that URL is parsed using Readability [2], the same algorithm
Safari uses to display its "reading page": it loads only the content text
and removes the noise (headers, menus, sidebars, footers, and so forth).
It also automatically loads content split across multiple pages.

There is an interesting thread about this bundle and its future
development, covering the Façade APIs and a comparison of parsing engines
such as Readability [3].

BR,
David

[1] https://github.com/insideout10/stanbol-facade

[2]
https://github.com/insideout10/stanbol-facade/blob/master/stanbol-facade-api/src/main/java/io/insideout/stanbol/facade/services/UrlGrabberService.java

[3]
http://mail-archives.apache.org/mod_mbox/stanbol-dev/201301.mbox/%3CCAG94HGgH18n+ghW3iHQSo1Pm9hKg56io3OnR0PD_F9xAWdEqCQ@mail.gmail.com%3E


On Wed, Apr 10, 2013 at 11:11 PM, harish suvarna <hsuvarna@gmail.com> wrote:

> ....Basically I would like to post a bunch of urls through json, get html
> of the urls, parse the html, enhance and output the results.
> Well, outputting the enhancements for each url is also a problem. Maybe I
> have to do it outside stanbol and then for each parsed text of url call
> stanbol.
>
>
> On Wed, Apr 10, 2013 at 1:04 PM, harish suvarna <hsuvarna@gmail.com>
> wrote:
>
> > Does Stanbol restful api allow posting a json as input? Is there any
> > definition for such json?
> > I would like to have an (say jsoup) engine which fetches the html of a
> > url and parses the html for further processing?
> >
> > --
> > Thanks
> > Harish
> >
>
>
>
> --
> Thanks
> Harish
>



-- 
David Riccitelli

-- check the Swagger for WordLift <http://bit.ly/VtoM5H>
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
********************************************************************************
