incubator-clerezza-dev mailing list archives

From Alessandro Adamou <ada...@cs.unibo.it>
Subject Re: Setting a read limit when parsing a Graph
Date Thu, 16 Aug 2012 09:58:00 GMT
Thanks Reto,

based on what you said, I decided to implement the lookahead method with 
a limit set on the number of triples instead of bytes. It should still 
have a pretty decent memory footprint and take a reasonable time. 
It is now a Stanbol utility in commons.owl.

Alessandro
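
To illustrate the triple-limited lookahead described above, here is a minimal sketch. It is not the Stanbol commons.owl utility: it assumes the content is N-Triples with one statement per line, uses no RDF library, and looks only for the ontology IRI (not the version IRI). The class name OntologyIdLookahead and the method findOntologyIri are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;

// Hypothetical helper, for illustration only; not the Stanbol implementation.
public class OntologyIdLookahead {

    private static final String RDF_TYPE =
            "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    private static final String OWL_ONTOLOGY =
            "<http://www.w3.org/2002/07/owl#Ontology>";

    /**
     * Scans at most maxTriples statements of N-Triples content and returns the
     * subject IRI of the first "?s rdf:type owl:Ontology" statement, if any.
     * Assumes one statement per line, as N-Triples requires.
     */
    public static Optional<String> findOntologyIri(Reader content, int maxTriples)
            throws IOException {
        BufferedReader reader = new BufferedReader(content);
        int seen = 0;
        String line;
        while (seen < maxTriples && (line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments do not count as triples
            }
            seen++;
            // Split into subject, predicate and the rest (object plus final dot).
            String[] parts = line.split("\\s+", 3);
            if (parts.length == 3
                    && parts[1].equals(RDF_TYPE)
                    && parts[2].startsWith(OWL_ONTOLOGY)
                    && parts[0].startsWith("<") && parts[0].endsWith(">")) {
                // Strip the angle brackets around the subject IRI.
                return Optional.of(parts[0].substring(1, parts[0].length() - 1));
            }
        }
        return Optional.empty(); // limit reached or end of content
    }

    public static void main(String[] args) throws IOException {
        String nt =
                "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
              + "<http://example.org/onto> " + RDF_TYPE + " " + OWL_ONTOLOGY + " .\n";
        System.out.println(findOntologyIri(new StringReader(nt), 2));
        System.out.println(findOntologyIri(new StringReader(nt), 1));
    }
}
```

Counting statements rather than bytes means the scan always stops on a statement boundary, so no partial triple is ever inspected.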


On 8/14/12 2:11 PM, Reto Bachmann-Gmür wrote:
> Hi Alessandro,
>
> Two things:
>
> - the mark method doesn't truncate the stream after the indicated number of
> bytes, but makes sure that within the indicated number of bytes one can
> reset the stream back to that position. If one reads more than the
> indicated number of bytes the mark becomes invalid (i.e. reset won't work)
> but otherwise the stream behaves as normal.
> - I'm not sure how the Jena parser works and if you get the triples read
> so far if your RDF/XML is truncated. You might want to truncate N-Triples
> after a dot.
>
> Cheers,
> Reto
>
>
>
> On Tue, Aug 14, 2012 at 1:53 PM, Alessandro Adamou <adamou@cs.unibo.it> wrote:
>
>> Hi,
>>
>> I need to write a function that performs lookahead of the OWL ontology ID
>> for a Graph, therefore it has to scan the content up to a certain point to
>> see if it has found an ontology IRI / version IRI pair.
>>
>> I thought that setting mark() on a BufferedInputStream did the trick,
>> something like:
>>
>> MGraph graph = new SimpleMGraph();
>> BufferedInputStream bIn = new BufferedInputStream(content);
>> bIn.mark(1024); // Read up to 1k
>> parser.parse(graph, bIn, SupportedFormat.RDF_XML);
>>
>> (parser has a Jena parser provider registered)
>>
>> But apparently this is not working. Even for streams much longer than 1
>> KiB, with the interesting triples right at the very end, these triples are
>> always found.
>>
>> Does the Clerezza parser override the marks on a buffered stream, or maybe
>> Jena is doing so? Or even better, am I doing this wrong?
>>
>> Best,
>> -- Alessandro
>>
>> --
>> M.Sc. Alessandro Adamou
>>
>> Alma Mater Studiorum - Università di Bologna
>> Department of Computer Science
>> Mura Anteo Zamboni 7, 40127 Bologna - Italy
>>
>> Semantic Technology Laboratory (STLab)
>> Institute for Cognitive Science and Technology (ISTC)
>> National Research Council (CNR)
>> Via Nomentana 56, 00161 Rome - Italy
>>
>>
>> "I will give you everything, just don't demand anything."
>> (Ettore Petrolini, 1917)
>>
>> Not sent from my iSnobTechDevice
>>
>>
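
The mark() behaviour Reto describes in the quoted reply can be checked with the standard library alone. The sketch below shows that a marked BufferedInputStream can still be read to the end, and contrasts it with a small wrapper that really does stop after a byte limit. MarkDemo, drain and LimitedInputStream are hypothetical names introduced for illustration; neither Clerezza nor Jena is involved.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, for illustration only.
public class MarkDemo {

    /** Reads the stream to the end and returns the number of bytes read. */
    public static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    /**
     * Wrapper that reports end of stream after 'limit' bytes.
     * Note: skip() is not limited in this sketch.
     */
    public static class LimitedInputStream extends FilterInputStream {
        private long remaining;

        public LimitedInputStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = super.read();
            if (b != -1) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (remaining <= 0) {
                return -1;
            }
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];

        // mark(1024) does not truncate: all 100000 bytes are still readable.
        BufferedInputStream marked =
                new BufferedInputStream(new ByteArrayInputStream(data));
        marked.mark(1024);
        System.out.println("marked stream: " + drain(marked) + " bytes read");

        // Having read far past the read limit, the mark is no longer valid.
        try {
            marked.reset();
            System.out.println("reset succeeded");
        } catch (IOException e) {
            System.out.println("reset failed: " + e.getMessage());
        }

        // The wrapper, by contrast, ends the stream after 1024 bytes.
        InputStream limited =
                new LimitedInputStream(new ByteArrayInputStream(data), 1024);
        System.out.println("limited stream: " + drain(limited) + " bytes read");
    }
}
```

A byte limit like this can cut RDF/XML in the middle of an element, so whether any triples are returned then depends on how the parser handles truncated input, as noted in the reply.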


