incubator-any23-dev mailing list archives

From Szymon Danielczyk <danielczyk.szy...@gmail.com>
Subject Re: http://webdatacommons.org/
Date Fri, 23 Mar 2012 11:16:12 GMT
Hi,

Here is a paragraph from their website:

"Our solution is to run (Java) regular expressions against each
webpage prior to extraction, which detect the presence of a
microformat in an HTML page, and then only run the Any23 extractor when
the regular expression finds potential matches."
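A minimal sketch of the pre-filtering idea described in that paragraph, in plain Java using only java.util.regex. The class name MicroformatPrefilter, the method worthExtracting, and the patterns are hypothetical and purely illustrative; they are not the regexes from the Web Data Commons table and not part of the Any23 API.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical pre-filter: cheap regex checks that decide whether a page
 * is worth handing to the (more expensive) Any23 extraction step.
 * The patterns are illustrative assumptions, not the published ones.
 */
public class MicroformatPrefilter {

    // Illustrative markers for a few formats, matched case-insensitively.
    private static final List<Pattern> MARKERS = List.of(
        Pattern.compile("class\\s*=\\s*[\"'][^\"']*\\bvcard\\b", Pattern.CASE_INSENSITIVE),  // hCard
        Pattern.compile("class\\s*=\\s*[\"'][^\"']*\\bvevent\\b", Pattern.CASE_INSENSITIVE), // hCalendar
        Pattern.compile("\\bitemscope\\b", Pattern.CASE_INSENSITIVE),                         // Microdata
        Pattern.compile("\\b(?:property|typeof|about)\\s*=", Pattern.CASE_INSENSITIVE)         // RDFa
    );

    /** Returns true if at least one marker occurs anywhere in the page. */
    public static boolean worthExtracting(String html) {
        for (Pattern p : MARKERS) {
            if (p.matcher(html).find()) {
                return true; // a potential match: run the real extractor
            }
        }
        return false; // nothing found: skip extraction for this page
    }

    public static void main(String[] args) {
        String withCard = "<div class=\"vcard\"><span class=\"fn\">Jane</span></div>";
        String plain = "<p>No structured data here.</p>";
        System.out.println(worthExtracting(withCard)); // true
        System.out.println(worthExtracting(plain));    // false
    }
}
```

The patterns should err on the permissive side: a false positive only costs one unnecessary extraction, while a false negative silently drops data from the corpus.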

Are we using any techniques like that to decide whether there is
anything to parse in a document?
Maybe we could build in such a feature, for example a method or filter
that lets users who want to parse a huge number of documents
detect whether a document is worth parsing.

They publish a table with the regex they used for each format.
Any opinions on this?
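A per-format table like theirs could also tell us which extractors to run, not just whether to run any. A rough sketch, assuming a simple format-name to regex map; FormatDetector, candidateFormats, the format labels and the patterns are all hypothetical illustrations, not Any23 code and not the Web Data Commons regexes.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class FormatDetector {

    // Hypothetical format -> regex table; entries are illustrative only.
    private static final Map<String, Pattern> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put("hcard", Pattern.compile("class\\s*=\\s*[\"'][^\"']*\\bvcard\\b", Pattern.CASE_INSENSITIVE));
        TABLE.put("hcalendar", Pattern.compile("class\\s*=\\s*[\"'][^\"']*\\bvevent\\b", Pattern.CASE_INSENSITIVE));
        TABLE.put("microdata", Pattern.compile("\\bitemscope\\b", Pattern.CASE_INSENSITIVE));
        TABLE.put("rdfa", Pattern.compile("\\b(?:property|typeof|about)\\s*=", Pattern.CASE_INSENSITIVE));
    }

    /** Returns the names of the formats whose regex matches, in table order. */
    public static Set<String> candidateFormats(String html) {
        Set<String> found = new LinkedHashSet<>();
        for (Map.Entry<String, Pattern> e : TABLE.entrySet()) {
            if (e.getValue().matcher(html).find()) {
                found.add(e.getKey());
            }
        }
        return found; // empty set means the document can be skipped entirely
    }

    public static void main(String[] args) {
        String page = "<div itemscope><span class=\"vcard\">Jane</span></div>";
        System.out.println(candidateFormats(page)); // [hcard, microdata]
    }
}
```

Only the extractors for the returned formats would then be invoked; how that maps onto Any23's own extractor selection is left open here.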

Szymon

On 23 March 2012 10:38, Davide Palmisano <dpalmisano@gmail.com> wrote:
> Thanks Michele,
>
> this is great news.
>
> Should we have a section on the web site listing
> all the products/initiatives that are using Any23?
>
> On Fri, Mar 23, 2012 at 11:01 AM, Michele Mostarda
> <michele.mostarda@gmail.com> wrote:
>> Hi Guys,
>>
>>   just a curiosity:
>>
>>    Any23 has been recently used to parse the entire corpus of Semantic
>> Web Data existing on the Web [0].
>>
>> The best.
>>
>> Mic
>>
>> [0] http://webdatacommons.org/
>>
>> --
>> Michele Mostarda
>> Senior Software Engineer
>> skype: michele.mostarda
>> twitter: micmos
>> mail: me@michelemostarda.com
>> site : http://www.michelemostarda.com
>
>
>
> --
> Davide Palmisano
>
> http://davidepalmisano.com
> http://twitter.com/dpalmisano
