any23-user mailing list archives

From Tim Potter <...@yahoo-inc.com>
Subject Re: Too many tuples!!
Date Thu, 05 Apr 2012 10:59:35 GMT
Hi Lewis,
  Having read the page http://incubator.apache.org/any23/dev-microformat-extractors.html I'm
beginning to understand what Any23 is doing.  The nesting_original and nesting_structured tuples
are being added to show the relation between an hcard and a nested geo element, for example.
 Looking at the annotations on the HCardExtractor class, currently the only nested elements
that won't generate these additional tuples are adr elements.  Looking at http://microformats.org/wiki/hcard
I'm wondering if the geo extractor should also be in the @Includes annotation of HCardExtractor?

Regards,
  Tim P.

From: "Yahoo! Inc." <tep@yahoo-inc.com>
Reply-To: "any23-user@incubator.apache.org" <any23-user@incubator.apache.org>
Date: Thu, 5 Apr 2012 10:44:43 +0100
To: "any23-user@incubator.apache.org" <any23-user@incubator.apache.org>
Subject: Re: Too many tuples!!

Hi Lewis,
   Maybe the page has been modified slightly since I copied that snippet.  If you search
for '1180903W' in the page source you should find the entry.    I guess the easiest
way to reproduce the problem is to run:

 'any23tools Rover http://en.wikipedia.org/wiki/List_of_Nike_missile_locations'

It returns somewhere in the order of a million tuples.

I found that switching off the nested triple production ('any23tools Rover Cn http.') returns
far fewer, only a few thousand.

Like I said, I don't have enough experience with RDF to know if what Any23 is extracting is
correct.  It just seems like a lot of tuples.

Thanks for your help.

Regards,
  Tim P.



From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
Reply-To: "any23-user@incubator.apache.org" <any23-user@incubator.apache.org>
Date: Wed, 4 Apr 2012 23:19:11 +0100
To: "any23-user@incubator.apache.org" <any23-user@incubator.apache.org>
Subject: Re: Too many tuples!!

Hi Tim,

I've just picked this up, it got lost in my filters.

2012/3/30 Tim Potter <tep@yahoo-inc.com>

http://en.wikipedia.org/wiki/List_of_Nike_missile_locations

With regard to the above URL and the source below, I can't find the snippet below
in the above page. Can you please check and confirm for me?

Given the HTML Snippet:


<a href="http://toolserver.org/~geohack/geohack.php?pagename=List_of_Nike_missile_locations&amp;params=34_22_41_N_118_09_03_W_&amp;title=LA-04-LS"
class="external text" rel="nofollow" style="white-space: normal;">

<span class="geo-default">

<span title="Maps, aerial photos, and other data for this location" class="geo-dms">

<span class="latitude">342241N</span>

<span class="longitude">1180903W</span>

</span>

</span>

<span class="geo-multi-punct">&#65279; / &#65279;</span>

<span class="geo-nondefault">

<span class="vcard">

<span title="Maps, aerial photos, and other data for this location" class="geo-dec">34.37806N
118.15083W</span>

<span style="display: none">

&#65279; /

<span class="geo">34.37806; -118.15083</span>

</span>

<span style="display: none">

&#65279; (

<span class="fn org">LA-04-LS</span>

)

</span>

</span>

</span>

</a>
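
For readers following the thread: the nesting discussed above (a geo element sitting inside an hcard) can be checked on the snippet with a short script. This is only a rough sketch using Python's standard html.parser, not how Any23 itself works; the names SNIPPET, NestedGeoFinder and find_nested_geo are made up for illustration, and SNIPPET is a trimmed copy of the vcard part of the markup above.

```python
from html.parser import HTMLParser

# Trimmed copy of the vcard portion of the snippet quoted in this thread.
SNIPPET = """
<span class="vcard">
  <span class="geo-dec">34.37806N 118.15083W</span>
  <span style="display: none"><span class="geo">34.37806; -118.15083</span></span>
  <span style="display: none"><span class="fn org">LA-04-LS</span></span>
</span>
"""


class NestedGeoFinder(HTMLParser):
    """Collects the text of class="geo" elements that sit inside a class="vcard" element."""

    def __init__(self):
        super().__init__()
        self.stack = []       # class tokens of each currently open element
        self.geo_values = []  # raw text of geo elements nested in a vcard

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not data.strip() or not self.stack:
            return
        innermost = self.stack[-1]
        # Exact token match, so class="geo-dec" is not mistaken for class="geo".
        inside_vcard = any("vcard" in classes for classes in self.stack[:-1])
        if "geo" in innermost and inside_vcard:
            self.geo_values.append(data.strip())


def find_nested_geo(html):
    """Return (latitude, longitude) pairs for geo elements nested in a vcard."""
    finder = NestedGeoFinder()
    finder.feed(html)
    finder.close()
    return [tuple(float(part) for part in value.split(";"))
            for value in finder.geo_values]


if __name__ == "__main__":
    print(find_nested_geo(SNIPPET))
```

Each pair it reports is one hcard/geo nesting of the kind that the nesting_original and nesting_structured tuples describe, which suggests why a page with many such entries produces so many extra tuples.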

