incubator-any23-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: about the supported input format of any23
Date Thu, 21 Jun 2012 21:59:20 GMT
No, you're not doing anything incorrectly. I also get pretty dismal results
with basic-crawler within Any23; please see below.

lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local$ any23
rover http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning
[1] 2956
[2] 2957
[3] 2958
lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local$
------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://en.wikipedia.org/w/api.php?action=query> dcterms:title
"MediaWiki API Result" .

------------------------------------------------------------------------
Apache Any23 SUCCESS
Total time: 2s
Finished at: Thu Jun 21 22:53:27 BST 2012
Final Memory: 24M/483M
------------------------------------------------------------
[1]   Done                    any23 rover
http://en.wikipedia.org/w/api.php?action=query
[2]-  Done                    list=search
[3]+  Done                    srwhat=text

The problem is that I don't know how crawler4j deals with some
characters such as '?' within URL strings, or whether it treats them
as queries. By the looks of the log output above, the URL
string is being treated incorrectly.
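
One thing worth ruling out before looking at crawler4j: the job-control lines in the output above ("[1] 2956" and the later "Done" lines for "list=search" and "srwhat=text") are what a shell prints when an unquoted '&' puts a command in the background, so the URL may have been cut at the first '&' before any23 ever saw it. A minimal sketch, assuming a POSIX shell; count_args is a hypothetical helper used only for illustration, and the any23 invocation is left commented out:

```shell
#!/bin/sh
# Single quotes stop the shell from treating '&' as the background operator
# and '?' as a glob character.
url='http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning'

# Hypothetical helper (not part of Any23): prints how many arguments it received.
count_args() {
    echo "$#"
}

# Quoted, the whole URL arrives as a single argument.
count_args "$url"

# The same quoting applied to the command from this thread:
# ./any23 rover "$url"
```

If the quoted form still only yields the dcterms:title triple, the shell can be ruled out and the remaining question is how the URL and the XML response are handled inside Any23.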

Sitting above all of this is the fact that I don't think the wiki
markup syntax is supported within the Any23 parser implementations.

Lewis


On Thu, Jun 21, 2012 at 10:29 PM, armon <zhimeng9@gmail.com> wrote:
> Even when I copy the XML part of the data at the URL and use it as the input
> content, it still doesn't work, but when I try an RDF file it works well. Is
> there anything I am doing incorrectly?
>
>
> 2012/6/22 armon <zhimeng9@gmail.com>
>
>> Hi Lewis, thanks very much for your reply. I am sorry to disturb you so
>> late.
>>
>> the url I used was:
>>
>>
>> http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning
>>
>>
>> and then I ran the command: ./any23 rover url (shown above) to get the
>> result.
>>
>> thanks.
>>
>> armon
>>
>>
>>
>>
>>
>>
>> 2012/6/22 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
>>
>>> Hi Armon,
>>>
>>> On Thu, Jun 21, 2012 at 4:15 PM, armon <zhimeng9@gmail.com> wrote:
>>> > Hi,
>>> >       I do some data transform currently from xml-format wiki data
>>>
>>> Can you give a small example of this xml?
>>>
>>> > (retrieved by wikipedia API) to turtle,
>>>
>>> Also a small example of your turtle
>>>
>>> > but it seems that the any23 can't
>>> > work correctly. (I used the command: ./any23 rover url )
>>>
>>> What do you get on stdout? I am easily able to use the any23 parsers to
>>> fetch structure from Wikipedia pages... but this is not what you
>>> are referring to... I need some more information from you please.
>>>
>>> >
>>> >       Does any23 actually support the xml data retrieved by wikipedia API
>>> > as the input format ?
>>>
>>> Please see above
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Lewis
>>>
>>
>>



-- 
Lewis
