manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Trouble indexing a Twitter search in RSS format
Date Mon, 15 Aug 2011 15:48:33 GMT
Regardless of the Twitter sign-in issue, I'd still expect the RSS
connector to index whatever it finds at the redirected page, even if
it's not very useful stuff.  Could you send me a screenshot of the
view page for the RSS connection and for the RSS job?  Also, if you
could delete the job that contains the Twitter RSS feed, recreate
it, and then crawl, I'd like to see the simple history for that crawl.

Thanks,
Karl

On Mon, Aug 15, 2011 at 11:38 AM, K McGonigal <kmcgoniga@gmail.com> wrote:
> Hmm, that's odd the URLs didn't work for you.  I've asked other people here
> to try them and they had no problems.
>
> After your suggestion I tried the web connector (but still with no access
> credentials) and it did pretty well ingesting the RSS feed, so I might be
> able to just use that.
>
> I'm still mystified as to why the RSS connector couldn't handle it though. I
> turned on DEBUG logging in Manifold, but that did not show anything unusual.
>
> Thanks,
> Kate
>
> On Fri, Aug 12, 2011 at 3:58 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> When I drop any of these URLs into my browser, I get redirected to a
>> login screen.  Therefore it looks to me like Twitter does some kind of
>> session-based login, tracked with cookies.  That would require
>> maintenance of session cookies which the RSS connector simply does not
>> do, and the coding of a login sequence as well.
>>
>> This is not a straightforward feature to add to the RSS connector, by any
>> means.
>>
>> The web connector does have support for login sequencing and cookie
>> session maintenance, and it does know how to chase RSS feeds, so that
>> might be an option for you to try.  The problem is that most login
>> sequences are non-trivial to set up and you will need a lot of
>> patience and web spelunking skills to get it right.  The documentation
>> is of some help but really could use a good example.
>>
>>
>> Hope this helps.
>> Karl
>>
>> On Fri, Aug 12, 2011 at 4:42 PM, K McGonigal <kmcgoniga@gmail.com> wrote:
>> > Sorry to bother everyone again but I'm having trouble with an RSS connector
>> > job on a Twitter search. When I try to run a job on
>> > http://search.twitter.com/search.rss?q=Campylobacter the fetch appears to
>> > work OK, but the document ingestion does not occur.
>> >
>> > I was wondering if it is just my setup, or could it be the redirection that
>> > Twitter does on the links. For instance, a link shown in the RSS feed as
>> > http://twitter.com/VashinkaInuiel/statuses/101493222852923393 redirects to
>> > http://twitter.com/#!/VashinkaInuiel/statuses/101493222852923393 when it is
>> > followed.
>> >
>> > Any help is much appreciated.
>> >
>> >
>> >
>
>
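
A note on the redirect target quoted above: it is a "hash-bang" URL, and
everything after the "#" is a fragment, which an HTTP client keeps to itself
and never sends to the server. The sketch below is only an illustration, not
ManifoldCF code, and the helper names server_visible_url and is_hashbang are
made up for this example. It shows which part of each URL from the thread a
crawler would actually request. Whether this is what stopped the RSS connector
from ingesting documents is not established in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def server_visible_url(url: str) -> str:
    """Return the URL as the server sees it, i.e. without the fragment.

    Hypothetical helper for illustration; not part of ManifoldCF.
    """
    parts = urlsplit(url)
    # The fragment (the text after '#') stays on the client side,
    # so it is dropped here. An empty path is requested as "/".
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", parts.query, ""))


def is_hashbang(url: str) -> bool:
    """True if the URL carries a '#!' fragment, which is resolved by JavaScript."""
    return urlsplit(url).fragment.startswith("!")


if __name__ == "__main__":
    # URLs taken from the original message in this thread.
    for url in (
        "http://search.twitter.com/search.rss?q=Campylobacter",
        "http://twitter.com/VashinkaInuiel/statuses/101493222852923393",
        "http://twitter.com/#!/VashinkaInuiel/statuses/101493222852923393",
    ):
        print(url)
        print("  requested as:", server_visible_url(url))
        print("  hash-bang:   ", is_hashbang(url))
```

For the third URL the request goes to the site root, so the tweet itself only
appears once a browser runs the page's JavaScript, which a plain fetch does not
do.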
