manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Trouble indexing a Twitter search in RSS format
Date Mon, 15 Aug 2011 16:49:08 GMT
The behavior depends on the setting of the other pair of radio buttons
on that tab.  You can select "Use chromed content if not found" or
"Never use chromed content".  So, if the feed has no "description"
field for the document, and the dechromed content setting is
"description field", and the other setting is "Never use chromed
content", no document will be indexed.
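
A minimal sketch of the rule described above, for the case where the dechromed content setting is "description field". This is not ManifoldCF's actual code: the function name, parameters, and constants are hypothetical, and treating an empty description the same as a missing one is an assumption.

```python
from typing import Optional

# Labels of the two radio buttons mentioned above, used here as plain strings.
USE_IF_NOT_FOUND = "Use chromed content if not found"
NEVER_USE = "Never use chromed content"


def text_to_index(description: Optional[str],
                  chromed_page: Optional[str],
                  chromed_setting: str) -> Optional[str]:
    """Return the text that would be indexed, or None if nothing is indexed.

    Models only the case where dechromed content is taken from the
    feed's "description" field.
    """
    if description:
        # The feed supplied a description, so that is what gets indexed.
        return description
    if chromed_setting == USE_IF_NOT_FOUND:
        # No description: fall back to the fetched (chromed) page.
        return chromed_page
    # No description and "Never use chromed content": nothing is indexed.
    return None
```

With no description and `NEVER_USE`, the function returns `None`, which corresponds to the "nothing is added to Solr" symptom reported below.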

Karl


On Mon, Aug 15, 2011 at 12:44 PM, K McGonigal <kmcgoniga@gmail.com> wrote:
> I deleted my twitter RSS job and created another one and now it works!
>
> Doing some experimentation, I see that when Dechromed Content is set to "No
> dechromed content" it ingests fine, but when set to "if present, in
> 'description' field" it doesn't do the ingestion (nothing is added to
> Solr).  Is that to be expected?
>
>
> Kate
>
>
> On Mon, Aug 15, 2011 at 10:48 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> Regardless of the twitter sign-in issue, I'd still expect the RSS
>> connector to index whatever it finds at the redirected page, even if
>> it's not very useful stuff.  Could you send me a screen shot of the
>> view page for the RSS connection and for the RSS job?  Also, if you
>> could delete the job that contains the twitter RSS feed and recreate
>> it, then crawl, I'd like to see the simple history for that crawl.
>>
>> Thanks,
>> Karl
>>
>> On Mon, Aug 15, 2011 at 11:38 AM, K McGonigal <kmcgoniga@gmail.com> wrote:
>> > Hmm, that's odd the URLs didn't work for you.  I've asked other people here
>> > to try them and they had no problems.
>> >
>> > After your suggestion I tried the web connector (but still with no access
>> > credentials) and it did pretty well ingesting the RSS feed, so I might be
>> > able to just use that.
>> >
>> > I'm still mystified as to why the RSS connector couldn't handle it though. I
>> > turned on DEBUG logging in Manifold, but that did not show anything unusual.
>> >
>> > Thanks,
>> > Kate
>> >
>> > On Fri, Aug 12, 2011 at 3:58 PM, Karl Wright <daddywri@gmail.com> wrote:
>> >>
>> >> When I drop any of these URLs into my browser, I get redirected to a
>> >> login screen.  Therefore it looks to me like Twitter does some kind of
>> >> session-based login, tracked with cookies.  That would require
>> >> maintenance of session cookies which the RSS connector simply does not
>> >> do, and the coding of a login sequence as well.
>> >>
>> >> This is not a straightforward feature to add to the RSS connector, by any
>> >> means.
>> >>
>> >> The web connector does have support for login sequencing and cookie
>> >> session maintenance, and it does know how to chase RSS feeds, so that
>> >> might be an option for you to try.  The problem is that most login
>> >> sequences are non-trivial to set up and you will need a lot of
>> >> patience and web spelunking skills to get it right.  The documentation
>> >> is of some help but really could use a good example.
>> >>
>> >>
>> >> Hope this helps.
>> >> Karl
>> >>
>> >> On Fri, Aug 12, 2011 at 4:42 PM, K McGonigal <kmcgoniga@gmail.com> wrote:
>> >> > Sorry to bother everyone again but I'm having trouble with an RSS connector
>> >> > job on a Twitter search. When I try to run a job on
>> >> > http://search.twitter.com/search.rss?q=Campylobacter the fetch appears to
>> >> > work OK, but the document ingestion does not occur.
>> >> >
>> >> > I was wondering if it is just my setup, or could it be the redirection that
>> >> > Twitter does on the links. For instance, a link shown in the RSS feed as
>> >> > http://twitter.com/VashinkaInuiel/statuses/101493222852923393 redirects to
>> >> > http://twitter.com/#!/VashinkaInuiel/statuses/101493222852923393 when it is
>> >> > followed.
>> >> >
>> >> > Any help is very appreciated.
>> >> >
>> >> >
>> >> >
>> >
>> >
>
>
