manifoldcf-user mailing list archives

From K McGonigal <kmcgon...@gmail.com>
Subject Re: Trouble indexing a Twitter search in RSS format
Date Tue, 16 Aug 2011 21:07:04 GMT
Hmm. I will keep this in mind, but I'm confused again. I just ran this job
twice in a row and pretty much the same thing was sent to Solr both times. The
same number of items (7) were "add"ed. I think they were the same items, just
in a different order. The second run also deleted an item from Solr that was
not in the RSS document. I'm pretty sure that neither the RSS feed document
nor the linked documents changed between runs.

A snippet from the first run:

INFO: {add=[http://www.onemansjazz.ca/content/view/330/50/]} 0 16
16-Aug-2011 3:18:11 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={literal.source=http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/&literal.category=News+-+General&literal.summary=I+have+created+a+Listener+Survey+and+if+you+have+the+time+to+complete+it,+that+would+be+terrific.++I%26#39;m+trying+to+do+an+evaluation+of+One+Man%26#39;s+Jazz+as+well+as+considering+some+new+options+that+have+arisen.++Your+feedback+would+be+most+appreciate.This+survey+is+in+two+parts+and+is+a+total+of+twenty+parts,+most+of+them+just+require+a+click+of+your+mouse.++Click+here+(http://www.surveymonkey.com/s/C3DZ3JK)++for+Part+One,+and+here+(http://www.surveymonkey.com/s/C38FVH8)++for+Part+Two.+++Thanks+again+for+your+input.+&literal.id=http://www.onemansjazz.ca/content/view/330/50/&literal.title=Listener+Survey&literal.pubdate=1310475289000} status=0 QTime=16
16-Aug-2011 3:18:13 PM org.apache.solr.update.processor.LogUpdateProcessor finish

A snippet from the second run:

INFO: {add=[http://www.onemansjazz.ca/content/view/330/50/]} 0 15
16-Aug-2011 3:27:55 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={literal.source=http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/&literal.category=News+-+General&literal.summary=I+have+created+a+Listener+Survey+and+if+you+have+the+time+to+complete+it,+that+would+be+terrific.++I%26#39;m+trying+to+do+an+evaluation+of+One+Man%26#39;s+Jazz+as+well+as+considering+some+new+options+that+have+arisen.++Your+feedback+would+be+most+appreciate.This+survey+is+in+two+parts+and+is+a+total+of+twenty+parts,+most+of+them+just+require+a+click+of+your+mouse.++Click+here+(http://www.surveymonkey.com/s/C3DZ3JK)++for+Part+One,+and+here+(http://www.surveymonkey.com/s/C38FVH8)++for+Part+Two.+++Thanks+again+for+your+input.+&literal.id=http://www.onemansjazz.ca/content/view/330/50/&literal.title=Listener+Survey&literal.pubdate=1310475289000} status=0 QTime=15
16-Aug-2011 3:28:00 PM org.apache.solr.update.processor.LogUpdateProcessor finish

Apart from the timestamps and the QTime (16 vs. 15), I think they are identical.
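
A rough way to confirm that two such entries carry the same data, rather than comparing the wrapped log text by eye, is to pull out the literal.* parameters from each "execute" line and compare them as dictionaries. The following is only a sketch: the function names extract_params and same_request are made up for illustration, the sample lines are shortened stand-ins (example.com is a placeholder, not the real feed), and it assumes each log entry has first been joined back onto a single line.

```python
import re
from urllib.parse import unquote_plus


def extract_params(log_line):
    """Return the parameters inside params={...} of a Solr log line as a dict.

    Hypothetical helper: assumes the whole log entry is on one line and
    that every parameter name starts with "literal.".
    """
    match = re.search(r"params=\{(.*)\} status=", log_line)
    if match is None:
        return {}
    params = {}
    # Split only on an '&' that begins a new literal.* parameter, so a stray
    # '&' inside a value does not start a bogus key.
    for pair in re.split(r"&(?=literal\.)", match.group(1)):
        key, _, value = pair.partition("=")
        params[key] = unquote_plus(value)
    return params


def same_request(line_a, line_b):
    """True if both log lines carry the same parameters, ignoring QTime."""
    return extract_params(line_a) == extract_params(line_b)


# Shortened, made-up sample lines in the same shape as the log above.
run1 = ("INFO: [] webapp=/solr path=/update/extract params={"
        "literal.id=http://example.com/a&literal.title=Listener+Survey"
        "} status=0 QTime=16")
run2 = ("INFO: [] webapp=/solr path=/update/extract params={"
        "literal.id=http://example.com/a&literal.title=Listener+Survey"
        "} status=0 QTime=15")

print(same_request(run1, run2))
```

Because the comparison is on the decoded parameters only, differences in timestamps or QTime do not count as changes.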


View a Job

Name: OMJ
Output connection: Solr
Repository connection: RSS
Priority: 5
Start method: Don't automatically start
Schedule type: Scan every document once
Minimum recrawl interval: Not applicable
Expiration interval: Not applicable
Reseed interval: Not applicable
No scheduled run times
Field mappings (Metadata field name, Solr field name): No field mapping specified
RSS urls: http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/
No url canonicalization specified; will reorder all urls and remove all sessions
No mappings specified; will accept all urls
Feed connection timeout (seconds): 60
Default feed rescan interval (minutes): 60
Minimum feed rescan interval (minutes): 15
Bad feed rescan interval (minutes): (Default feed rescan value)
Dechromed content source: none
Chromed content: none
No access tokens specified
No metadata specified



View Repository Connection Status

Name: RSS
Description:
Connection type: RSS
Max connections: 10
Authority: None (global authority)
Throttling (Bin regular expression, Description, Max avg fetches/min): No throttles
Parameters:
  Proxy port=
  Proxy authentication password=********
  Max server connections=2
  Proxy host=
  KB per second=64
  Robots usage=none
  Proxy authentication user name=
  Max fetches per minute=12
  Email address=kmcgoniga@gmail.com
  Proxy authentication domain=
  Throttle group=
Connection status: Connection working
