manifoldcf-dev mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: [VOTE] Release Apache ManifoldCF 1.5, RC7
Date Thu, 06 Feb 2014 14:50:01 GMT
On 06.02.14 15:25, Karl Wright wrote:

> So I conclude that simple history is working fine, but since it is only
> returning indexing results within the last hour by default it is confusing
> you.  I also think it is likely that documents are getting skipped because
> you've crawled this set before with the same job and many of the documents
> have not changed.
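Karl's point about skipped documents rests on incremental crawling: on a recrawl, a connector compares a per-document version string with the one recorded last time and only re-sends documents whose version has changed. The sketch below illustrates that idea only. It assumes a SHA-256 content hash as the version string and a plain dict as the stored state; both are hypothetical stand-ins, not ManifoldCF's actual web connector code.

```python
import hashlib


def version_of(content: bytes) -> str:
    # Hypothetical version string: a SHA-256 digest of the fetched bytes.
    return hashlib.sha256(content).hexdigest()


def should_ingest(url: str, content: bytes, last_versions: dict) -> bool:
    """Return True if the document is new or changed since the previous crawl."""
    new_version = version_of(content)
    if last_versions.get(url) == new_version:
        # Unchanged since the last crawl: skip, nothing is sent to the index.
        return False
    last_versions[url] = new_version  # remember the version for the next crawl
    return True


if __name__ == "__main__":
    seen = {}
    url = "http://www.ibsen.uio.no/varia.xhtml"
    print(should_ingest(url, b"<html>v1</html>", seen))  # True: first crawl
    print(should_ingest(url, b"<html>v1</html>", seen))  # False: unchanged
    print(should_ingest(url, b"<html>v2</html>", seen))  # True: content changed
```

Under this model, a document that appears in the Solr log during a recrawl must have been treated as new or changed, which is exactly the tension the log excerpt below raises.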

Karl, we are indexing these documents:

I have tail -F running on our Solr test server's log right now:
[2014-02-06 15:21:00.321] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=B]} 0 38
[2014-02-06 15:21:00.359] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=N]} 0 23
[2014-02-06 15:21:29.732] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=G]} 0 29
[2014-02-06 15:22:11.954] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=S]} 0 38
[2014-02-06 15:22:15.752] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=D]} 0 28
[2014-02-06 15:22:18.323] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=H]} 0 34
[2014-02-06 15:22:21.657] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/variakronologi.xhtml]} 0 73

How could these log entries show up on our Solr server if the documents 
were skipped?
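To compare the Solr side with the MCF simple history report over the same time window, the add entries can be pulled out of the Solr log mechanically. This is a minimal sketch that assumes the line shape shown in the excerpt above (one entry per line, document ids being URLs, ids separated by ", " when several are batched). As far as I know, the two trailing numbers are a status field and the elapsed time in milliseconds, but treat that reading as an assumption.

```python
import re

# Assumed line shape, taken from the excerpt above:
# "[YYYY-MM-DD HH:MM:SS.mmm] INFO [core] OP crawl {add=[<id>]} <status> <millis>"
ADD_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+INFO\s+\[(?P<core>[^\]]+)\].*?"
    r"\{add=\[(?P<ids>[^\]]*)\]\}\s+(?P<status>\d+)\s+(?P<millis>\d+)"
)


def added_ids(lines):
    """Yield (timestamp, document id, status, millis) for every add entry."""
    for line in lines:
        m = ADD_RE.search(line)
        if not m:
            continue  # not an add entry
        for doc_id in m.group("ids").split(", "):
            if doc_id.strip():
                yield (m.group("ts"), doc_id.strip(),
                       int(m.group("status")), int(m.group("millis")))


if __name__ == "__main__":
    import sys
    # Usage: python solr_adds.py < solr.log
    for ts, doc_id, status, millis in added_ids(sys.stdin):
        print(ts, doc_id, status, millis)
```

Filtering the output to the last hour gives a list that can be held up directly against the simple history report's default window.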

And why did I get entries like this earlier today:
DEBUG 2014-02-06 10:28:06,609 (Worker thread '29') - WEB: Decided to ingest 'http://www.ibsen.uio.no/varia.xhtml'

(I have since changed the log level back to INFO, so I cannot see these entries for the latest crawl, but I will re-enable DEBUG.)
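Once DEBUG is back on, the "Decided to ingest" lines from the MCF log can be cross-checked against the Solr add entries to see whether any document the web connector accepted never reached Solr. This is a minimal sketch assuming the two line shapes quoted in this message, one log entry per line, and that Solr uses the page URL as the document id (as the excerpt above suggests). The function names are hypothetical.

```python
import re

# Line shapes assumed from the excerpts in this message.
MCF_INGEST_RE = re.compile(r"WEB: Decided to ingest '(?P<url>[^']+)'")
SOLR_ADD_RE = re.compile(r"\{add=\[(?P<ids>[^\]]*)\]\}")


def decided_urls(mcf_lines):
    """URLs the web connector logged as 'Decided to ingest' (DEBUG level)."""
    urls = set()
    for line in mcf_lines:
        m = MCF_INGEST_RE.search(line)
        if m:
            urls.add(m.group("url"))
    return urls


def solr_added_urls(solr_lines):
    """Document ids (here: URLs) that Solr logged as added."""
    urls = set()
    for line in solr_lines:
        m = SOLR_ADD_RE.search(line)
        if m:
            urls.update(u.strip() for u in m.group("ids").split(", ") if u.strip())
    return urls


def missing_in_solr(mcf_lines, solr_lines):
    """URLs MCF decided to ingest but Solr never logged as added, sorted."""
    return sorted(decided_urls(mcf_lines) - solr_added_urls(solr_lines))
```

Both logs should cover the same crawl; otherwise the difference mostly reflects the time window rather than lost documents.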

I have re-ingested all documents several times today to make sure that every document was crawled again from scratch.

Of course, I could remove all jobs, delete all tables in PostgreSQL and re-create everything from scratch, in case the old settings were not upgraded successfully. Unfortunately, MCF would then delete all documents from my index as well.

Erlend
