manifoldcf-user mailing list archives

From Kamil Żyta <kamil.z...@pwr.edu.pl>
Subject Re: MCF 2 and Solr Cloud 5
Date Wed, 01 Apr 2015 16:43:50 GMT
On Wed, Apr 01, 2015 at 12:07:47PM -0400, Karl Wright wrote:
> Hi Kamil,
> 
> If no attempts are being made to actually index documents, then no
> documents will be indexed.
> 
> (1) What repository connection is this?  Can you try something simple
> first, like indexing from the file system?

I use CIFS. In 'Status and Job Management', Documents/Processed shows 2598,
so I think it can reach the files, but I can try the file system connector.

> (2) I have confirmed that changing the collection does NOT trigger
> reindexing of documents.  That is a bug, but you can work around it by
> clicking the "Reindex all documents" button on the output connection's view
> page after every change to the collection name.  Did you click that button?

Yes, I clicked that button many times.

K
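Karl's point (2) can be pictured with a small model. The sketch below is hypothetical and is not ManifoldCF's actual code. It assumes a crawler re-sends a document only when a recorded fingerprint of the output settings changes. If the collection name is left out of that fingerprint, switching collections goes unnoticed until something like the "Reindex all documents" button clears the recorded state. `ReindexModel`, `fingerprint` and `needsIndexing` are invented names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not ManifoldCF code): a document is re-sent to the
// output only when the fingerprint recorded at its last indexing differs
// from the current one.
class ReindexModel {
    // Last fingerprint recorded per document id.
    private final Map<String, String> recorded = new HashMap<>();

    // Fingerprint of the output settings. If the collection name is left out,
    // changing the collection cannot change the fingerprint.
    static String fingerprint(String solrUrl, String collection, boolean includeCollection) {
        return includeCollection ? solrUrl + "|" + collection : solrUrl;
    }

    // True if the document must be (re)indexed; records the new fingerprint.
    boolean needsIndexing(String docId, String currentFingerprint) {
        String previous = recorded.put(docId, currentFingerprint);
        return !currentFingerprint.equals(previous);
    }

    // Models the "Reindex all documents" button: forget every recorded
    // fingerprint so the next crawl re-sends everything.
    void forgetAll() {
        recorded.clear();
    }

    public static void main(String[] args) {
        String url = "http://10.26.26.29:8983/solr";
        ReindexModel m = new ReindexModel();
        System.out.println(m.needsIndexing("doc1", fingerprint(url, "collection1", false))); // true: first crawl
        System.out.println(m.needsIndexing("doc1", fingerprint(url, "esci", false)));        // false: change missed
        m.forgetAll();
        System.out.println(m.needsIndexing("doc1", fingerprint(url, "esci", false)));        // true: after reset
    }
}
```

Passing `true` for `includeCollection` makes the second call return `true` as well. That is the behaviour Karl describes as missing.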

> 
> 
> On Wed, Apr 1, 2015 at 11:50 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> 
> > I see only start/access/stop activities. Access denied is normal in my
> > setup.
> > So how can I debug the problem?
> >
> > K
> >
> > On Wed, Apr 01, 2015 at 08:32:42AM -0700, Karl Wright wrote:
> > > Hi Kamil,
> > > Can you look at the simple history report, to verify whether manifoldcf
> > > is even attempting to post documents? It is possible that the solr
> > > connector doesn't count a change in collection name as requiring a
> > > reindex.
> > >
> > > Karl
> > >
> > > Sent from my Windows Phone
> > > From: Kamil Żyta
> > > Sent: 4/1/2015 11:08 AM
> > > To: user@manifoldcf.apache.org
> > > Subject: Re: MCF 2 and Solr Cloud 5
> > > I created a new collection in Solr and configured MCF for it. The connection
> > > check says 'Connection working', but I cannot see any /update request from
> > > MCF in Solr, only:
> > >
> > > INFO  - 2015-04-01 15:03:16.442; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> > > INFO  - 2015-04-01 15:03:16.444; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
> > > INFO  - 2015-04-01 15:03:16.445; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
> > > INFO  - 2015-04-01 15:03:16.445; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
> > > INFO  - 2015-04-01 15:03:16.445; org.apache.solr.update.processor.LogUpdateProcessor; [dysk_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://10.26.26.29:8983/solr/dysk_shard2_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false} {commit=} 0 3
> > > INFO  - 2015-04-01 15:03:16.448; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> > > INFO  - 2015-04-01 15:03:16.449; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
> > > INFO  - 2015-04-01 15:03:16.449; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
> > > INFO  - 2015-04-01 15:03:16.450; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
> > > INFO  - 2015-04-01 15:03:16.450; org.apache.solr.update.processor.LogUpdateProcessor; [dysk_shard2_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://10.26.26.29:8983/solr/dysk_shard2_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false} {commit=} 0 2
> > > INFO  - 2015-04-01 15:03:16.456; org.apache.solr.update.processor.LogUpdateProcessor; [dysk_shard2_replica1] webapp=/solr path=/update/extract params={commit=true&wt=javabin&version=2} {commit=} 0 21
> > >
> > > K
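Karl suggests checking the Solr logs for one entry per posted document. A minimal sketch of that check follows. It assumes each LogUpdateProcessor entry sits on a single line in the real log file (the email wrapped them). It also assumes that a request which adds documents lists them as `add=[...]` in its summary, while a commit-only request shows just `{commit=}`, as in the excerpt above. `UpdateLogScan` and its members are hypothetical names, not part of Solr or ManifoldCF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of Solr or ManifoldCF): counts update requests
// in a Solr log and splits them into commit-only requests and requests that
// added documents.
class UpdateLogScan {
    static final class Counts {
        int updateRequests; // entries for any /update... path (/update, /update/extract, ...)
        int withAdds;       // entries whose summary lists added documents ("add=[")
        int commitOnly;     // entries whose summary is just "{commit=}"
    }

    // Assumes one LogUpdateProcessor entry per line, as Solr writes its log.
    static Counts scan(List<String> lines) {
        Counts counts = new Counts();
        for (String line : lines) {
            if (!line.contains("path=/update")) {
                continue; // not an update request entry
            }
            counts.updateRequests++;
            if (line.contains("add=[")) {
                counts.withAdds++;
            } else if (line.contains("{commit=}")) {
                counts.commitOnly++;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UpdateLogScan.java /path/to/solr.log
        // ISO-8859-1 decodes any byte sequence, so odd bytes in the log cannot
        // abort the read; the markers searched for are plain ASCII.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        Counts c = scan(lines);
        System.out.println("update requests: " + c.updateRequests
                + ", with adds: " + c.withAdds
                + ", commit-only: " + c.commitOnly);
    }
}
```

For the three LogUpdateProcessor entries quoted above, this counts three commit-only requests and no adds. That agrees with the reading that MCF was committing but never posting documents.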
> > >
> > > On Wed, Apr 01, 2015 at 10:53:39AM -0400, Karl Wright wrote:
> > > > "When I put 'esci' as the collection name I get an error.
> > > > When I put 'collection1' I get 'Connection working' and no errors in the
> > > > logs, but still no docs in Solr."
> > > >
> > > > Hi Kamil,
> > > > Do you get the exception when you use "collection1" as the collection
> > > > name?  If not, then here's what I recommend:
> > > >
> > > > (1) Look at the Solr logs.  There should be an INFO message for each
> > > > document posted.  There is a URL in the message, and a document length, and
> > > > a result.  It would be great if you could include a couple of these for us
> > > > to look at.
> > > >
> > > > (2) If there are any exceptions etc. in the Solr logs, please send those
> > > > along as well.
> > > >
> > > > Offhand, this sounds like documents get posted properly but then ignored by
> > > > Solr.  There are a lot of potential reasons why that could be the case.
> > > > But if the documents are getting ignored, or if Tika is not successfully
> > > > extracting data, then we should be able to figure out why based on the Solr
> > > > logs.
> > > >
> > > > Thanks,
> > > > Karl
> > > >
> > > >
> > > >
> > > > On Wed, Apr 1, 2015 at 10:39 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > >
> > > > > OK, see my first mail. When I put 'esci' as the collection name I get an
> > > > > error. When I put 'collection1' I get 'Connection working' and no errors
> > > > > in the logs, but still no docs in Solr.
> > > > >
> > > > > K
> > > > >
> > > > > On Wed, Apr 01, 2015 at 10:27:50AM -0400, Karl Wright wrote:
> > > > > > Hi Kamil,
> > > > > >
> > > > > > This is happening on the commit.  It looks to me like it's because you are
> > > > > > specifying a collection that doesn't actually exist:
> > > > > >
> > > > > > >>>>>>
> > > > > >     DocCollection col = getDocCollection(clusterState, collection);
> > > > > >
> > > > > >     DocRouter router = col.getRouter();
> > > > > > <<<<<<
> > > > > >
> > > > > > It's complaining because "col" is coming back null.
> > > > > >
> > > > > > Karl
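Karl's diagnosis can be reproduced without a cluster. The sketch below is a hypothetical, self-contained model, not SolrJ code. `CollectionGuard` and `DocCollectionModel` are invented names, a plain map stands in for the cluster state, and `compositeId` is only an illustrative router value. It shows the unguarded lookup failing with a NullPointerException for an unknown or empty collection name. A guarded variant rejects the name with a readable message instead. The name `dysk` is assumed only because the Solr log shows cores named `dysk_shard1_replica1` and `dysk_shard2_replica1`. The worker console line in the first message reads "Request to collection  failed", with nothing between "collection" and "failed". That is consistent with an empty collection name reaching the same lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failure Karl describes (not SolrJ code).
class CollectionGuard {
    // Hypothetical stand-in for SolrJ's DocCollection; only the router matters here.
    static final class DocCollectionModel {
        private final String router;
        DocCollectionModel(String router) { this.router = router; }
        String getRouter() { return router; }
    }

    // Stand-in for the cluster state: collection name -> collection.
    private final Map<String, DocCollectionModel> clusterState;

    CollectionGuard(Map<String, DocCollectionModel> clusterState) {
        this.clusterState = clusterState;
    }

    // Same shape as the quoted lines: an unknown (or empty) name makes the
    // lookup return null, and col.getRouter() throws NullPointerException.
    String routerUnguarded(String collection) {
        DocCollectionModel col = clusterState.get(collection);
        return col.getRouter();
    }

    // Guarded version: fails early with a message that names the problem.
    String routerChecked(String collection) {
        if (collection == null || collection.trim().isEmpty()) {
            throw new IllegalArgumentException("Collection name is empty");
        }
        DocCollectionModel col = clusterState.get(collection);
        if (col == null) {
            throw new IllegalArgumentException("Collection '" + collection
                    + "' not found; known collections: " + clusterState.keySet());
        }
        return col.getRouter();
    }

    public static void main(String[] args) {
        Map<String, DocCollectionModel> state = new HashMap<>();
        state.put("dysk", new DocCollectionModel("compositeId"));
        CollectionGuard guard = new CollectionGuard(state);
        System.out.println(guard.routerChecked("dysk")); // compositeId
        try {
            guard.routerChecked("");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Collection name is empty
        }
    }
}
```

Checking the name before building the request turns a bare NullPointerException into an error that points at the configuration.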
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 1, 2015 at 10:19 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >
> > > > > > > ERROR 2015-04-01 16:09:24,032 (Job notification thread) - Unhandled SolrServerException: java.lang.NullPointerException
> > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled SolrServerException: java.lang.NullPointerException
> > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:364)
> > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.commitPost(HttpPoster.java:308)
> > > > > > >         at org.apache.manifoldcf.agents.output.solr.SolrConnector.noteJobComplete(SolrConnector.java:610)
> > > > > > >         at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:121)
> > > > > > > Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
> > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:873)
> > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:738)
> > > > > > >         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1372)
> > > > > > > Caused by: java.lang.NullPointerException
> > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:520)
> > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:892)
> > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:795)
> > > > > > >         ... 3 more
> > > > > > >
> > > > > > > K
> > > > > > >
> > > > > > > On Wed, Apr 01, 2015 at 10:15:13AM -0400, Karl Wright wrote:
> > > > > > > > Hi Kamil,
> > > > > > > >
> > > > > > > > So you are still seeing a NullPointerException from
> > > > > > > > org.apache.solr.client.solrj.impl.CloudSolrClient?  Can you provide the
> > > > > > > > entire stack trace?
> > > > > > > >
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Apr 1, 2015 at 10:10 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > >
> > > > > > > > > Hi Karl,
> > > > > > > > > same thing with trunk. Any advice?
> > > > > > > > >
> > > > > > > > > K
> > > > > > > > >
> > > > > > > > > On Wed, Apr 01, 2015 at 09:37:47AM -0400, Karl Wright wrote:
> > > > > > > > > > Hi Kamil,
> > > > > > > > > >
> > > > > > > > > > SolrJ 5.0 changed massively from SolrJ 4.x.  The work to use SolrJ 5.0
> > > > > > > > > > has been done on trunk.  You will need to check out and build trunk in
> > > > > > > > > > order to use Solr 5.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 1, 2015 at 9:23 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > > I set up Solr 5 (Cloud) and MCF 2, created a core in Solr with 2 shards and
> > > > > > > > > > > 2 replicas (https://i.imgur.com/M05QTu7.png), and created an output
> > > > > > > > > > > connection in MCF.
> > > > > > > > > > > When I put 'esci' in 'Collection name' I got an error:
> > > > > > > > > > > Threw exception: 'Unhandled SolrServerException: No live SolrServers
> > > > > > > > > > > available to handle this request:[http://10.26.26.29:8983/solr/esci,
> > > > > > > > > > > http://10.26.26.28:8983/solr/esci]'
> > > > > > > > > > > When I leave 'Collection name' empty I get 'Connection working'.
> > > > > > > > > > > Now when I start the job, everything looks good, the worker fetches
> > > > > > > > > > > documents, etc., but I cannot see any documents in Solr. There is nothing
> > > > > > > > > > > in the logs except one line in the worker console:
> > > > > > > > > > > [Thread-6476596] ERROR org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection  failed due to (0) java.lang.NullPointerException, retry? 0
> > > > > > > > > > > Thanks for the advice.
> > > > > > > > > > >
> > > > > > > > > > > K
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> >
