manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: MCF 2 and Solr Cloud 5
Date Thu, 02 Apr 2015 13:46:17 GMT
Any luck figuring this out?
Karl

On Wed, Apr 1, 2015 at 1:01 PM, Karl Wright <daddywri@gmail.com> wrote:

> The button works fine.  So the problem must be on the repository side.
>
> Karl
>
>
> On Wed, Apr 1, 2015 at 12:56 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> If your simple history shows no documents being processed or indexed,
>> then that's the problem, or at least one of them.
>>
>> I will try to confirm that the reindex button still works as it should.
>>
>> Karl
>>
>>
>> On Wed, Apr 1, 2015 at 12:43 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl>
>> wrote:
>>
>>> On Wed, Apr 01, 2015 at 12:07:47PM -0400, Karl Wright wrote:
>>> > Hi Kamil,
>>> >
>>> > If no attempts are being made to actually index documents, then no
>>> > documents will be indexed.
>>> >
>>> > (1) What repository connection is this?  Can you try something simple
>>> > first, like indexing from the file system?
>>>
>>> I use CIFS. In 'Status and Job Management', Documents/Processed is 2598,
>>> so I think it can reach the files, but I can try with the 'File systems'
>>> connector.
>>>
>>> > (2) I have confirmed that changing the collection does NOT trigger
>>> > reindexing of documents.  That is a bug, but you can work around it by
>>> > clicking the "Reindex all documents" button on the output connection's
>>> > view page after every change to the collection name.  Did you click that
>>> > button?
>>>
>>> yes, I clicked that button many times.
>>>
>>> K
>>>
>>> >
>>> >
>>> > On Wed, Apr 1, 2015 at 11:50 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>>> >
>>> > > I see only start/access/stop activities. Access denied is normal in my
>>> > > setup.
>>> > > So how can I debug the problem?
>>> > >
>>> > > K
>>> > >
>>> > > On Wed, Apr 01, 2015 at 08:32:42AM -0700, Karl Wright wrote:
>>> > > > Hi Kamil,
>>> > > > Can you look at the simple history report, to verify whether manifoldcf
>>> > > > is even attempting to post documents? It is possible that the solr
>>> > > > connector doesn't count a change in collection name as requiring a
>>> > > > reindex.
>>> > > >
>>> > > > Karl
>>> > > >
>>> > > > Sent from my Windows Phone
>>> > > > From: Kamil Żyta
>>> > > > Sent: 4/1/2015 11:08 AM
>>> > > > To: user@manifoldcf.apache.org
>>> > > > Subject: Re: MCF 2 and Solr Cloud 5
>>> > > > I created a new collection in solr and configured mcf for this collection:
>>> > > > 'Connection working', but I cannot see any /update request from mcf in
>>> > > > solr, only:
>>> > > >
>>> > > > INFO  - 2015-04-01 15:03:16.442; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>> > > > INFO  - 2015-04-01 15:03:16.444; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
>>> > > > INFO  - 2015-04-01 15:03:16.445; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
>>> > > > INFO  - 2015-04-01 15:03:16.445; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>> > > > INFO  - 2015-04-01 15:03:16.445; org.apache.solr.update.processor.LogUpdateProcessor; [dysk_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://10.26.26.29:8983/solr/dysk_shard2_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false} {commit=} 0 3
>>> > > > INFO  - 2015-04-01 15:03:16.448; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>> > > > INFO  - 2015-04-01 15:03:16.449; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
>>> > > > INFO  - 2015-04-01 15:03:16.449; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
>>> > > > INFO  - 2015-04-01 15:03:16.450; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>> > > > INFO  - 2015-04-01 15:03:16.450; org.apache.solr.update.processor.LogUpdateProcessor; [dysk_shard2_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://10.26.26.29:8983/solr/dysk_shard2_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false} {commit=} 0 2
>>> > > > INFO  - 2015-04-01 15:03:16.456; org.apache.solr.update.processor.LogUpdateProcessor; [dysk_shard2_replica1] webapp=/solr path=/update/extract params={commit=true&wt=javabin&version=2} {commit=} 0 21
>>> > > >
>>> > > > K
>>> > > >
>>> > > > On Wed, Apr 01, 2015 at 10:53:39AM -0400, Karl Wright wrote:
>>> > > > > "When I put 'esci' as collection name I get an error.
>>> > > > > When I put 'collection1' I get 'Connection working' and no errors in logs,
>>> > > > > but still no docs in solr."
>>> > > > >
>>> > > > > Hi Kamil,
>>> > > > > Do you get the exception when you use "collection1" as the collection
>>> > > > > name?  If not, then here's what I recommend:
>>> > > > >
>>> > > > > (1) Look at the Solr logs.  There should be an INFO message for each
>>> > > > > document posted.  There is a URL in the message, and a document length, and
>>> > > > > a result.  It would be great if you could include a couple of these for us
>>> > > > > to look at.
>>> > > > >
>>> > > > > (2) If there are any exceptions etc. in the Solr logs, please send those
>>> > > > > along as well.
>>> > > > >
>>> > > > > Offhand, this sounds like documents get posted properly but then ignored by
>>> > > > > Solr.  There are a lot of potential reasons why that could be the case.
>>> > > > > But if the documents are getting ignored, or if Tika is not successfully
>>> > > > > extracting data, then we should be able to figure out why based on the Solr
>>> > > > > logs.
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Karl
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Wed, Apr 1, 2015 at 10:39 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>>> > > > >
>>> > > > > > Ok, see my first mail. When I put 'esci' as collection name I get an error.
>>> > > > > > When I put 'collection1' I get 'Connection working' and no errors in logs,
>>> > > > > > but still no docs in solr.
>>> > > > > >
>>> > > > > > K
>>> > > > > >
>>> > > > > > On Wed, Apr 01, 2015 at 10:27:50AM -0400, Karl Wright wrote:
>>> > > > > > > Hi Kamil,
>>> > > > > > >
>>> > > > > > > This is happening on the commit.  It looks to me like it's because you are
>>> > > > > > > specifying a collection that doesn't actually exist:
>>> > > > > > >
>>> > > > > > > >>>>>>
>>> > > > > > >     DocCollection col = getDocCollection(clusterState, collection);
>>> > > > > > >
>>> > > > > > >     DocRouter router = col.getRouter();
>>> > > > > > > <<<<<<
>>> > > > > > >
>>> > > > > > > It's complaining because "col" is coming back null.
>>> > > > > > >
>>> > > > > > > Karl
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Wed, Apr 1, 2015 at 10:19 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>>> > > > > > >
>>> > > > > > > > ERROR 2015-04-01 16:09:24,032 (Job notification thread) - Unhandled SolrServerException: java.lang.NullPointerException
>>> > > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled SolrServerException: java.lang.NullPointerException
>>> > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:364)
>>> > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.commitPost(HttpPoster.java:308)
>>> > > > > > > >         at org.apache.manifoldcf.agents.output.solr.SolrConnector.noteJobComplete(SolrConnector.java:610)
>>> > > > > > > >         at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:121)
>>> > > > > > > > Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
>>> > > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:873)
>>> > > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:738)
>>> > > > > > > >         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
>>> > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1372)
>>> > > > > > > > Caused by: java.lang.NullPointerException
>>> > > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:520)
>>> > > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:892)
>>> > > > > > > >         at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:795)
>>> > > > > > > >         ... 3 more
>>> > > > > > > >
>>> > > > > > > > K
>>> > > > > > > >
>>> > > > > > > > On Wed, Apr 01, 2015 at 10:15:13AM -0400, Karl Wright wrote:
>>> > > > > > > > > Hi Kamil,
>>> > > > > > > > >
>>> > > > > > > > > So you are still seeing a NullPointerException from
>>> > > > > > > > > org.apache.solr.client.solrj.impl.CloudSolrClient?  Can you provide the
>>> > > > > > > > > entire stack trace?
>>> > > > > > > > >
>>> > > > > > > > > Karl
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > On Wed, Apr 1, 2015 at 10:10 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>>> > > > > > > > >
>>> > > > > > > > > > Hi Karl,
>>> > > > > > > > > > same thing with trunk. Any advice?
>>> > > > > > > > > >
>>> > > > > > > > > > K
>>> > > > > > > > > >
>>> > > > > > > > > > On Wed, Apr 01, 2015 at 09:37:47AM -0400, Karl Wright wrote:
>>> > > > > > > > > > > Hi Kamil,
>>> > > > > > > > > > >
>>> > > > > > > > > > > Solrj 5.0 changed massively from Solrj 4.x.  The work to use Solrj 5.0 has
>>> > > > > > > > > > > been done on trunk.  You will need to check out and build trunk in order to
>>> > > > > > > > > > > use Solr 5.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Thanks,
>>> > > > > > > > > > > Karl
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Wed, Apr 1, 2015 at 9:23 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > > > Hi,
>>> > > > > > > > > > > > I set up solr 5 (Cloud) and mcf2, created a core in solr with 2 shards and 2
>>> > > > > > > > > > > > replicas:
>>> > > > > > > > > > > > https://i.imgur.com/M05QTu7.png and created Output Connections in mcf.
>>> > > > > > > > > > > > When I put 'esci' in 'Collection name' I got an error:
>>> > > > > > > > > > > > Threw exception: 'Unhandled SolrServerException: No live SolrServers
>>> > > > > > > > > > > > available to handle this request:[http://10.26.26.29:8983/solr/esci,
>>> > > > > > > > > > > > http://10.26.26.28:8983/solr/esci]'
>>> > > > > > > > > > > > When I leave 'Collection name' empty I have 'Connection working'.
>>> > > > > > > > > > > > Now when I start the job, everything looks good, the workers fetch docs, etc.,
>>> > > > > > > > > > > > but I cannot see any docs in solr. Nothing in the logs except one line in the
>>> > > > > > > > > > > > worker console:
>>> > > > > > > > > > > > [Thread-6476596] ERROR org.apache.solr.client.solrj.impl.CloudSolrClient -
>>> > > > > > > > > > > > Request to collection  failed due to (0) java.lang.NullPointerException,
>>> > > > > > > > > > > > retry? 0
>>> > > > > > > > > > > > thanks for the advice.
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > K
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > >
>>> > > > > >
>>> > >
>>>
>>
>>
>
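
For anyone following this thread: the advice quoted above, to check the Solr logs for one INFO message per posted document, can be roughly automated. The sketch below is only an illustration under assumptions, not part of ManifoldCF or Solr. It assumes each log entry sits on a single line in the LogUpdateProcessor layout shown in the excerpt above, and that a real document add is logged with an operation summary beginning with "add=" (the excerpt only contains "commit=" entries, so the "add" sample line is hypothetical). The names UPDATE_LINE, classify and summarize_updates are made up for this example.

```python
import re
from collections import Counter

# Tail of a LogUpdateProcessor line, e.g.
#   [core] webapp=/solr path=/update/extract params={...} {commit=} 0 21
UPDATE_LINE = re.compile(
    r"\[(?P<core>[^\]]+)\]\s+webapp=\S+\s+path=(?P<path>\S+)\s+"
    r"params=\{.*?\}\s+\{(?P<ops>.*)\}\s+(?P<status>\d+)\s+(?P<qtime>\d+)\s*$"
)


def classify(ops):
    """Label the operation summary printed between the second pair of braces."""
    if ops.startswith("add="):
        return "add"
    if ops.startswith("commit="):
        return "commit"
    return "other"


def summarize_updates(lines):
    """Count update requests per (core, path, kind); other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = UPDATE_LINE.search(line)
        if match:
            key = (match.group("core"), match.group("path"),
                   classify(match.group("ops")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # Taken from the log excerpt in this thread, joined onto one line.
        "INFO  - 2015-04-01 15:03:16.456; "
        "org.apache.solr.update.processor.LogUpdateProcessor; "
        "[dysk_shard2_replica1] webapp=/solr path=/update/extract "
        "params={commit=true&wt=javabin&version=2} {commit=} 0 21",
        # Hypothetical line: the assumed shape of an actual document add.
        "INFO  - 2015-04-01 15:04:00.000; "
        "org.apache.solr.update.processor.LogUpdateProcessor; "
        "[dysk_shard2_replica1] webapp=/solr path=/update/extract "
        "params={wt=javabin&version=2} {add=[example-doc (1)]} 0 35",
    ]
    for (core, path, kind), n in sorted(summarize_updates(sample).items()):
        print(core, path, kind, n)
```

If the summary for a crawl shows only "commit" entries and no "add" entries, as in the excerpt posted in this thread, that is consistent with no documents being posted at all, rather than documents being posted and then ignored by Solr.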
