manifoldcf-user mailing list archives

From "Silvia, Daniel [USA]" <Silvia_Dan...@bah.com>
Subject RE: ManifoldCF's dist/shapoint-integration dir
Date Mon, 13 Feb 2012 14:51:59 GMT
Hi Karl

Does the SharePoint connector only pull files from the SharePoint instance, and not content
such as Wiki content? As mentioned in the previous e-mail, I am able to see the XML content in
the log file for the wikis, with elements similar to <someWiki><someNameWiki_row>some
other elements<WikiField>content.....</WikiField></someNameWiki_row></someWiki>.
However, I do not see anything in the Simple History Report showing Wiki information or
the .aspx pages being pulled. Does this report only show information on files, and not other content pulled
from SharePoint?

I am just trying to figure out whether I need to configure another connector, besides the
SharePoint connector, to pull this content from SharePoint.


Thanks

Dan
________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Sunday, February 12, 2012 12:08 PM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

Hi Daniel,

If you are seeing fetches in the Simple History that include the wiki
URLs you are trying to capture, the SharePoint job is likely correct.
Are you seeing "Document ingest" activities for the same documents?
If so, they are being sent to Solr, and you'd have to look into the
Solr configuration to figure out why they aren't being indexed.
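As a quick way to check the Solr side directly, you can ask Solr how many documents it currently holds. This is only a sketch: the server, port, web application, and core names below are taken from the connection settings quoted later in this thread and may differ in your setup.

    http://<server>:8080/solr/collection1/select?q=*:*&rows=0

The numFound attribute in the response shows whether anything has been committed to the index at all; if it stays at 0 while the job reports "Document ingest" activity, the problem is on the Solr side.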

Thanks,
Karl

On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
<Silvia_Daniel@bah.com> wrote:
> Hi Karl
>
> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>
> I have been trying to get the Wikis, created in SharePoint, to be ingested into Solr.
I am able to see the Wikis in the logging where the SharePoint Connector pulls everything
from site, however, I do not see the Wikis content in the solr instance. When creating a job
to run, do I need to indicate a path similar to "*Wiki* for the entire site or do I need to
configure the solr metadata in the job to capture "WikiField" element in the xml being passed
to the Solr connector?
>
> Thanks for your help.
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 10:52 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> It's been a while since I've set up a SharePoint job but I think what
> you are missing is a file rule (instead of just a library rule).
> Here's what the end-user documentation says on the matter:
>
> "Each rule consists of a path, a rule type, and an action. The actions
> are "Include" and "Exclude". The rule type tells the connection what
> kind of SharePoint entity it is allowed to exactly match. For example,
> a "File" rule will only exactly match SharePoint paths that represent
> files - it cannot exactly match sites or libraries. The path itself is
> just a sequence of characters, where the "*" character has the special
> meaning of being able to match any number of any kind of characters,
> and the "?" character matches exactly one character of any kind.
>
> The rule matcher extends strict, exact matching by introducing a
> concept of implicit inclusion rules. If your rule action is "Include",
> and you specify (say) a "File" rule, the matcher presumes implicit
> inclusion rules for the corresponding site and library. So, if you
> create an "Include File" rule that matches (for example)
> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
> for "/MySite", and an implied "Library Include" rule for
> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
> rule, there is an implied "Site Include" rule that corresponds to it.
> Note that these shortcuts only applies to "Include" rules - there are
> no corresponding implied "Exclude" rules."
>
> What this means is that you should probably be declaring file rules
> with "*" as the file name for each library, rather than a library
> rule.  You might want to just try this.  If you still have trouble,
> you can try setting the "org.apache.manifoldcf.connectors" property to
> "DEBUG" in the properties.xml file and restarting ManifoldCF before
> your next crawl.  The manifoldcf.log file will then have output
> describing the decisions the SharePoint connector made about each
> site, library, file, or folder it encountered.
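To make that concrete, a file rule of the kind described above might look like the following in the job's Paths tab, using one of the library paths listed further down in this thread, and the logging property would go into properties.xml roughly as shown. Both are illustrative sketches rather than settings taken from the actual job:

    Path Match: /IDD/Shared Documents/*
    Type: file
    Action: include

    <!-- in properties.xml, inside the <configuration> element -->
    <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>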
>
> Thanks,
> Karl
>
> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
> <Silvia_Daniel@bah.com> wrote:
>> Hi Karl
>>
>> The Path Rules are :
>>
>> Path Match: /Shared Documents
>> Type: library
>> Action: include
>>
>> Path Match: /IDD/Shared Documents
>> Type: library
>> Action: include
>>
>> Path Match: /IDD/Documents
>> Type: library
>> Action: include
>>
>> Path Match: /manifoldcf/Shared Documents
>> Type: library
>> Action: include
>>
>> I hope this helps.
>>
>> I really appreciate your help.
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 10:01 AM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> "When I select only the fetch activity, I don't see anything in the
>> events, when I select the Document Ingest activity, I don't see
>> anything in the events."
>>
>> So either you've already run the job and the documents were accessed
>> the first time (and won't be accessed again until they change), or the
>> problem is likely that your SharePoint Path Rules are not including
>> any documents.  It would be very helpful at this point to include a
>> screen shot of the job you've created.  Since you are not on the net,
>> perhaps you can jot down your SharePoint path rules for me to have a
>> look at, as they are displayed when you view the job.
>>
>> Thanks,
>> Karl
>>
>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>> <Silvia_Daniel@bah.com> wrote:
>>> Hi Karl
>>>
>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>
>>> I see the Events. If all the Activities in the Simple History Report (Document
>>> Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a start
>>> job and an end job event. When I get to the Simple History Report I can select the "Connection",
>>> but I don't have an option to select the Activities until I run the report first.
>>> When I select only the fetch activity, I don't see anything in the events, and when
>>> I select the Document Ingest activity, I don't see anything in the events either.
>>>
>>> My solr output connection has the following information:
>>> Protocol: http
>>> Server: "the server name"
>>> Port: 8080 (we are running Solr on JBoss port 8080)
>>> Web Application Name: solr
>>> Core Name: collection1
>>> Update Handler: update/extract
>>> Remove Handler: /update
>>> Status Handler: /admin/ping
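With settings like these, the output connection's status check would normally hit a URL assembled from the protocol, server, port, web application name, core name, and status handler. A hand check of that URL (a sketch, assuming that composition) would be:

    http://<server>:8080/solr/collection1/admin/ping

A status of OK in the ping response at least confirms that the Solr core is reachable at the address ManifoldCF has been given.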
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> Ok, let's do one thing at a time.
>>>
>>> First:
>>>
>>> "For the Path tab where there are Path Rules, are these the paths we
>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>> you indicate for each "Site" and "Library" you want to include
>>> specific metadata or include all metadata?"
>>>
>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>> Rules describe what documents you want to include or exclude.  The
>>> Metadata Rules describe what metadata you want to include or exclude.
>>> For right now I would ignore the Metadata Rules and just make sure you
>>> have Path Rules that mean that you have included documents.
>>>
>>> "As I run the report, I see "Documents", "Active, and "Processed"
>>> where the numbers change under the "Active" column as well as the
>>> "Document" and "Processed" column (these just get larger, where Active
>>> changes). "
>>>
>>> This "report" we actually call the Job Status screen.  The fact that
>>> the numbers get larger and the job doesn't just end indicates that you
>>> are successfully crawling your SharePoint, and you have set up the job
>>> to include at least some documents.  This is good news.  However, this
>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>> to that report, click on the "Simple History" link on the left-hand
>>> navigation area.  This report will show the events of your choice
>>> (default - ALL recorded events) over a given time window (default: the
>>> last hour).  If you've done this right you should at least see a "Job
>>> start" event.  The events you are most interested in are the "fetch"
>>> (which describes all attempts to fetch documents from SharePoint) and
>>> "document ingest", which describe attempts to get documents into Solr.
>>>  You can refresh the displayed events by clicking the "Go" button in
>>> the middle of the screen whenever you wish.
>>>
>>> I'd like you to delete your job, create it again, and start it.  Then,
>>> while it is running, I'd like you to go to the "Simple History"
>>> screen, and select the appropriate connection (your SharePoint
>>> repository connection), and click the "Go" button.  So as not to skip
>>> anything basic:
>>>
>>> (1) What event types do you see?
>>> (2) Are there "fetch" events?
>>> (3) Are there "document ingest" events?
>>>
>>> If you see no "fetch" events, that implies you have either not
>>> specified any documents to include in your job, OR your Solr
>>> connection is configured to reject too many document types so they are
>>> all getting filtered out.
>>>
>>> If you see "document ingest" events, but those have errors, it implies
>>> that the configuration of your Solr connection is incorrect and does
>>> not match the way your Solr is configured.  If you send me a specific
>>> error code and/or text I can help you figure out what is happening.
>>>
>>> If you see "document ingest" events with NO errors, but the Solr
>>> instance is not getting documents, you are describing an impossible
>>> situation.  While your Solr instance may not be configured to have the
>>> Extracting Update Handler active, or it may be at a different URL than
>>> what you pointed at, that would definitely yield errors or
>>> notifications in the Simple History.
>>>
>>> Please let me know what you actually see.
>>> Karl
>>>
>>>
>>>
>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>> <Silvia_Daniel@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> I am trying to figure out why I can't see anything being indexed into our
>>>> Solr index. I was looking at another post where you were working with "Martijn", and that individual
>>>> was not able to see info getting into Solr. In the report that I have set up, I have included
>>>> all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping,
>>>> I am associating metadata fields that are indicated in the Metadata tab with fields that exist
>>>> in our Solr index.
>>>>
>>>> For the Path tab where there are Path Rules, are these the paths we want
>>>> ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And
>>>> in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you
>>>> want to include specific metadata or include all metadata?
>>>>
>>>> As I run the report, I see "Documents", "Active", and "Processed", where the
>>>> numbers change under the "Active" column as well as the "Documents" and "Processed" columns
>>>> (these just get larger, while "Active" changes). While I was researching why I may not be seeing
>>>> anything over on the Solr side, I saw your communication with another individual indicating
>>>> that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so
>>>> there may be something else I should see. But the only thing I see when I look at the Solr
>>>> log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> The default time range for the Simple History is the last hour.  I
>>>> suspect you are unaware of that.  If you want a different time range
>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>> <Silvia_Daniel@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> I am looking at the Simple History in the UI and there isn't much to
>>>>> see, unless I am not getting what I am supposed to.  I see the Start Time, Activity, Identifier,
>>>>> Bytes, and Time, but I don't get anything for Result Code or Result Description. I looked in the
>>>>> documentation and I believe we should be getting something in those fields.
>>>>>
>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>> what the request and response look like from Solr.  I was talking
>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>> all requests against the Extracting Update Handler are logged to
>>>>> standard error, so you will see them appear in the process window in
>>>>> which Solr is running.
>>>>>
>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>> the box, but when you try to deploy any other way you are often
>>>>> missing the jar that contains the extracting update handler, so of
>>>>> course nothing works.  Several people on the connectors-user list have
>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>> and click through to the mailing list page and there are links at the
>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>> what is wrong and how to fix it.
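For reference, the solrconfig.xml entries involved look roughly like the following. This is a sketch only: the lib paths depend entirely on how your Solr was deployed, and the "text" field name is just the one used by the example schema, so check both against your installation.

    <!-- load the Solr Cell (extraction) jars; adjust the dir paths to your deployment -->
    <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
    <lib dir="../../dist/" regex="apache-solr-cell-.*\.jar" />

    <!-- register the Extracting Update Handler at the path ManifoldCF posts to -->
    <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="fmap.content">text</str>
      </lst>
    </requestHandler>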
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>> Yeah, but for some reason the logging isn't coming through. The logging
>>>>>> is set for INFO and I will have to change the logging level to DEBUG.
>>>>>>
>>>>>> Thanks again for your help.
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>> at standard output on the Solr instance.  You will see all the posts
>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>> end-user documentation I pointed you at before describes some of this,
>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>> point.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> Is there a log level other than wire-level debugging to view log
>>>>>>> statements when trying to send output to a Solr instance in the Jobs List/Creation section?
>>>>>>> We are having an issue getting content to Solr. Is there a document anywhere which defines
>>>>>>> the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata
>>>>>>> tabs?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>> configurable, but it's not clear what the SharePoint web services need
>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> One more question. Do you know the minimum permissions needed
>>>>>>>> to crawl the SharePoint instance and all sites under the instance? The individual who set
>>>>>>>> my permissions set me up as the "site collection admin" for the top-most site. Is there a
>>>>>>>> specific admin role, other than "Farm Admin", that the user crawling the SharePoint
>>>>>>>> instance can be given?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> (1) was the issue. When requesting access to the SharePoint
>>>>>>>>> instance I indicated that I needed to be able to crawl SharePoint; I guess the problem was
>>>>>>>>> on my end, in how I indicated that I also needed privileges to crawl the site.
>>>>>>>>>
>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint
>>>>>>>>> version to v3 I get a message indicating "Connection Working".
>>>>>>>>>
>>>>>>>>> Appreciate the help.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>> causes for this:
>>>>>>>>>
>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint.
>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>> installed it logged in as a user that did not have enough permissions to do
>>>>>>>>> what it needs to do.
>>>>>>>>>
>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> When I try to use option (1) and don't put anything in the Site
>>>>>>>>>> field, I get an error message "axisFault=Server, detail=Server was unable to process
>>>>>>>>>> request --> Requested Registry access is not allowed", and when I put a "/" in the Site
>>>>>>>>>> field I get a GUI error indicating that the site field can't end with a "/".
>>>>>>>>>>
>>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint
>>>>>>>>>> instance is not configured properly for us to crawl?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration
dir
>>>>>>>>>>
>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>
>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>
>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>
>>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>>
>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>
>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>> the ManifoldCF SharePoint plugin.  The connection check for 2.0 does
>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is
>>>>>>>>>>> denied due to server configuration" when setting the Site field to /default.aspx.
>>>>>>>>>>> Do most SharePoint instances have the URLs set to something like http://server:port/sites/......
>>>>>>>>>>> instead of http://server:port/? When I use "/default.aspx" I see in the log files that
>>>>>>>>>>> ManifoldCF is trying to go to the Lists.asmx service with the URL http://server:port/default.aspx/_vti_bin/Lists.asmx,
>>>>>>>>>>> where nothing is found.
>>>>>>>>>>>
>>>>>>>>>>> As you can tell, I am not much of a SharePoint user or installer.
>>>>>>>>>>>
>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something
>>>>>>>>>>> like http://server:port/default.aspx. If I don't put anything in the Site field I get a
>>>>>>>>>>> message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx"
>>>>>>>>>>> as my Site field, which I believe may have been the issue. However, what do you put in your
>>>>>>>>>>> Site field when the site is the top-most site, as in http://server:port/default.aspx?
>>>>>>>>>>>
>>>>>>>>>>> I would love to send you the log messages, but I am working on a
>>>>>>>>>>> network which is not connected to the outside.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Daniel,
>>>>>>>>>>>
>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>> detailed communication.
>>>>>>>>>>>
>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>
>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>> expected to be found at
>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks again,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>
>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have added the specific log4j lines for HttpClient wire logging and
>>>>>>>>>>>>> I restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx
>>>>>>>>>>>>> through IE. When reviewing the log files I was able to see some of the content that resides
>>>>>>>>>>>>> in the SharePoint instance in the content coming back from the request. However, I am still
>>>>>>>>>>>>> seeing the error messages in the ManifoldCF GUI as well as in the log file, indicating
>>>>>>>>>>>>> "Bad Envelope: HTML", "No service named ListsSoap is available", and "No service named
>>>>>>>>>>>>> http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could there be something going on with the way the services are being
>>>>>>>>>>>>> built on the client side?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates
that the SOAP request the
>>>>>>>>>>>>> SharePoint connector is attempting to
perform is, in fact, returning
>>>>>>>>>>>>> an HTML response.  This usually indicates
that the server or path
>>>>>>>>>>>>> parameters you've used to set up the
connection are not set correctly,
>>>>>>>>>>>>> and SharePoint is not actually being
engaged.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The best thing to do at this point is turn on HttpClient wire
>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc. because you
>>>>>>>>>>>>> want NTLM support) to go to the URL where you think the web service
>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
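On a locked-down network where a browser is not convenient, a rough command-line equivalent of that check (a sketch; substitute your own domain, user, server, port, and site path) is:

    curl --ntlm -u 'DOMAIN\username' http://<server>:<port>/<sitepath>/_vti_bin/Lists.asmx

If NTLM authentication succeeds you should get the web service description page back rather than a 401 or a generic HTML error page.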
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary
>>>>>>>>>>>>>> ant scripts to download dependencies and then built the entire project. I have also had
>>>>>>>>>>>>>> the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the
>>>>>>>>>>>>>> SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007).
>>>>>>>>>>>>>> When I try to create a Repository Connection and select "Save", I get a message on the
>>>>>>>>>>>>>> ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at
>>>>>>>>>>>>>> the log file I see an error message "org.apache.axis.ConfigurationException: No service
>>>>>>>>>>>>>> named ListsSoap is available".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be
>>>>>>>>>>>>>> causing it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Silvia_Daniel@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org group yesterday and submitted
>>>>>>>>>>>>>>> an e-mail question to the group. Can you help us with the below issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>> connector, connecting to SharePoint Services 3. I have been following the
>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Daniel Silvia