manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Sharepoint connector question
Date Sun, 12 Sep 2010 21:02:45 GMT
I confirmed that without any mappings set, the Solr Connector *should* just
be passing the metadata through using the metadata's name as the Solr field
name.
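
To illustrate that pass-through behaviour, here is a minimal Python sketch. It is not the connector's actual code: the function name `build_literal_params` and the `field_mappings` dictionary are hypothetical, and the only thing taken from Solr itself is the `literal.<field>=<value>` parameter convention of the extracting update request handler.

```python
from urllib.parse import urlencode


def build_literal_params(metadata, field_mappings=None):
    """Turn document metadata into literal.<field>=<value> request parameters.

    metadata:       dict of metadata name -> list of values (hypothetical shape)
    field_mappings: optional dict of metadata name -> Solr field name
    """
    field_mappings = field_mappings or {}
    params = []
    for name, values in metadata.items():
        # With no mapping configured, the metadata name is used unchanged
        # as the Solr field name.
        solr_field = field_mappings.get(name, name)
        for value in values:
            params.append(("literal." + solr_field, value))
    return params


if __name__ == "__main__":
    # No mappings set: names pass straight through.
    print(urlencode(build_literal_params({"Author": ["martijn"]})))
    # One mapping set: only the mapped name changes.
    print(urlencode(build_literal_params({"Author": ["martijn"]},
                                         {"Author": "author_s"})))
```

With an empty mapping, every selected metadata field should therefore still show up as its own `literal.` argument.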

For debugging, if you could post the Solr output from one update operation,
I'd love to see if any metadata seems to be in it.  Potentially it's there
but the Solr schema is not right somehow - that should be the first thing we
verify.
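
One way to check a captured update request for metadata is sketched below. It assumes the logged arguments can be pasted as a URL query string, which may not match the real log format, and the helper names `literal_fields` and `missing_fields` are made up for this example.

```python
from urllib.parse import parse_qsl


def literal_fields(query_string):
    """Return {field: [values]} for every literal.* parameter found."""
    prefix = "literal."
    found = {}
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        if key.startswith(prefix):
            found.setdefault(key[len(prefix):], []).append(value)
    return found


def missing_fields(query_string, expected):
    """List the expected metadata names that have no literal.* argument."""
    present = literal_fields(query_string)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Made-up sample request arguments, for illustration only.
    sample = "literal.id=doc1&literal.Author=martijn&commit=false"
    print(literal_fields(sample))
    print(missing_fields(sample, ["Author", "Modified"]))
```

If a field is present in the request but absent from the index, the schema is the likelier culprit; if it is missing from the request, the problem is upstream of Solr.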

Karl


On Sun, Sep 12, 2010 at 4:50 PM, Martijn v Groningen <martijn.is.hier@gmail.com> wrote:

> Tomorrow I'll dive into the code and do some more debugging. Last week I
> didn't specify any mappings in the mapping tab for the metadata
> fields I selected in the metadata tab. But this shouldn't be the
> problem, right?
>
> Thanks,
>
> Martijn
>
> On 12 September 2010 22:29, Karl Wright <daddywri@gmail.com> wrote:
> > Martijn,
> >
> > (1) The precise svn url for the acf version of httpclient is as follows.
> > My apologies for any earlier confusion - I was away from my computer at
> > the time.
> >
> > https://svn.apache.org/repos/asf/incubator/lcf/upstream/commons-httpclient-3x
> >
> > (2) Each time the solr connector posts into Solr, you should see a set of
> > argument names and values dumped to standard out (or the log).  So it
> > should be easy to see what is being sent, and whether the arguments in
> > fact are the correct ones for the extracting update request handler, or
> > not.
> > Furthermore, the Solr output connector recently had a tab added which
> > performs the mapping I alluded to.  This mapping is designed to translate
> > metadata coming from a connector like SharePoint, into fields that you
> > presumably have in your Solr schema.  However, if you don't set anything,
> > the fields are not changed, and you should see an argument for every
> > metadata field, something like: literal.xxx=yyy.
> >
> > If you have a document that you *know* has metadata, and you've specified
> > that metadata in the job, and you run the job after you specify that
> > metadata, but still see no literal.xxx=yyy corresponding to it in the
> > Solr output, then we should spend some time chasing this problem down.
> > Be wary because incremental crawling means you'll probably not see your
> > document processed again unless you either change it in SharePoint, or
> > delete and recreate the job.  But be reassured that SharePoint metadata
> > was covered by the old MetaCarta tests, and there have been no changes of
> > any significance to the SharePoint connector since then, so I have no
> > explanation why it would not work for you too.  That's why I'm spending
> > time trying to figure out if this is a Solr connector issue instead.
> >
> > Please let me know if this helps you, or whether you need to go deeper
> > into debugging.
> >
> > Karl
> >
> >
> > On Sun, Sep 12, 2010 at 4:05 PM, Martijn v Groningen
> > <martijn.is.hier@gmail.com> wrote:
> >>
> >> I didn't notice that I was under the upstream-changes directory.
> >> Thanks for pointing that out.
> >>
> >> In Solr I have a wildcard (*) dynamic field, so everything acf sends
> >> should end up in my index (or at least that is what I assume). I also
> >> did some debugging in the Solr connector and I noticed that no
> >> metadata was sent to Solr. I didn't create field mappings in my acf
> >> job. Do you always have to create mappings for metadata?
> >>
> >> Martijn
> >>
> >> On 12 September 2010 21:09, Karl Wright <daddywri@gmail.com> wrote:
> >> > The source for upstream changes is under
> >> > lcf/upstream-changes/httpclient, not under trunk.
> >> >
> >> > As for the metadata, how are you determining that no metadata is being
> >> > indexed?  If this is Solr you are indexing into, have you set up the
> >> > appropriate metadata/field mappings?
> >> >
> >> > Karl
> >> >
> >> > On 9/12/10, Martijn v Groningen <martijn.is.hier@gmail.com> wrote:
> >> >> To authenticate with SharePoint I had to include the domain as well.
> >> >> Also the ui reported an error if I didn't specify the username in a
> >> >> domain / username format. Maybe this httpclient issue was just
> >> >> specific to the SharePoint / Domain Controller installation I was
> >> >> working with. I also couldn't find the source of the acf version of
> >> >> httpclient. Is it hosted in another source repository?
> >> >>
> >> >> I still don't understand why the documents I crawled didn't
> >> >> have any metadata associated with them. In the job configuration I was
> >> >> able to choose which metadata I wanted to include. Do you have any idea
> >> >> what might be the cause of this?
> >> >>
> >> >> Regards,
> >> >>
> >> >> Martijn
> >> >>
> >> >> On 12 September 2010 18:40, Karl Wright <daddywri@gmail.com> wrote:
> >> >>> Hi Martijn,
> >> >>>
> >> >>> The ACF version of httpclient has support for NTLMv1, NTLMv2, and
> >> >>> NTLM2 protocols.  The standard client does not.
> >> >>>
> >> >>> What this means practically for you depends on how the Windows domain
> >> >>> controller you are working with is configured.  You cannot use the
> >> >>> off-the-shelf httpclient and still authenticate if the domain
> >> >>> controller is configured to not allow LM connections, which is what
> >> >>> Microsoft recommends people do.
> >> >>>
> >> >>> Since the ACF version of httpclient will always try to connect using
> >> >>> NTLMv2, this means that you must be more rigorous about setting up
> >> >>> your client machine.  First, it must have a name, and it must have a
> >> >>> machine account in the domain.  Second, NTLMv2 is much more picky
> >> >>> about how you specify user and domain.  The end user documentation
> >> >>> provides details that may be helpful to you in this regard.
> >> >>>
> >> >>> Thanks,
> >> >>> Karl
> >> >>>
> >> >>>
> >> >>> On Sun, Sep 12, 2010 at 5:00 AM, Martijn v Groningen
> >> >>> <martijn.is.hier@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi All,
> >> >>>>
> >> >>>> I've configured the SharePoint connector (to connect to SharePoint
> >> >>>> 3.0), the Solr connector, and a job that adds documents into Solr.
> >> >>>> The only thing that I'm missing is the metadata from SharePoint. Per
> >> >>>> document I need to know which users can access it. In the metadata
> >> >>>> tab on the job page I've configured the metadata to be included, but
> >> >>>> this doesn't end up in my Solr index. Does anybody know what I should
> >> >>>> do to also have the metadata in my index?
> >> >>>>
> >> >>>> I also had another issue with the SharePoint connector which I
> >> >>>> managed to solve. But I'm curious to know if someone else encountered
> >> >>>> a similar issue.
> >> >>>> When I was setting up the SharePoint connector I always got a 401
> >> >>>> message on the connectors page as status. I was sure I entered the
> >> >>>> correct credentials. After some debugging I noticed that the NTLM
> >> >>>> data that was sent to Solr was different from when I did an http post
> >> >>>> with the Firefox Poster plugin to a SharePoint webservice url (I
> >> >>>> checked this with Wireshark). After writing a little test case with
> >> >>>> the httpclient used in acf, I got the same 401 error. I then ran the
> >> >>>> test with a clean httpclient (version 3.1), and that ran as expected.
> >> >>>> I got a response code 200 back with a soap response. I then used this
> >> >>>> version of httpclient (with some class files from the acf provided
> >> >>>> jar that were missing in the plain jar file) and the connector worked
> >> >>>> as expected as I was able to index documents. Did someone else have
> >> >>>> this particular issue? I noticed that acf is using httpclient 3.1
> >> >>>> (from the manifest file), but I'm curious to know why httpclient was
> >> >>>> modified.
> >> >>>>
> >> >>>> BTW I've been using the latest trunk version (I did a checkout last
> >> >>>> Tuesday). I'm also new to SharePoint.
> >> >>>>
> >> >>>> Cheers,
> >> >>>>
> >> >>>> Martijn
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Met vriendelijke groet,
> >>
> >> Martijn van Groningen
> >
> >
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>
