manifoldcf-user mailing list archives

From Anupam Bhattacharya <anupam...@gmail.com>
Subject Re: Need Help on setting up ManifoldCF
Date Thu, 23 Feb 2012 19:04:39 GMT
Hello Karl,

Finally, I was able to index all the metadata for the defined document
types across the different content types, and everything went well.
However, I was not able to index the full-text content of the files (such
as PDF and XML). I read about SOLR Cell, where documents can be uploaded
using curl, but unfortunately our XML files contain tags and values that
also need to be indexed, e.g. this XML structure:

<doc>
<object_id>111</object_id>
<abstract>Abstract Text</abstract>
<citation>Citation Text</citation>
<publication>News Source</publication>
</doc>

I found that in SOLR, if we add a new RequestHandler that extends
ExtractingRequestHandler, we can parse the documents, fetch the information,
and add it as index fields in the SOLR index.
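The field extraction that such a handler (or any pre-processing step before the document reaches SOLR) would perform can be sketched in plain Java with the JDK's built-in DOM parser. This is a minimal sketch under stated assumptions, not the ExtractingRequestHandler or ManifoldCF API: the class name XmlFieldExtractor is hypothetical, and it assumes flat XML like the sample above, where each child element of <doc> becomes one index field named after its tag.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: turns <doc><tag>value</tag>...</doc> into field name/value pairs.
public class XmlFieldExtractor {

    /** Maps each direct child element of the root to (tag name, trimmed text content). */
    public static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to guard against XML external entity (XXE) attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, String> fields = new LinkedHashMap<>(); // keeps document order
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                // Skip whitespace text nodes between elements; keep element children only.
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    fields.put(element.getTagName(), element.getTextContent().trim());
                }
            }
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>"
                + "<object_id>111</object_id>"
                + "<abstract>Abstract Text</abstract>"
                + "<citation>Citation Text</citation>"
                + "<publication>News Source</publication>"
                + "</doc>";
        extractFields(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints one `name = value` line per tag (object_id, abstract, citation, publication). Repeated tags would overwrite each other in this map, so a real schema would probably need multi-valued fields for them.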

What is the ideal approach for indexing XML tag values into the Lucene index
when sending documents from ManifoldCF to SOLR? Is it necessary to integrate
Tika for this? I found a good post here: https://community.emc.com/docs/DOC-6520

Appreciate your advice on this.

Regards
Anupam




On Thu, Feb 16, 2012 at 12:17 AM, Karl Wright <daddywri@gmail.com> wrote:

> On Wed, Feb 15, 2012 at 1:13 PM, Anupam Bhattacharya
> <anupamb82@gmail.com> wrote:
> > Hello Karl,
> >
> > Thanks for adding this to the JIRA system.
> >
> > The dfc.properties file was introduced in Documentum 6.0, and according to
> > the ManifoldCF connector documentation
> > (http://incubator.apache.org/connectors/en_US/included-connectors.html),
> > the out-of-the-box connector classes were tested against DFC 5.3 SP5,
> > which needed the dmcl.ini file. That is presumably why run.bat was
> > configured for dmcl.ini.
>
> Right - so does DFC 6.0 on Windows require the DOCUMENTUM environment
> variable to be set to point at the directory where dfc.properties is
> found?  Or perhaps it doesn't require the DOCUMENTUM environment
> variable at all anymore?
>
> >
> > Since I am trying to connect to DFC 6.5 SP3, I need to look for the
> > dfc.properties file. I hope the out-of-the-box Documentum connector will
> > work with version 6.5.
>
> It was tried and worked.  The script was developed later with only the
> 5.3 version available.
>
> >
> > I am confused: why do we have client and server versions of every
> > connector? Can you please explain?
> >
>
> Do you mean "why is there a documentum-connector-server" process?  If
> that's the question, it was created for two reasons:
> (1) We had problems with stability of DFC.  It segfaults occasionally,
> somewhere in its native code.  We did not want that to bring down
> ManifoldCF, and we wanted to be able to restart the part of the
> connector that depended on DFC transparently when it crashed.
> (2) DFC has dependencies on many older open-source jars that conflict
> with the rest of ManifoldCF.  If (1) was not a problem we might have
> used a classloader to fix this, but since we had to fix both we
> created a separate process.
>
> FWIW, we do the same thing for FileNet because of its dependency on Wasp.
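The process-isolation idea in point (1) can be illustrated with a small watchdog that runs a crash-prone component in its own OS process and relaunches it when it exits abnormally. This is a generic sketch, not ManifoldCF's implementation: the class ChildProcessSupervisor and its restart policy are hypothetical, and a child JVM that fails on startup stands in for the DFC-dependent server. Because the component runs in its own JVM, its jars also never share a classpath with the parent, which is what point (2) needs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical watchdog: a crash in the child process never takes down this JVM.
public class ChildProcessSupervisor {

    /** Outcome of a supervised run: number of launches and the last exit code. */
    public static final class Result {
        public final int launches;
        public final int exitCode;

        Result(int launches, int exitCode) {
            this.launches = launches;
            this.exitCode = exitCode;
        }
    }

    /**
     * Runs the command as a separate OS process. If it exits with a non-zero
     * status (e.g. a segfault in native code), it is launched again, up to
     * maxRestarts extra times.
     */
    public static Result runWithRestarts(List<String> command, int maxRestarts) {
        int launches = 0;
        int exitCode = -1;
        while (launches <= maxRestarts) {
            launches++;
            try {
                Process child = new ProcessBuilder(command)
                        .redirectErrorStream(true)                        // merge stderr into stdout
                        .redirectOutput(ProcessBuilder.Redirect.DISCARD) // ignore child output (Java 9+)
                        .start();
                exitCode = child.waitFor();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for child", e);
            }
            if (exitCode == 0) {
                break; // clean exit: no restart needed
            }
        }
        return new Result(launches, exitCode);
    }

    public static void main(String[] args) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // An unrecognized -XX option makes the child JVM exit with a non-zero status,
        // standing in for a crashing DFC-dependent server process.
        Result r = runWithRestarts(List.of(javaBin, "-XX:+ThisFlagDoesNotExist"), 2);
        System.out.println("launches=" + r.launches + ", lastExitCode=" + r.exitCode);
    }
}
```

With maxRestarts set to 2, a child that keeps failing is launched three times in total before the supervisor gives up and reports the last exit code.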
>
> Karl
>
> > Again, Thanks for all the help.
> >
> > Regards
> > Anupam
> >
> >
> > On Wed, Feb 15, 2012 at 8:42 PM, Karl Wright <daddywri@gmail.com> wrote:
> >>
> >> Hi Anupam,
> >>
> >> I did not see a ticket from you about the DOCUMENTUM environment
> >> variable and the dmcl.ini vs. dfc.properties file.  I've created an
> >> issue at https://issues.apache.org/jira/browse/CONNECTORS-410 to track
> >> this problem.  It would be great if you could confirm that: (a) the
> >> DOCUMENTUM environment variable is still needed at all by DFC, and (b)
> >> that when it is set properly, the file dfc.properties can be found at
> >> $DOCUMENTUM\dfc.properties (on Windows, at least).
> >>
> >> Thanks,
> >> Karl
> >>
> >> On Tue, Feb 14, 2012 at 3:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> >> > Hi Anupam,
> >> >
> >> > Please post emails like this directly to
> >> > connectors-user@incubator.apache.org.  See below for responses.
> >> >
> >> > On Tue, Feb 14, 2012 at 3:07 PM, Anupam Bhattacharya
> >> > <anupamb82@gmail.com> wrote:
> >> >>
> >> >> Hello Karl,
> >> >>
> >> >> I am a software programmer at DuPont, Gurgaon, India. Recently, due to
> >> >> economic instability all over the world, the company decided to go for
> >> >> cheaper search engine applications, so we are getting rid of many costly
> >> >> proprietary search applications and replacing them with FAST.
> >> >>
> >> >> However, I recently came across the SOLR search engine and the ManifoldCF
> >> >> connector framework, and I am now driving this effort within my company,
> >> >> as I am a big supporter of open-source technologies. I started my career
> >> >> with Alfresco CMS and am now working on search technologies.
> >> >>
> >> >> Currently I am facing lots of initial build/deployment/installation
> >> >> issues. I have already consulted
> >> >>
> >> >> http://incubator.apache.org/connectors/en_US/how-to-build-and-deploy.html
> >> >>
> >> >> and read it multiple times, but I still face many issues. I downloaded the
> >> >> latest 0.4 version, and it seems the documentation at the above link is
> >> >> not up to date.
> >> >>
> >> >
> >> > The online documentation is pertinent to trunk.  The documentation you
> >> > want to use is contained within the 0.4-incubating release.  Go to
> >> > dist/doc and you will see it there.
> >> >
> >> >> Below are a few issues that took me a long time to resolve; they could be
> >> >> added to the ManifoldCF wiki as lessons learned for others:
> >> >> a. Not a single example is given of running executecommand.bat with
> >> >> proper arguments; only the list of commands and their parameters is given.
> >> >
> >> > I'm not entirely sure I get this.  Do you just want an example in the
> >> > documentation?
> >> >
> >> >> b. Where to set the manifoldcf.configfile property, and which file it
> >> >> should point to, when deploying the war on Tomcat with a PostgreSQL
> >> >> database.
> >> >
> >> > The documentation already tells you that you need to add an
> >> > appropriate -D to your tomcat invocation to point to your
> >> > properties.xml file.  Tomcat documentation differs from version to
> >> > version and platform to platform on how best to do that, and if you
> >> > run under Windows there's even a service wrapper with a configuration
> >> > UI that allows you to set these parameters.  So it's way beyond
> >> > ManifoldCF's mission to describe all that, I think.
> >> >
> >> >> c. I am trying to build the Documentum connector, but learned that some
> >> >> additional environment variables need to be set for "DOCUMENTUM".
> >> >> Additionally, the latest version of Documentum uses the dfc.properties
> >> >> file, while run.bat looks for the dmcl.ini file.
> >> >
> >> > Could you open a ticket in Jira for this issue?
> >> > https://issues.apache.org/jira. It should not be a problem if you
> >> > modify the script temporarily, but we can readily make the script look
> >> > for either of these.
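The "look for either file" behaviour Karl proposes could follow the lookup logic below. This is a Java sketch of that logic only, not the actual run.bat change; the class name DocumentumConfigLocator is hypothetical, and it assumes (as Karl's question in CONNECTORS-410 does, still unconfirmed in this thread) that the configuration file sits directly in the directory named by the DOCUMENTUM environment variable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: prefer dfc.properties (DFC 6.x), fall back to dmcl.ini (DFC 5.3).
public class DocumentumConfigLocator {

    /** Returns the first config file found in documentumDir, or empty if none exists. */
    public static Optional<Path> locate(String documentumDir) {
        if (documentumDir == null || documentumDir.isEmpty()) {
            return Optional.empty(); // DOCUMENTUM variable not set
        }
        // Order matters: newer DFC versions win when both files are present.
        for (String name : new String[] {"dfc.properties", "dmcl.ini"}) {
            Path candidate = Paths.get(documentumDir, name);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> config = locate(System.getenv("DOCUMENTUM"));
        System.out.println(config.map(p -> "Using " + p)
                .orElse("No dfc.properties or dmcl.ini found under DOCUMENTUM"));
    }
}
```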
> >> >
> >> >> d. The PostgreSQL driver is JDBC3, so it causes problems with JVM 6 or
> >> >> above.
> >> >
> >> > We use JDK 6 all the time without problems, so I don't know what you
> >> > are talking about here.
> >> >
> >> >> e. I was getting errors during the ant build when it tries to delete jar
> >> >> files from the lib directory. I don't have the source code with me right
> >> >> now, so I can't provide the full path.
> >> >
> >> > It sounds like you were trying to run ant while you still had
> >> > ManifoldCF processes running from the same tree.
> >> >
> >> >> f. The documentation advises setting MCF_Home for the example_multiprocess
> >> >> project, but the Documentum connector build seems to refer to this
> >> >> property differently from run.bat.
> >> >
> >> > Yes, this was noticed and fixed on trunk recently.
> >> >
> >> >>
> >> >> Could you please update the Apache ManifoldCF website with the latest
> >> >> installation procedures? Also, in the meantime, it would be very kind of
> >> >> you to send me a few notes to get a head start on configuring ManifoldCF
> >> >> with SOLR and the Documentum connector.
> >> >>
> >> >
> >> > The documentation online has been updated to be consistent with trunk,
> >> > so if you want to use the trunk version this might be a good
> >> > opportunity to help clarify the documentation.  Either that or you
> >> > will need to stick with the 0.4-incubating release and the
> >> > 0.4-incubating documentation that is part of it; we cannot at this
> >> > time update documentation that has already been released.
> >> >
> >> > Thanks,
> >> > Karl
> >> >
> >> >> Looking forward to your help.
> >> >>
> >> >> Thanks & Regards
> >> >> Anupam Bhattacharya
> >> >>
> >> >>
> >> >>
> >
> >
> >
> >
> > --
> > Thanks & Regards
> > Anupam Bhattacharya
> >
> >
>



-- 
Thanks & Regards
Anupam Bhattacharya
