incubator-connectors-commits mailing list archives

From rid...@apache.org
Subject svn commit: r1210183 - in /incubator/lcf/trunk/site/src/documentation/content/xdocs: how-to-build-and-deploy.xml included-connectors.xml
Date Sun, 04 Dec 2011 19:16:09 GMT
Author: ridder
Date: Sun Dec  4 19:16:09 2011
New Revision: 1210183

URL: http://svn.apache.org/viewvc?rev=1210183&view=rev
Log:
CONNECTORS-272: Updated documentation regarding PostgreSQL 9.1

Modified:
    incubator/lcf/trunk/site/src/documentation/content/xdocs/how-to-build-and-deploy.xml
    incubator/lcf/trunk/site/src/documentation/content/xdocs/included-connectors.xml

Modified: incubator/lcf/trunk/site/src/documentation/content/xdocs/how-to-build-and-deploy.xml
URL: http://svn.apache.org/viewvc/incubator/lcf/trunk/site/src/documentation/content/xdocs/how-to-build-and-deploy.xml?rev=1210183&r1=1210182&r2=1210183&view=diff
==============================================================================
--- incubator/lcf/trunk/site/src/documentation/content/xdocs/how-to-build-and-deploy.xml (original)
+++ incubator/lcf/trunk/site/src/documentation/content/xdocs/how-to-build-and-deploy.xml Sun Dec  4 19:16:09 2011
@@ -50,7 +50,7 @@
         <ul>
           <li>CMIS connector</li>
           <li>Filesystem connector</li>
-          <li>JDBC connector, with just the postgresql jdbc driver</li>
+          <li>JDBC connector, with just the PostgreSQL jdbc driver</li>
           <li>RSS connector</li>
           <li>Webcrawler connector</li>
         </ul>
@@ -279,7 +279,7 @@ cd dist/example
         <p></p>
         <p>You can stop the quick-start ManifoldCF at any time using ^C.</p>
         <p></p>
-        <p>Bear in mind that Derby is not as full-featured a database as is Postgresql.
 This means that any performance testing you may do against the quick start example may not
be applicable to a full installation.  Furthermore, Derby only permits one process at a time
to be connected to its databases, so you <strong>cannot</strong> use any of the
ManifoldCF commands (as described below) while the quick-start ManifoldCF is running.</p>
+        <p>Bear in mind that Derby is not as full-featured a database as is PostgreSQL.
 This means that any performance testing you may do against the quick start example may not
be applicable to a full installation.  Furthermore, Derby only permits one process at a time
to be connected to its databases, so you <strong>cannot</strong> use any of the
ManifoldCF commands (as described below) while the quick-start ManifoldCF is running.</p>
         <p></p>
         <p>Another caveat that you will need to be aware of with the quick-start version
of ManifoldCF is that it in no way removes the need for you to run any separate processes
that individual connectors require.  Specifically, the Documentum and FileNet connectors require
processes to be independently started in order to function.  You will need to read about these
connector-specific processes below in order to use the corresponding connectors.  However,
the Quick Start build does place the necessary jars, script, and defines in a set of <em>xxx-process</em>
directories right underneath the <em>dist/example</em> directory.</p>
         <p></p>
@@ -310,7 +310,7 @@ cd dist/example
         <p>The core part of ManifoldCF consists of several pieces.  These basic pieces
are enumerated below:</p>
         <p></p>
         <ul>
-           <li>A database, which is where ManifoldCF keeps all of its configuration
and state information, usually Postgresql</li>
+           <li>A database, which is where ManifoldCF keeps all of its configuration
and state information, usually PostgreSQL</li>
            <li>A synchronization directory, which how ManifoldCF coordinates activity
among its various processes</li>
            <li>An <strong>agents</strong> process, which is the process
that actually crawls documents and ingests them</li>
            <li>A <strong>crawler-ui</strong> web application, which presents
the UI users interact with to configure and control the crawler</li>
@@ -346,7 +346,7 @@ cd dist/example
         <p></p>
         <ul>
           <li>Check out and build, using "ant build".</li>
-          <li>Install postgresql.  The postgresql JDBC driver included with ManifoldCF
is known to work with version 8.4.x, so that version is the currently recommended one.  Configure
postgresql for your environment; the default configuration is acceptable for testing and experimentation.</li>
+          <li>Install PostgreSQL.  The PostgreSQL JDBC driver included with ManifoldCF
is known to work with version 9.1, so that version is the currently recommended one.  Configure
PostgreSQL for your environment; the default configuration is acceptable for testing and experimentation.</li>
           <li>Install a Java application server, such as Tomcat.</li>
           <li>Create a home directory for ManifoldCF.  To do this, make a copy of the
contents of <em>dist</em> from the build.  In this directory, create properties.xml
and logging.ini, as described above.  Note that you will also need to create a synchronization
directory, also detailed above, and refer to this directory within your properties.xml.</li>
           <li>Deploy the war files in <em>&#60;MCF_HOME&#62;/web/war</em>
to your application server.</li>
@@ -362,25 +362,25 @@ cd dist/example
         <p></p>
         <p></p>
         <section>
-          <title>Configuring the Postgresql database</title>
+          <title>Configuring the PostgreSQL database</title>
           <p></p>
-          <p>Despite having an internal architecture that cleanly abstracts from specific
database details, ManifoldCF is currently fairly specific to Postgresql at this time.  There
are a number of reasons for this.</p>
+          <p>Despite having an internal architecture that cleanly abstracts from specific
database details, ManifoldCF is currently fairly specific to PostgreSQL at this time.  There
are a number of reasons for this.</p>
           <p></p>
           <ul>
              <li>ManifoldCF uses the database for its document queue, which places
a significant load on it.  The back-end database is thus a significant factor in ManifoldCF's
performance.  But, in exchange, ManifoldCF benefits enormously from the underlying ACID properties
of the database.</li>
-             <li>The strategy for getting optimal query plans from the database is
not abstracted.  For example, Postgresql 8.3+ is very sensitive to certain statistics about
a database table, and will not generate a performant plan if the statistics are inaccurate
by even a little, in some cases.  So, for Postgresql, the database table must be analyzed
very frequently, to avoid catastrophically bad plans.  But luckily, Postgresql is pretty good
at doing analysis quickly.  Oracle, on the other hand, takes a very long time to perform analysis,
but its plans are much less sensitive.</li>
-             <li>Postgresql always does a sequential scan in order to count the number
of rows in a table, while other databases return this efficiently.  This has affected the
design of the ManifoldCF UI.</li>
-             <li>The choice of query form influences the query plan.  Ideally, this
is not true, but for both Postgresql and for (say) Oracle, it is.</li>
-             <li>Postgresql has a high degree of parallelism and lack of internal single-threadedness.</li>
+             <li>The strategy for getting optimal query plans from the database is
not abstracted.  For example, PostgreSQL 8.3+ is very sensitive to certain statistics about
a database table, and will not generate a performant plan if the statistics are inaccurate
by even a little, in some cases.  So, for PostgreSQL, the database table must be analyzed
very frequently, to avoid catastrophically bad plans.  But luckily, PostgreSQL is pretty good
at doing analysis quickly.  Oracle, on the other hand, takes a very long time to perform analysis,
but its plans are much less sensitive.</li>
+             <li>PostgreSQL always does a sequential scan in order to count the number
of rows in a table, while other databases return this efficiently.  This has affected the
design of the ManifoldCF UI.</li>
+             <li>The choice of query form influences the query plan.  Ideally, this
is not true, but for both PostgreSQL and for (say) Oracle, it is.</li>
+             <li>PostgreSQL has a high degree of parallelism and lack of internal single-threadedness.</li>
           </ul>
           <p></p>
-          <p>ManifoldCF has been tested against PostgreSQL 8.3.7 and PostgreSQL 8.4.5.
 We recommend the following configuration parameter settings to work optimally with ManifoldCF:</p>
+          <p>ManifoldCF has been tested against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL.
 We recommend the following configuration parameter settings to work optimally with ManifoldCF:</p>
           <p></p>
           <ul>
              <li>A default database encoding of UTF-8</li>
              <li><em>postgresql.conf</em> settings as described in the
table below</li>
              <li><em>pg_hba.conf</em> settings to allow password access
for TCP/IP connections from ManifoldCF</li>
-             <li>A maintenance strategy involving cronjob-style vacuuming, rather than
Postgresql autovacuum</li>
+             <li>A maintenance strategy involving cronjob-style vacuuming, rather than
PostgreSQL autovacuum</li>
           </ul>
           <p></p>
           <table>
@@ -402,11 +402,11 @@ cd dist/example
         <section>
           <title>A note about maintenance</title>
           <p></p>
-          <p>Postgresql's architecture causes it to accumulate dead tuples in its data
files, which do not interfere with its performance but do bloat the database over time.  The
usage pattern of ManifoldCF is such that it can cause significant bloat to occur to the underlying
Postgresql database in only a few days, under sufficient load.  Postgresql has a feature to
address this bloat, called <strong>vacuuming</strong>.  This comes in three varieties:
autovacuum, manual vacuum, and manual full vacuum.</p>
+          <p>PostgreSQL's architecture causes it to accumulate dead tuples in its data
files, which do not interfere with its performance but do bloat the database over time.  The
usage pattern of ManifoldCF is such that it can cause significant bloat to occur to the underlying
PostgreSQL database in only a few days, under sufficient load.  PostgreSQL has a feature to
address this bloat, called <strong>vacuuming</strong>.  This comes in three varieties:
autovacuum, manual vacuum, and manual full vacuum.</p>
           <p></p>
-          <p>We have found that Postgresql's autovacuum feature is inadequate under
such conditions, because it not only fights for database resources pretty much all the time,
but it falls further and further behind as well.  Postgresql's in-place manual vacuum functionality
is a bit better, but is still much, much slower than actually making a new copy of the database
files, which is what happens when a manual full vacuum is performed.</p>
+          <p>We have found that PostgreSQL's autovacuum feature is inadequate under
such conditions, because it not only fights for database resources pretty much all the time,
but it falls further and further behind as well.  PostgreSQL's in-place manual vacuum functionality
is a bit better, but is still much, much slower than actually making a new copy of the database
files, which is what happens when a manual full vacuum is performed.</p>
           <p></p>
-          <p>Dead-tuple bloat also occurs in indexes in Postgresql, so tables that
have had a lot of activity may benefit from being reindexed at the time of maintenance.  
</p>
+          <p>Dead-tuple bloat also occurs in indexes in PostgreSQL, so tables that
have had a lot of activity may benefit from being reindexed at the time of maintenance.  
</p>
           <p>We therefore recommend periodic, scheduled maintenance operations instead,
consisting of the following:</p>
           <p></p>
           <ul>
@@ -414,7 +414,7 @@ cd dist/example
            <li>REINDEX DATABASE &#60;the_db_name&#62;;</li>
           </ul>
           <p> </p>
-          <p>During maintenance, Postgresql locks tables one at a time.  Nevertheless,
the crawler ui may become unresponsive for some operations, such as when counting outstanding
documents on the job status page.  ManifoldCF thus has the ability to check for the existence
of a file prior to such sensitive operations, and will display a useful "maintenance in progress"
message if that file is found.  This allows a user to set up a maintenance system that provides
adequate feedback for an ManifoldCF user of the overall status of the system.</p>
+          <p>During maintenance, PostgreSQL locks tables one at a time.  Nevertheless,
the crawler ui may become unresponsive for some operations, such as when counting outstanding
documents on the job status page.  ManifoldCF thus has the ability to check for the existence
of a file prior to such sensitive operations, and will display a useful "maintenance in progress"
message if that file is found.  This allows a user to set up a maintenance system that provides
adequate feedback for an ManifoldCF user of the overall status of the system.</p>
           <p></p>
         </section>
         <section>
@@ -457,7 +457,7 @@ cd dist/example
             <tr><td>org.apache.manifoldcf.derbydatabasepath</td><td>No</td><td>Absolute
or relative path to Derby database; default is '.'.</td></tr>
             <tr><td>org.apache.manifoldcf.hsqldbdatabasepath</td><td>No</td><td>Absolute
or relative path to HSQLDB database; default is '.'.</td></tr>
             <tr><td>org.apache.manifoldcf.lockmanagerclass</td><td>No</td><td>Specifies
the class to use to implement synchronization.  Default is a built-in file-based synchronization
class.</td></tr>
-            <tr><td>org.apache.manifoldcf.databaseimplementationclass</td><td>No</td><td>Specifies
the class to use to implement database access.  Default is a built-in Postgresql implementation.
 Supported choices are: org.apache.manifoldcf.core.database.DBInterfaceDerby, org.apache.manifoldcf.core.database.DBInterfacePostgreSQL,
org.apache.manifoldcf.core.database.DBInterfaceHSQLDB</td></tr>
+            <tr><td>org.apache.manifoldcf.databaseimplementationclass</td><td>No</td><td>Specifies
the class to use to implement database access.  Default is a built-in PostgreSQL implementation.
 Supported choices are: org.apache.manifoldcf.core.database.DBInterfaceDerby, org.apache.manifoldcf.core.database.DBInterfacePostgreSQL,
org.apache.manifoldcf.core.database.DBInterfaceHSQLDB</td></tr>
             <tr><td>org.apache.manifoldcf.synchdirectory</td><td>Yes,
if file-based synchronization class is used</td><td>Specifies the path of a synchronization
directory.  All ManifoldCF process owners <strong>must</strong> have read/write
privileges to this directory.</td></tr>
             <tr><td>org.apache.manifoldcf.database.maxhandles</td><td>No</td><td>Specifies
the maximum number of database connection handles that will by pooled.  Recommended value
is 200.</td></tr>
             <tr><td>org.apache.manifoldcf.database.handletimeout</td><td>No</td><td>Specifies
the maximum time a handle is to live before it is presumed dead.  Recommend a value of 604800,
which is the maximum allowable.</td></tr>

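For reference, the database rows of the properties table above could be combined into a properties.xml like the minimal sketch below. It assumes a <configuration> root element holding <property name="..." value="..."/> entries. That layout, and the synchronization directory path, are illustrative assumptions and do not come from this commit. The property names and the recommended values (200 handles, 604800 seconds) are taken from the table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical properties.xml sketch; the configuration/property layout is an assumption -->
<configuration>
  <!-- Select the PostgreSQL implementation (the table lists this as the default) -->
  <property name="org.apache.manifoldcf.databaseimplementationclass"
            value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <!-- Required with the default file-based lock manager; this path is a placeholder -->
  <property name="org.apache.manifoldcf.synchdirectory" value="/var/manifoldcf/synch"/>
  <!-- Recommended size of the database connection handle pool -->
  <property name="org.apache.manifoldcf.database.maxhandles" value="200"/>
  <!-- Recommended handle lifetime, which is also the maximum allowable -->
  <property name="org.apache.manifoldcf.database.handletimeout" value="604800"/>
</configuration>
```

Every process owner that reads this file also needs read/write access to the synchronization directory it names, as the table notes.
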
Modified: incubator/lcf/trunk/site/src/documentation/content/xdocs/included-connectors.xml
URL: http://svn.apache.org/viewvc/incubator/lcf/trunk/site/src/documentation/content/xdocs/included-connectors.xml?rev=1210183&r1=1210182&r2=1210183&view=diff
==============================================================================
--- incubator/lcf/trunk/site/src/documentation/content/xdocs/included-connectors.xml (original)
+++ incubator/lcf/trunk/site/src/documentation/content/xdocs/included-connectors.xml Sun Dec  4 19:16:09 2011
@@ -37,7 +37,7 @@
         <tr><td>CMIS</td><td>Pure Java</td><td>Various</td><td>CMIS
1.0</td><td>CMIS 1.0</td></tr>
         <tr><td>File System</td><td>Pure Java</td><td>Win/*NIX</td><td>N/A</td><td>N/A</td></tr>
         <tr><td>Windows Shares</td><td>Pure Java</td><td>
Win, Samba, NetApp, other NAS systems </td><td>N/A</td><td>N/A</td></tr>
-        <tr><td>JDBC</td><td> Pure Java </td><td> Various
</td><td> Supports JDBC V2, V3, V4; tested with Oracle 10, JTDS 1.2, Postgresql
9.1 drivers </td><td> Various </td></tr>
+        <tr><td>JDBC</td><td> Pure Java </td><td> Various
</td><td> Supports JDBC V2, V3, V4; tested with Oracle 10, JTDS 1.2, PostgreSQL
9.1 drivers </td><td> Various </td></tr>
         <tr><td>RSS</td><td> Pure Java </td><td> N/A
</td><td> N/A </td><td>Atom, RSS 2.0, others </td></tr>
         <tr><td>Web</td><td> Pure Java </td><td>N/A</td><td>
N/A </td><td>HTML Version 1.0, 1.1, 2.0, Atom, RSS 2.0, others </td></tr>
         <tr><td>Wiki</td><td> Pure Java </td><td>N/A</td><td>
N/A </td><td>Wiki version 1.8 and above </td></tr>


