jackrabbit-users mailing list archives

From "Muguet Bradbury" <M.Bradb...@fortent.com>
Subject RE: Memory issues with jackrabbit/lucene
Date Tue, 29 Sep 2009 12:39:55 GMT
Sebastien,

Just as a reminder, we use jackrabbit 1.4.  I'm not explicitly using the text-extractors.
Our repository.xml looks like this:

<Repository>
	<FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
		<param name="dataSourceLocation" value="kycAppDataSource" />
		<param name="schema" value="mssql" />
		<param name="schemaObjectPrefix" value="J_R_FS_" />
		<param name="bundleCacheSize" value="8" />
		<param name="consistencyCheck" value="false" />
		<param name="minBlobSize" value="16384" />
	</FileSystem>

	<Security appName="Jackrabbit">
		<AccessManager
			class="org.apache.jackrabbit.core.security.SimpleAccessManager">
		</AccessManager>

		<LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
			<param name="anonymousId" value="anonymous" />
		</LoginModule>
	</Security>

	<Workspaces rootPath="${rep.home}/workspaces"
		defaultWorkspace="default" />

	<Workspace name="${wsp.name}">
		<FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
			<param name="dataSourceLocation" value="kycAppDataSource" />
			<param name="schema" value="mssql" />
			<param name="schemaObjectPrefix" value="J_FS_${wsp.name}_" />
			<param name="bundleCacheSize" value="8" />
			<param name="consistencyCheck" value="false" />
			<param name="minBlobSize" value="16384" />
		</FileSystem>
		<PersistenceManager
			class="org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager">
			<param name="dataSourceLocation" value="kycAppDataSource" />
			<param name="schema" value="mssql" />
			<param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
			<param name="bundleCacheSize" value="8" />
			<param name="consistencyCheck" value="false" />
			<param name="minBlobSize" value="16384" />
		</PersistenceManager>
		<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
			<param name="path" value="${wsp.home}/index" />
		</SearchIndex>
	</Workspace>

	<Versioning rootPath="${rep.home}/version">
		<FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
			<param name="dataSourceLocation" value="kycAppDataSource" />
			<param name="schema" value="mssql" />
			<param name="schemaObjectPrefix" value="J_V_FS_" />
			<param name="bundleCacheSize" value="8" />
			<param name="consistencyCheck" value="false" />
			<param name="minBlobSize" value="16384" />
		</FileSystem>
		<PersistenceManager
			class="org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager">
			<param name="dataSourceLocation" value="kycAppDataSource" />
			<param name="schema" value="mssql" />
			<param name="schemaObjectPrefix" value="J_V_PM_" />
			<param name="bundleCacheSize" value="8" />
			<param name="consistencyCheck" value="false" />
			<param name="minBlobSize" value="16384" />
		</PersistenceManager>
	</Versioning>

	<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
		<param name="path" value="${rep.home}/repository/index" />
	</SearchIndex>
</Repository>

Our customers are alleviating the memory problem by restarting the servers daily.

The documents we store are numerous (thousands daily) and vary in size.  They are news articles
(xml/html) and reports (rtf) and are all stored as binary content (base64 encoded).  We also
store some attributes about these articles that are in string format.  We delete thousands
of news articles per day when reports are finalized.  We do not need to be able to search
the content of these articles, but I assume they are being indexed because we have specified
SearchIndex elements in our repository.xml.


Am I correct here?
Muguet


-----Original Message-----
From: Sébastien Launay [mailto:sebastien.launay@anyware-tech.com] 
Sent: Tuesday, September 29, 2009 8:20 AM
To: users@jackrabbit.apache.org
Subject: Re: Memory issues with jackrabbit/lucene

On 29/09/2009 13:51, Muguet Bradbury wrote:
> Sebastien,
>
> Thanks for the reply.  Yes, we do store large documents (rtf and large xml documents).
> When we store each document, we create a session, add the document, save the session, and
> close the session.  The LuceneTermBuffers remain.  However, if the indexing occurs asynchronously,
> this may be what's filling up the memory.  Eventually, the application gets an out of memory
> exception.

This is clearly caused by the asynchronous indexing of binary properties.
You can also deactivate indexing for this kind of document [1].
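
As a rough sketch of what deactivating extraction could look like: the page
cited as [1] configures extractors through a parameter on the SearchIndex
element. The fragment below assumes that this parameter is named
textFilterClasses in your version, and that listing only the plain-text
extractor class leaves RTF, XML and HTML binaries unextracted. Both the
parameter name and the class name are assumptions to check against the 1.4
documentation before use.

```xml
<!-- Hypothetical sketch, not a verified 1.4 configuration. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
	<param name="path" value="${wsp.home}/index" />
	<!-- Assumed parameter: restrict full-text extraction to plain text only -->
	<param name="textFilterClasses"
		value="org.apache.jackrabbit.extractor.PlainTextExtractor" />
</SearchIndex>
```

The string attributes stored next to each article would presumably still be
indexed, so searching on them should keep working.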

Can you provide more information on these documents (size, number, ...)?

> I will look into removing the SearchIndex elements from the repository.xml and workspace.xml.
> Do we also need to remove the index directories from the wsp.home path?  Will removing the
> SearchIndex elements make retrieval of the documents (with the node keys) slower?
>
Removing the index directories is not mandatory, as they will no longer be
used. However, they consume disk space, so you can remove them.
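
For illustration, here is a sketch of the Workspace template from the
repository.xml quoted in this thread with the SearchIndex element left out.
All other values are copied unchanged from that configuration. Treat it as an
outline rather than a tested setup.

```xml
<!-- Sketch: Workspace template without a SearchIndex element,
     so no Lucene index is maintained for the workspace. -->
<Workspace name="${wsp.name}">
	<FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
		<param name="dataSourceLocation" value="kycAppDataSource" />
		<param name="schema" value="mssql" />
		<param name="schemaObjectPrefix" value="J_FS_${wsp.name}_" />
	</FileSystem>
	<PersistenceManager
		class="org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager">
		<param name="dataSourceLocation" value="kycAppDataSource" />
		<param name="schema" value="mssql" />
		<param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
	</PersistenceManager>
	<!-- SearchIndex intentionally omitted -->
</Workspace>
```

The Workspace element in repository.xml is only a template for new
workspaces. Workspaces that already exist keep their own workspace.xml, so
the SearchIndex element has to be removed there as well, as you planned.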

Lucene indexes are only used for search features (XPath, SQL, AQM).
Node#getNodes(), Node#getProperties(), Session#getNodeByUUID(), ...
use an abstraction called PersistenceManager [2].
Default implementations of PersistenceManager do not use an index.

[1] http://jackrabbit.apache.org/jackrabbit-text-extractors.html
[2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

--
Sébastien Launay

