manifoldcf-user mailing list archives

From "Ian Zapczynski" <Ian.Zapczyn...@veritablelp.com>
Subject Re: org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
Date Tue, 22 Mar 2016 19:38:32 GMT
Just to follow up: I clearly was doing something wrong when connecting to my HSQLDB instance, because I couldn't actually see any user tables at all, so I could not drop anything from repohistory. That is, unless the database was indeed corrupted by the bloat, as you suggested.

Once I found that moving to PostgreSQL would be much easier than I thought, I moved forward with that, and now all is well.

>>> Karl Wright <daddywri@gmail.com> 3/18/2016 9:56 AM >>>
Hi Ian,

If you can connect to your HSQLDB instance, you can simply drop all rows from the table
"repohistory". That should make a difference. Of course, it is possible that the database instance
is now corrupt and nothing can be done to fix it.
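
For reference, the cleanup Karl describes amounts to two plain SQL statements against the MCF database (a sketch, assuming you can reach the instance with an SQL client; CHECKPOINT DEFRAG is HSQLDB's statement for reclaiming file space afterwards):

```sql
-- Drop all accumulated simple history rows; existing report data is lost
DELETE FROM repohistory;
-- Rewrite the .data file so the freed space is actually reclaimed
CHECKPOINT DEFRAG;
```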

Only once you get back to a point where queries work against your HSQLDB instance will the
configuration changes that control simple history table bloat take effect.

If you need to recreate everything, I do suggest you do it on PostgreSQL, since it's easier
to manage than HSQLDB and is meant for far larger database instances.

Thanks,
Karl


On Fri, Mar 18, 2016 at 9:44 AM, Ian Zapczynski <Ian.Zapczynski@veritablelp.com> wrote:


Karl,
Wow... 100 MB vs. my 32+ GB is certainly perplexing!
I dropped HistoryCleanupInterval in properties.xml to 302400000 ms and have restarted and
waited, but I don't see a difference in the .data file size. I tried to connect to HyperSQL directly
and run a CHECKPOINT DEFRAG and SHUTDOWN COMPACT, but I must not be doing these correctly,
as the commands came back immediately with no effect whatsoever.
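
For reference, both of those are ordinary SQL statements issued over a JDBC connection, e.g. from HSQLDB's bundled SqlTool or DatabaseManager. One caveat worth noting: a file: database opened in-process by ManifoldCF cannot normally be reached by a second JVM at the same time, which may explain commands that appear to return instantly with no effect:

```sql
-- Run these from a client connected to the MCF database, with MCF itself
-- shut down if the database is embedded (in-process) rather than a server:
CHECKPOINT DEFRAG;   -- rewrites the .data file, dropping dead space
SHUTDOWN COMPACT;    -- closes the database, rewriting all files at minimal size
```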
Unless you think otherwise, I feel like I'm now only faced with a few options:
1) Delete the database and re-run the job to reindex all files. The problem will likely return
eventually.
2) Upgrade ManifoldCF to a recent release and see if the database magically shrinks. Is there
any realistic hope in doing this?
3) Begin using PostgreSQL instead. This won't tell me what I'm apparently doing wrong, but
it will give me more flexibility with database maintenance.

What do you think?
-Ian

>>> Karl Wright <daddywri@gmail.com> 3/16/2016 2:10 PM >>>
Hi Ian, 

This all looks very straightforward. Typical sizes of an HSQLDB database under this scenario
would probably run well under 100M. What might be happening, though, is that you might be
accumulating a huge history table. This would bloat your database until it falls over (which
for HSQLDB is at 32GB).

History records are used only for generation of reports. Normally, MCF out of the box is configured
to drop history rows older than a month. But if you are doing lots of crawling and want to
stick with HSQLDB, you might want to drop them faster than that. There's a properties.xml parameter
you can set to control the time interval these records are kept; see the how-to-build-and-deploy
page.
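
A sketch of what that properties.xml setting would look like (the property name here is an assumption; verify it against your version's how-to-build-and-deploy page):

```xml
<!-- Keep simple history rows for 7 days instead of roughly a month
     (the default Karl mentions); the value is in milliseconds. -->
<property name="org.apache.manifoldcf.crawler.historycleanupinterval"
          value="604800000"/>
```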

Thanks,
Karl


On Wed, Mar 16, 2016 at 1:05 PM, Ian Zapczynski <Ian.Zapczynski@veritablelp.com> wrote:


Thanks, Karl.
I am using a single Windows shares repository connection to a folder on our file server, which
currently contains a total of 143,997 files and 54,424 folders (59.2 GB of total data), of
which ManifoldCF seems to identify just over 108,000 as indexable. The job specifies the following:

1. Include indexable file(s) matching * 
2. Include directory(s) matching * 

No custom connectors. I kept this simple because I'm a simple guy. :-) As such, it's entirely
possible that I did something stupid when I set it up, but I'm not seeing anything else obvious
that seems worth pointing out. 
-Ian

>>> Karl Wright <daddywri@gmail.com> 3/16/2016 12:03 PM >>>
Hi Ian, 

The database size seems way too big for this crawl size. I've not seen this problem before
but I suspect that whatever is causing the bloat is also causing HSQLDB to fail.

Can you give me further details about what repository connections you are using? It is possible
that there's a heretofore unknown pathological case you are running into during the crawl.
Are there any custom connectors involved?

If we rule out a bug of some kind, then the next thing to do would be to go to a real database,
e.g. PostgreSQL.

Karl


On Wed, Mar 16, 2016 at 11:04 AM, Ian Zapczynski <Ian.Zapczynski@veritablelp.com> wrote:


Hello,
We've had ManifoldCF 2.0.1 working well with SOLR for months on Windows 2012 using the single-process
model. We recently noticed that new documents are not getting ingested, even after restarting
the job, the server, etc. What I see in the logs is first a bunch of 500 errors coming out
of SOLR, the result of ManifoldCF trying to index .tif files found in the directory structure
being indexed. After that (not sure if related or not), I see a bunch of these errors:
FATAL 2016-03-15 16:01:48,801 (Thread-1387745) - C:\apache-manifoldcf-2.0.1\example\.\./dbname.data
getFromFile failed 33337202
org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
at org.hsqldb.error.Error.error(Unknown Source)
at org.hsqldb.persist.DataFileCache.getFromFile(Unknown Source)
at org.hsqldb.persist.DataFileCache.get(Unknown Source)
at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
at org.hsqldb.index.NodeAVLDisk.findNode(Unknown Source)
at org.hsqldb.index.NodeAVLDisk.getRight(Unknown Source)
at org.hsqldb.index.IndexAVL.next(Unknown Source)
at org.hsqldb.index.IndexAVL.next(Unknown Source)
at org.hsqldb.index.IndexAVL$IndexRowIterator.getNextRow(Unknown Source)
at org.hsqldb.RangeVariable$RangeIteratorMain.findNext(Unknown Source)
at org.hsqldb.RangeVariable$RangeIteratorMain.next(Unknown Source)
at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
at org.hsqldb.QuerySpecification.getResult(Unknown Source)
at org.hsqldb.StatementQuery.getResult(Unknown Source)
at org.hsqldb.StatementDMQL.execute(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
at org.hsqldb.Session.execute(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: java.lang.NegativeArraySizeException
at org.hsqldb.lib.StringConverter.readUTF(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readString(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readChar(Unknown Source)
at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinaryDecode.readData(Unknown Source)
at org.hsqldb.RowAVLDisk.<init>(Unknown Source)
at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
... 21 more
ERROR 2016-03-15 16:01:48,911 (Stuffer thread) - Stuffer thread aborting and restarting due
to database connection reset: Database exception: SQLException doing query (S1000): java.lang.NegativeArraySizeException
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException
doing query (S1000): java.lang.NegativeArraySizeException
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:771)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1444)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performQuery(DBInterfaceHSQLDB.java:916)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataChunk(IncrementalIngester.java:1783)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1748)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1703)
at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:254)
Caused by: java.sql.SQLException: java.lang.NegativeArraySizeException
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
After these errors occur, the job just seems to hang; it does not process any further documents
or log anything more in manifoldcf.log. So I see the error is coming out of the HyperSQL
database, but I don't know why. There is sufficient disk space. The database file is now 33
GB (larger than I'd expect for our ~110,000 documents), but I haven't seen any evidence that
we're hitting a limit on file size. I'm afraid I'm not sure where to go from here to nail
down the problem further.
As always, any and all help is much appreciated.
Thanks,


-Ian






