incubator-gora-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject HSQLDB woes...
Date Fri, 05 Nov 2010 15:31:01 GMT
Hi,

The HSQL-based SqlStore performs very poorly when used with Nutch.
I believe this is related to the way LOBs are handled in HSQL: even for
a tiny crawl of 50 pages, the .lob file grows to on the order of 100MB.
Once it reaches that size, the performance of any update drops so
dramatically that the store becomes nearly unusable.

Of course, HSQL was never meant to be used as a serious backend...
still, perhaps there are alternatives that could give us better
behavior for small / embedded use. For small operations on the order
of a few thousand records, I think we should be able to come up with
something better...

I tried to integrate the H2 database (www.h2database.com), but gave up
after I discovered that it doesn't support Blob.setBinaryStream(..).
There are workarounds for this in H2, but they would complicate the
code too much...

Any suggestions / comments? Maybe it's time for a BerkeleyDB DataStore?

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

