incubator-gora-dev mailing list archives

From Doğacan Güney <doga...@gmail.com>
Subject Re: HSQLDB woes...
Date Mon, 08 Nov 2010 20:29:15 GMT
On Mon, Nov 8, 2010 at 10:26, Enis Söztutar <enis.soz@gmail.com> wrote:
> From my experience the SQL backend is a major headache. Writing a SQL backend
> is actually much much harder than
> the HBase or Cassandra backend since we need very custom code for each SQL
> server. Plus, there is some code for
> dealing with HSQL embedded mode.
>
> I completely agree that we should switch to another zero-conf backend for
> tests and for Nutch. However, I am not sure about BerkeleyDB.
> If we can implement a data store easily that would be great.
>

I looked a bit at BDB a while back. I think it should be easy to
implement... I'll look into it this week and report back.

Does anyone know if there is a licensing problem with it?

> Enis
>
> On Fri, Nov 5, 2010 at 8:31 AM, Andrzej Bialecki <ab@getopt.org> wrote:
>
>> Hi,
>>
>> The HSQL-based SqlStore exhibits awful performance when used with Nutch.
>> I believe this is related to the way LOBs are handled in HSQL - even for
>> a tiny crawl of 50 pages the size of the .lob file is in the order of
>> 100MB. Actually, after reaching this point the performance of any
>> updates drops dramatically so it becomes nearly unusable.
>>
>> Of course, HSQL was never meant to be used as a serious backend...
>> still, perhaps there are alternatives that could give us a better
>> behavior for small / embedded use - and for small operations in the
>> order of a few thousand records I think we should be able to come up
>> with something better...
>>
>> I tried to integrate the H2 database (www.h2database.com), but gave up
>> after I discovered that it doesn't support Blob.setBinaryStream(..) -
>> there are workarounds for this in H2, but it would complicate the code
>> too much...
>>
>> Any suggestions / comments? Maybe it's time for a BerkeleyDB DataStore?
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>



-- 
Doğacan Güney
