lucene-solr-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Can we use Berkley DB java in Solr
Date Thu, 04 Dec 2008 16:37:48 GMT
A database, just to store uncommitted documents in case they might be
updated, seems like it will have a pretty major impact on indexing
performance.  A Lucene-only implementation would seem to be much
lighter on resources.

-Yonik

On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
<noble.paul@gmail.com> wrote:
> The solution will be an UpdateRequestProcessor (which itself is
> pluggable). I am implementing a JDBC-based one. I'll test with H2 and
> MySQL (and maybe Derby).
>
> We will ship the H2 (embedded) jar
>
> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <ryantxu@gmail.com> wrote:
>> Again, I would hope that Solr builds a storage-agnostic solution.
>>
>> As long as we have a simple interface to load/store documents, it should be
>> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>>
>> ryan
>>
>>
>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>>> Cassandra does not meet our requirements.
>>> We do not need that kind of scalability.
>>>
>>> Moreover, its future is uncertain and they are trying to incubate it into
>>> Solr.
>>>
>>>
>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ssiren@gmail.com> wrote:
>>>>
>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>
>>>> It at least claims to be scalable; I have no personal experience with it.
>>>>
>>>> --
>>>> Sami Siren
>>>>
>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>
>>>>> Another persistence solution is ehcache with diskstore. It even has
>>>>> replication.
>>>>>
>>>>> I have never used ehcache, so I cannot comment on it.
>>>>>
>>>>> Any comments?
>>>>>
>>>>> --Noble
>>>>>
>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>> <noble.paul@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gsingers@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>>> data types on all the supported DBs.
>>>>>>>>
>>>>>>>> But which one would we like to ship with Solr as a default option?
>>>>>>>>
>>>>>>>
>>>>>>> Why do we need a default option?  Is this something that is intended to be
>>>>>>> on by default?  Or, do you mean just to have one for unit tests to work?
>>>>>>>
>>>>>>
>>>>>> Default does not mean that it is enabled by default. But if it is
>>>>>> enabled, I can have defaults for stuff like driver, URL, DDL, etc. And
>>>>>> the user may not need to provide an extra jar.
>>>>>>
>>>>>>>
>>>>>>> I don't know if it is still the case, but I often find embedded dbs to be
>>>>>>> quite annoying since you often can't connect to them from other clients
>>>>>>> outside of the JVM which makes debugging harder.  Of course, maybe I just
>>>>>>> don't know the tricks to do it.  Derby is one DB that you can still connect
>>>>>>> to even when it is embedded.
>>>>>>>
>>>>>>
>>>>>> Embedded is the best bet for us because of performance reasons and
>>>>>> zero management.
>>>>>> The users can still read the data through Solr itself.
>>>>>>
>>>>>>>
>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and I
>>>>>>> wonder about an embedded DB doing that.  I also have a hard time believing
>>>>>>> that both a DB w/ millions of docs and Solr can live on the same machine,
>>>>>>> which is presumably what an embedded DB must do.  Presumably, it also needs
>>>>>>> to be able to be replicated, right?
>>>>>>>
>>>>>>
>>>>>> Millions of docs? Then you must configure a remote DB for storage
>>>>>> reasons and manage the replication separately.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> H2 looks impressive. The jar is small (just 667KB) and the memory
>>>>>>>> footprint is small too.
>>>>>>>> --Noble
>>>>>>>>
>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ryantxu@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Check http://www.h2database.com/ which is, in my view, the best
>>>>>>>>> embedded DB out there.
>>>>>>>>>
>>>>>>>>> It is from the maker of HSQLDB... his second round.
>>>>>>>>>
>>>>>>>>> However, for anything Solr, I would hope it would just rely on JDBC.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HSQLDB has a limit of up to 8GB of data. In Solr, you might want to go
>>>>>>>>>> beyond that without a commit.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>> <dawid.weiss@cs.put.poznan.pl> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Isn't HSQLDB an option? Its performance varies a lot depending on the
>>>>>>>>>>> volume of data and queries, but otherwise the license looks BSDish.
>>>>>>>>>>>
>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>
>>>>>>>>>>> Dawid
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Noble Paul
>>>>>>>>
>>>>>>>
>>>>>>> --------------------------
>>>>>>> Grant Ingersoll
>>>>>>>
>>>>>>> Lucene Helpful Hints:
>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>
>>
>
>
>
> --
> --Noble Paul
>