lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Using lucene as a database... good idea or bad idea?
Date Tue, 29 Jul 2008 13:56:13 GMT
Agreed, no one is saying should.  Additionally, Lucene can be faster  
for a number of things like storage when databases are overkill (i.e.  
you don't need transactions, complex joins, etc.)  After all, even the  
lookup of a file, can be viewed as a "search", even if it is just for  
a single unique key and doesn't require any fuzziness.


On Jul 29, 2008, at 9:21 AM, Ian Lea wrote:

> I don't think that anyone in this thread has said "should", just
> "could" - it is a valid option (IMHO).  Personally, I use it as a
> store for lucene related data because I know and like and trust it, it
> is already there for this project so no need to introduce another
> software dependency, and because it is blindingly fast.
>
>
> --
> Ian.
>
>
> On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್  
> ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
> <nageshblore@gmail.com> wrote:
>> The way I see it, search solutions (on whatever scale) have three  
>> components
>> - data aggregation, indexing/searching and presentation of results. I
>> thought, Lucene did the second part only.
>>
>> So, I do not quite follow, why should Lucene be used for datastore ?
>>
>> Nagesh
>>
>> On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll  
>> <gsingers@apache.org>wrote:
>>
>>> I think the answer is it can be done and probably quite well.  I  
>>> also think
>>> it's informative that Nutch does not use Lucene for this function,  
>>> as I
>>> understand it, but that shouldn't stop you either.  You might also  
>>> have a
>>> look at Apache Jackrabbit, which uses Lucene underneath as a content
>>> repository.
>>>
>>> -Grant
>>>
>>>
>>> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
>>>
>>> Hello all,
>>>>
>>>> I am also interested in this. I want to archive the content of the
>>>> document using Lucene.
>>>>
>>>> Is it a good idea to use Lucene as storage engine?
>>>>
>>>> Regards
>>>> Ganesh
>>>>
>>>> ----- Original Message ----- From: "Ian Lea" <ian.lea@gmail.com>
>>>> To: <java-user@lucene.apache.org>
>>>> Sent: Tuesday, July 29, 2008 2:18 PM
>>>> Subject: Re: Using lucene as a database... good idea or bad idea?
>>>>
>>>>
>>>> John
>>>>>
>>>>>
>>>>> I think it's a great idea, and do exactly this to store 5 million+
>>>>> documents with info that it takes way too long to get out of our
>>>>> Oracle database (think days).  Not as many docs as you are talking
>>>>> about, and less data for each doc, but I wouldn't have any  
>>>>> concerns
>>>>> about scaling.  There are certainly lucene indexes out there  
>>>>> bigger
>>>>> than what you propose.  You can compress the stored data to save  
>>>>> some
>>>>> space.  Run times for optimization might get interesting but see
>>>>> recent threads for suggestions on that.  And since you are not too
>>>>> concerned about performance you may not need to optimize much,  
>>>>> or even
>>>>> at all.
>>>>>
>>>>> Of course you need to remember that this is not a DBMS solution  
>>>>> in the
>>>>> sense of transactions, recovery, etc. but I'm sure you are already
>>>>> aware of that.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <john@jpevans.com>
 
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have successfully used Lucene in the "tradtiional" way to  
>>>>>> provide
>>>>>> full-text search for various websites.  Now I am tasked with  
>>>>>> developing
>>>>>> a
>>>>>> data-store to back a web crawler.  The crawler can be  
>>>>>> configured to
>>>>>> retrieve
>>>>>> arbitrary fields from arbitrary pages, so the result is that each
>>>>>> document
>>>>>> may have a random assortment of fields.  It seems like Lucene  
>>>>>> may be a
>>>>>> natural fit for this scenario since you can obviously add  
>>>>>> arbitrary
>>>>>> fields
>>>>>> to each document and you can store the actually data in the  
>>>>>> database.
>>>>>> I've
>>>>>> done some research to make sure that it would meet all of our  
>>>>>> individual
>>>>>> requirements (that we can iterate over documents, update
>>>>>> (delete/replace)
>>>>>> documents, etc.) and everything looks good.  I've also seen a  
>>>>>> couple of
>>>>>> references around the net to other people trying similar  
>>>>>> things...
>>>>>> however,
>>>>>> I know it's not meant to be used this way, so I thought I would 

>>>>>> post
>>>>>> here
>>>>>> and ask for guidance?  Has anyone done something similar?  Is  
>>>>>> there any
>>>>>> specific reason to think this is a bad idea?
>>>>>>
>>>>>> The one thing that I am least certain about his how well it  
>>>>>> will scale.
>>>>>> We
>>>>>> may reach the point where we have tens of millions of documents 

>>>>>> and a
>>>>>> high
>>>>>> percentage of those documents may be relatively large (10k-50k  
>>>>>> each).
>>>>>> We
>>>>>> actually would NOT be expecting/needing Lucene's normal extreme 

>>>>>> fast
>>>>>> text
>>>>>> search times for this, but we would need reasonable times for  
>>>>>> adding new
>>>>>> documents to the index, retrieving documents by ID (for  
>>>>>> iterating over
>>>>>> all
>>>>>> documents), optimizing the index after a series of changes, etc.
>>>>>>
>>>>>> Any advice/input/theories anyone can contribute would be greatly
>>>>>> appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>> -
>>>>>> John
>>>>>>
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>> Send instant messages to your online friends
>>>> http://in.messenger.yahoo.com
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message