lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Patrick Turcotte" <pat...@gmail.com>
Subject Re: 答复: 答复: Lucene in large database contexts
Date Thu, 05 Mar 2009 15:45:27 GMT
mkjjyy

On 8/10/07, Askar Zaidi <askar.zaidi@gmail.com> wrote:
> Hey Guys,
>
> I am trying to do something similar. Make the content search-able as  
> soon as
> it is added to the website. The way it can work in my scenario is  
> that , I
> create the Index for a every new user account created.
>
> Then, whenever a new document is uploaded, its contents are added to  
> the
> users Index using writer.addDocument(...)
>
> As  for closing the writer, yes ! I'll close the writer and optimize  
> after
> its added to the index.
>
> I really think this should work. Don't you ?
>
> thanks
> AZ
>
> On 8/10/07, Erick Erickson <erickerickson@gmail.com> wrote:
>>
>> Well, closing/opening an index is MUCH less expensive than
>> rebuilding the whole thing, so I don't understand part of your
>> statements....
>>
>> It *may* (but I haven't tried it) be possible to flush the writer  
>> rather
>> than
>> close/open it. But, you MUST close/reopen the reader you search with
>> even if flush works like I think it does.
>>
>> But it's also possible to use a two tiered approach. 1G isn't all  
>> that
>> big.
>> Could
>> you read it into a RAMDir and use that for your searches? Then,  
>> when you
>> add
>> data, you add it to *both* indexes, but close/open the RAMdir for
>> searching.
>>
>> It's also possible to keep the RAMdir as the delta between the  
>> FSdir and
>> "current" states of your index. Add to both and search both. Although
>> deletes may be a problem here.
>>
>> You haven't specified how often you expect changes, though. 100/ 
>> second?
>> 1/minute? How real is "real time"? You could do something like warm  
>> up
>> a new reader in the background whenever you decided you needed to be
>> absolutely up to date and swap your "live" reader for the newly  
>> warmed up
>> one whenever you deemed it wise.
>>
>> Or you could just close/open your reader after each modification,  
>> fire off
>> a
>>
>> couple of warmup queries at it and let the users live with slow  
>> responses
>> if they happen to search before your warm-up queries completed.
>>
>> The point is that there are many options, but to suggest the best  
>> one, we
>> need some throughput numbers and a better definition of what "real  
>> time"
>> means. Is a one minute delay acceptable? 10 seconds? a millisecond?
>> the answer defines the scope of reasonable solutions.....
>>
>> Best
>> Erick
>>
>> On 8/10/07, Antonello Provenzano <antonello@deveel.com> wrote:
>>>
>>> Kai,
>>>
>>> The context I'm going to work with requires a continuous addition of
>>> documents to the indexes, since it's user-driven content, and this
>>> would require the content to be always up-to-date.
>>> This is the problem I'm facing, since I cannot rebuild a 1Gb (at
>>> least) index every time a user inserts a new entry into the  
>>> database.
>>>
>>> I know Digg, for instance, is using Lucene as search engine: since  
>>> the
>>> amount of data they're dealing with is much higher than mine, I  
>>> would
>>> like to understand the way they used to implement this kind of
>>> solution.
>>>
>>> Thank you again.
>>> Antonello
>>>
>>>
>>> On 8/10/07, Kai Hu <kai.hu@dusee.cn> wrote:
>>>> Antonello,
>>>>        You are right,I think lucene indexsearcher will search the  
>>>> old
>>> information if IndexWriter was not closed(I think lucene release the
>> Lock
>>> here),so I only add a few documents every time from buffer to  
>>> implement
>>> index "real time".
>>>>
>>>> kai
>>>>
>>>>
>>>> 发件人: antonelloprov@gmail.com [mailto:antonelloprov@gmail.co 
>>>> m] 代表
>>> Antonello Provenzano
>>>> 发送时间: 2007年8月10日 星期五 17:59
>>>> 收件人: java-user@lucene.apache.org
>>>> 主题: Re: 答复: Lucene in large database contexts
>>>>
>>>> Kai,
>>>>
>>>> Thanks. The problem I see it's that although I can add a Document
>>>> through IndexWriter or IndexModifier, this won't be searchable  
>>>> until
>>>> the index is closed and, possibly, optimized, since the score of  
>>>> the
>>>> document in the index context must be re-calculated on the basis of
>>>> the whole context.
>>>>
>>>> Is this assumption true? or am I completely wrong?
>>>>
>>>> Cheers.
>>>> Antonello
>>>>
>>>>
>>>> On 8/10/07, Kai Hu <kai.hu@dusee.cn> wrote:
>>>>> Hi, Antonello
>>>>>        You can use IndexWriter.addDocument(Document document) to
>> add
>>> single document,same to update,delete operation.
>>>>>
>>>>> kai
>>>>>
>>>>> -----邮件原件-----
>>>>> 发件人: Antonello Provenzano [mailto:antonelloprov@gmail.com]
>>>>> 发送时间: 2007年8月10日 星期五 17:09
>>>>> 收件人: java-user@lucene.apache.org
>>>>> 主题: Lucene in large database contexts
>>>>>
>>>>> Hi There!
>>>>>
>>>>> I've been working for a while on the implementation of a website
>>>>> oriented to contents that would contain millions of entries,  
>>>>> most of
>>>>> them indexable (such as descriptions, texts, names, etc.).
>>>>> The ideal solution to make them searchable would be to use  
>>>>> Lucene as
>>>>> index and search engine.
>>>>>
>>>>> The reason I'm posting the mailing list is the following: since  
>>>>> all
>>>>> the entries will be stored in a database (most likely MySQL InnoDB
>> or
>>>>> Oracle), what's the best technique to implement a system that
>> indexes
>>>>> in "real time" (eg. when an entry is inserted into the databsse)  
>>>>> the
>>>>> content and make it searchable? Based on my understanding of  
>>>>> Lucene,
>>>>> such this thing is not possible, since the index must be re- 
>>>>> created
>> to
>>>>> be able to search the indexed contents. Is this true?
>>>>>
>>>>> Eventually, could anyone point me to a working example about how  
>>>>> to
>>>>> implement such a similar context?
>>>>>
>>>>>
>>>>> Thank you for the support.
>>>>> Antonello
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>> --- 
>>>> ------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message