lucene-java-user mailing list archives

From Umesh Prasad <umesh.i...@gmail.com>
Subject Re: Using Lucene to search live, being-edited documents
Date Sat, 22 Jan 2011 07:24:40 GMT
Nope, that won't always be the case. Users will not be editing the
document all the time. They will edit the document and then save it, at
which point it is persisted in the DB. You can use DB triggers to push it
into an indexing queue, from which the indexer can regularly pick up the
document for indexing. You can schedule your indexer so that it picks up
the indexing jobs every minute or so.
    Unless you have a mission-critical system, this approach should be
more than sufficient.
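
A minimal sketch of the queue-plus-scheduled-indexer idea above, using only the JDK. Everything here is an assumption for illustration: the class name IndexingQueueSketch, the enqueue/runOnce/schedule methods, and the Consumer that stands in for the real Lucene indexing step are all hypothetical, and the DB trigger itself is not shown (something would have to relay saved document IDs into enqueue).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names come from Lucene or from this thread.
class IndexingQueueSketch {
    // IDs of documents that were saved since the last indexing pass.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Stand-in for the real indexing step (re-read the document, update the index).
    private final Consumer<String> indexer;

    IndexingQueueSketch(Consumer<String> indexer) {
        this.indexer = indexer;
    }

    // Called by whatever notices a save (assumed: a relay fed by the DB trigger).
    void enqueue(String docId) {
        pending.add(docId);
    }

    // One indexing pass: drain everything queued so far and index each
    // document once, even if it was saved several times in the interval.
    int runOnce() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        LinkedHashSet<String> distinct = new LinkedHashSet<>(batch);
        for (String docId : distinct) {
            indexer.accept(docId);
        }
        return distinct.size();
    }

    // Run a pass every periodSeconds; the caller shuts the executor down.
    ScheduledExecutorService schedule(long periodSeconds) {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(
                () -> { runOnce(); }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Calling schedule(60) would give roughly the "every minute or so" behaviour. Collapsing repeated saves of the same document into one indexing job is a design choice of this sketch, not something the approach requires.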



On Sat, Jan 22, 2011 at 10:58 AM, software visualization
<softwarevisualization@gmail.com> wrote:
> If I understand you correctly, I think that this:
>
> If T2 < T1, Skip the result.
>
> will always be the case. The live, being-edited document is always "later"
> in time than the indexed information about it.
>
>
>
> On Fri, Jan 21, 2011 at 9:11 PM, Umesh Prasad <umesh.iitk@gmail.com> wrote:
>
>> Hi,
>>   One workaround would be to version the documents and store the
>> version as well as the timestamp of the indexed document in the index.
>>
>> Reading between the lines, I assume that the document is
>> a) stored in some DB/file
>> b) indexed in a Lucene index
>>
>> The user searches on b) and gets back document IDs,
>> but documents are displayed to the user after retrieving them from a).
>>
>> Now, I do not know of a way to keep a) and b) completely in
>> sync in real time, as the indexing operation itself (a --> b) takes
>> some time.
>>
>> Instead we can do the following.
>> a) Stored: Document ID + Document Text + Document Version +
>> Modification Time Stamp (T1)
>> b) Indexed: Document ID + Document Text + Document Version +
>> Modification Time Stamp (T2) (as of when it was indexed) (broken into
>> date + hour + mins + sec to minimize the number of terms)
>>
>> The user searches b).
>> The search system gets Document ID + Modification Time Stamp (T2) and
>> passes them to the presentation layer, which compares T1 and T2.
>> If T2 < T1, skip the result.
>>
>> Assumption: the stored document is always in sync. Documents are
>> persisted somewhere and not served from memory.
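
The T1/T2 comparison described above could look roughly like this in the presentation layer. This is only a sketch under assumptions: the names StaleHitFilterSketch, Hit and dropStale are hypothetical, the stored timestamps (T1) are passed in as a plain map rather than read from a real DB, and dropping hits whose document no longer exists in the store is an extra choice made here, not something stated in the thread.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not part of Lucene.
class StaleHitFilterSketch {

    // A search hit as returned from the index: document ID plus T2,
    // the modification time stamp recorded when the document was indexed.
    static final class Hit {
        final String docId;
        final Instant indexedModTime; // T2

        Hit(String docId, Instant indexedModTime) {
            this.docId = docId;
            this.indexedModTime = indexedModTime;
        }
    }

    // Keep only hits whose indexed version is at least as new as the
    // stored one. storedModTimes maps document ID to T1.
    static List<Hit> dropStale(List<Hit> hits, Map<String, Instant> storedModTimes) {
        List<Hit> fresh = new ArrayList<>();
        for (Hit hit : hits) {
            Instant t1 = storedModTimes.get(hit.docId);
            if (t1 == null) {
                continue; // assumption: document no longer stored, so skip it
            }
            if (hit.indexedModTime.isBefore(t1)) {
                continue; // T2 < T1: the index entry is stale, skip the result
            }
            fresh.add(hit);
        }
        return fresh;
    }
}
```

A hit whose T2 equals T1 is kept, since the index then reflects the last saved version.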
>>
>> Thanks & Regards
>> Umesh Prasad
>>
>>
>>
>> On Sat, Jan 22, 2011 at 1:29 AM, software visualization
>> <softwarevisualization@gmail.com> wrote:
>> > Hi sorry for the long delay.
>> >
>> > The idea is that a single user is editing a single document. As they edit,
>> > any indexes built against the document become stale, actually wrong.
>> > Example: references to specific locations within this document are all
>> > instantly wrong the first time a user types a new character at the
>> > beginning; they're all off by one. Deleting words is of course
>> > disastrous, etc. So our story is: we used to have this document nicely
>> > indexed and now we have nothing useful.
>> >
>> > Considering what Lucene does prior to indexing, stemming for instance, I
>> > am not sure I can (in fact, I am quite sure I can't) recreate the same
>> > powerful indexing functionality.
>> >
>> > But it seems wrong to lure our users into opening this document with
>> > promises that this, that and the other thing has been located for them,
>> > only to invalidate all that just because they began to edit the document.
>> > I understand why that happens, but my users are perhaps not as tech savvy,
>> > and I think it will just feel "wrong" to them.
>> >
>> > So I am looking for a way around this.
>> >
>> >
>> >
>> > On Tue, Jan 4, 2011 at 1:25 PM, adasal <adam.saltiel@gmail.com> wrote:
>> >
>> >> I would think this is more like it.
>> >> But the essential thing, so it seems to me, is whether there is a
>> >> requirement for a serialised index, i.e. a more permanent record, aside
>> >> from the saved document.
>> >> Then, if there is a penalty to creating the index compared to regex,
>> >> StringSearch or so, it is justified on other grounds.
>> >> I think it is an interesting question: when does that requirement emerge?
>> >> There is size of document.
>> >> But there would also be field types. I think I have this right. This is
>> >> really a classification system, so more than bare regex.
>> >> There must be other criteria that apply to this use case, too?
>> >>
>> >> Adam
>> >>
>> >> p.s. we (in my work project) are just beginning to use Lucene for
>> >> geometry objects and I am looking forward to understanding its use
>> >> better, including, possibly, expanding it to other use cases apart from
>> >> geo objects.
>> >>
>> >> On 3 January 2011 15:31, Robert Muir <rcmuir@gmail.com> wrote:
>> >>
>> >> > On Mon, Jan 3, 2011 at 10:16 AM, Grant Ingersoll <gsingers@apache.org>
>> >> > wrote:
>> >> > > There is also the MemoryIndex, which is in contrib and is designed
>> >> > > for one document at a time.  That being said, basic grep/regex is
>> >> > > probably fast enough.
>> >> > >
>> >> >
>> >> > In cases where you are doing a 'find' in a document similar to what a
>> >> > word processor would do (especially if you want to iterate
>> >> > forwards/backwards through matches etc), you might want to consider
>> >> > something like
>> >> > http://icu-project.org/apiref/icu4j/com/ibm/icu/text/StringSearch.html
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >
>> >> >
>> >>
>> >
>>
>>
>>
>>
>>
>



-- 
---
Thanks & Regards
Umesh Prasad


