lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From software visualization <softwarevisualizat...@gmail.com>
Subject Re: Using Lucene to search live, being-edited documents
Date Sat, 22 Jan 2011 05:28:38 GMT
If I understand you correctly, I think that this :

If T2 < T1, Skip the result.

will always  be the case. The live being edited document is always "later"
in time than the indexed information about it.



On Fri, Jan 21, 2011 at 9:11 PM, Umesh Prasad <umesh.iitk@gmail.com> wrote:

> Hi,
>   One work around would be to version the documents and store the
> version as well as the timestamp of indexed document into the index.
>
> Reading between lines I assume that
> Document is
> a) stored in some DB/File :
> b) indexed in lucene index
>
> User Search  On on b)
> Document ids
> but documents are displayed to user after retrieving from a).
>
> Now I do not know a way in which I can keep a) and b) completely in
> sync in realtime. As there will be some time taken in indexing
> operation itself. a) --> b) .
>
> Instead we can do following.
> a) stored : Document ID + Document Text + Document Version +
> Modification Time Stamp (T1)
> b) Indexed : Document ID + Document Text + Document Version +
> Modification Time Stamp (T2) (when indexed) (broken into date + hour +
> mins + sec for minimizing number of terms)
>
> User Searches b)
> Search System gets Document ID + Modification Time Stamp (T2) and gives to
> Presentation layer which compares the  T1 & T2.
> If T2 < T1, Skip the result.
>
> Assumption : Stored document is always in sync. Documents are
> persisted somewhere and not served from memory.
>
> Thanks & Regards
> Umesh Prasad
>
>
>
> On Sat, Jan 22, 2011 at 1:29 AM, software visualization
> <softwarevisualization@gmail.com> wrote:
> > Hi sorry for the long delay.
> >
> > The idea is that a single user is editing a single document. As they
> edit,
> > any indexes built against the document become stale, actually wrong.
> > Example:  references to specific localities within this document are all
> > instantly wrong the first time a user types a new beginning  character-
> > they're all off by one. Deleting  words is of course disastrous etc. etc.
> >  So our story is- we used to have this document nicely indexed and now we
> > have nothing useful.
> >
> > Considering what Lucene does prior to indexing, stemming for instance,  I
> am
> > not sure no, I am quite sure I can't  recreate the same powerful indexing
> > functionality.
> >
> > But it seems wrong  to lure our users into opening this document with
> > promises that this that and the other thing is has been located for them
> > only to invalidate all that just because they began to edit the document.
> I
> > understand why that happens , but my users are perhaps not as tech savvy
> and
> > I think it will just feel "wrong" to them.
> >
> > So I am looking for a way around this.
> >
> >
> >
> > On Tue, Jan 4, 2011 at 1:25 PM, adasal <adam.saltiel@gmail.com> wrote:
> >
> >> I would think this is more like it.
> >> But the essential thing, so it seems to me, is whether there is a
> >> requirement for a serialised index, i.e. a more permanent record, aside
> >> from
> >> the saved document.
> >> Then, if there is a penalty to creating the index compared to regex,
> >> stringsearch or so, it is justified on other grounds.
> >> I think it is an interesting q. when does that requirement emerge?
> >> There is size of document.
> >> But there would also be field types. I think I have this right. This is
> >> really a classification system, so more than bare regex.
> >> There must be other criteria that apply to this use case, too?
> >>
> >> Adam
> >>
> >> p.s. we (in my work project) are just beginning to use Lucene for
> geometry
> >> objects and I am looking forward to understanding its use better,
> >> including,
> >> possibly, expanding it to other use cases apart from geo objects.
> >>
> >> On 3 January 2011 15:31, Robert Muir <rcmuir@gmail.com> wrote:
> >>
> >> > On Mon, Jan 3, 2011 at 10:16 AM, Grant Ingersoll <gsingers@apache.org
> >
> >> > wrote:
> >> > > There is also the MemoryIndex, which is in contrib and is designed
> for
> >> > one document at a time.  That being said, basic grep/regex is probably
> >> fast
> >> > enough.
> >> > >
> >> >
> >> > In cases where you are doing a 'find' in a document similar to what a
> >> > wordprocessor would do (especially if you want to iterate
> >> > forwards/backwards through matches etc), you might want to consider
> >> > something like
> >> >
> http://icu-project.org/apiref/icu4j/com/ibm/icu/text/StringSearch.html
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >
> >> >
> >>
> >
>
>
>
> --
> ---
> Thanks & Regards
> Umesh Prasad
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message