lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Using Lucene to search live, being-edited documents
Date Sat, 22 Jan 2011 07:21:58 GMT
There's a feature in Lucene called an "instantiated" index. It holds
all of the Lucene data structures directly as objects instead of
serializing them to disk or to a RAMDirectory. It never needs to be
committed: you index a document and it is immediately searchable. It is
larger and faster than a normal index, but might be the right thing for
this use case. You cannot store it to disk; it lives only in memory.

On Fri, Jan 21, 2011 at 9:28 PM, software visualization
<softwarevisualization@gmail.com> wrote:
> If I understand you correctly, I think that this:
>
> If T2 < T1, Skip the result.
>
> will always be the case. The live, being-edited document is always "later"
> in time than the indexed information about it.
>
>
>
> On Fri, Jan 21, 2011 at 9:11 PM, Umesh Prasad <umesh.iitk@gmail.com> wrote:
>
>> Hi,
>>   One workaround would be to version the documents and store the
>> version as well as the timestamp of the indexed document in the index.
>>
>> Reading between the lines, I assume that the document is
>> a) stored in some DB/file, and
>> b) indexed in a Lucene index.
>>
>> The user searches on b) and gets back document IDs,
>> but documents are displayed to the user after retrieving them from a).
>>
>> Now, I do not know a way to keep a) and b) completely in
>> sync in real time, as some time is taken by the indexing
>> operation itself, a) --> b).
>>
>> Instead we can do the following.
>> a) Stored: Document ID + Document Text + Document Version +
>> Modification Time Stamp (T1)
>> b) Indexed: Document ID + Document Text + Document Version +
>> Modification Time Stamp (T2) (when indexed) (broken into date + hour +
>> mins + sec to minimize the number of terms)
>>
>> The user searches b).
>> The search system gets Document ID + Modification Time Stamp (T2) and
>> gives them to the presentation layer, which compares T1 and T2.
>> If T2 < T1, Skip the result.
>>
>> Assumption: the stored document is always in sync. Documents are
>> persisted somewhere and not served from memory.
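
For illustration, the T1/T2 comparison in the presentation layer might look
roughly like the sketch below. It is plain Java with no Lucene dependency.
The names StaleFilter, Hit and dropStale are hypothetical, and the Map
stands in for a lookup against store a); none of them come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StaleFilter {
    /** One search hit from index b): document ID plus the indexed timestamp T2. */
    public static final class Hit {
        public final String docId;
        public final long indexedModTime; // T2, as recorded at indexing time

        public Hit(String docId, long indexedModTime) {
            this.docId = docId;
            this.indexedModTime = indexedModTime;
        }
    }

    /**
     * Keeps only hits whose indexed timestamp (T2) is not older than the
     * stored document's modification timestamp (T1).
     * storedModTimes is an assumed stand-in for store a): docId -> T1.
     */
    public static List<Hit> dropStale(List<Hit> hits, Map<String, Long> storedModTimes) {
        List<Hit> fresh = new ArrayList<>();
        for (Hit hit : hits) {
            Long t1 = storedModTimes.get(hit.docId);
            if (t1 == null) {
                continue; // document no longer in the store: skip it
            }
            if (hit.indexedModTime < t1) {
                continue; // T2 < T1: the index entry is stale, skip the result
            }
            fresh.add(hit);
        }
        return fresh;
    }
}
```

A hit is shown only when the index has caught up with the stored copy, so a
document that is edited continuously would be skipped until it is reindexed.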
>>
>> Thanks & Regards
>> Umesh Prasad
>>
>>
>>
>> On Sat, Jan 22, 2011 at 1:29 AM, software visualization
>> <softwarevisualization@gmail.com> wrote:
>> > Hi, sorry for the long delay.
>> >
>> > The idea is that a single user is editing a single document. As they edit,
>> > any indexes built against the document become stale, actually wrong.
>> > Example: references to specific localities within this document are all
>> > instantly wrong the first time a user types a new beginning character;
>> > they're all off by one. Deleting words is of course disastrous, etc.
>> > So our story is: we used to have this document nicely indexed and now we
>> > have nothing useful.
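
To make the off-by-one effect concrete, here is a small sketch of what a
single edit does to previously recorded character offsets. It is only an
illustration of the problem, not something proposed in the thread, and the
names OffsetShift and applyEdit are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetShift {
    /**
     * Returns the recorded offsets as they would have to look after one edit
     * that removes `removed` characters at `editPos` and inserts `inserted`
     * characters in their place.
     */
    public static List<Integer> applyEdit(List<Integer> offsets, int editPos,
                                          int removed, int inserted) {
        int delta = inserted - removed;
        List<Integer> adjusted = new ArrayList<>();
        for (int offset : offsets) {
            if (offset < editPos) {
                adjusted.add(offset);          // before the edit: unchanged
            } else if (offset < editPos + removed) {
                // pointed into deleted text: the reference is lost
            } else {
                adjusted.add(offset + delta);  // at or after the edit: shifted
            }
        }
        return adjusted;
    }
}
```

Typing one character at position 0 moves every recorded offset by one, and a
deletion both shifts later offsets and drops any that pointed into the
removed text.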
>> >
>> > Considering what Lucene does prior to indexing, stemming for instance, I
>> > am quite sure I can't recreate the same powerful indexing functionality.
>> >
>> > But it seems wrong to lure our users into opening this document with
>> > promises that this, that, and the other thing has been located for them,
>> > only to invalidate all that just because they began to edit the document.
>> > I understand why that happens, but my users are perhaps not as tech savvy,
>> > and I think it will just feel "wrong" to them.
>> >
>> > So I am looking for a way around this.
>> >
>> >
>> >
>> > On Tue, Jan 4, 2011 at 1:25 PM, adasal <adam.saltiel@gmail.com> wrote:
>> >
>> >> I would think this is more like it.
>> >> But the essential thing, so it seems to me, is whether there is a
>> >> requirement for a serialised index, i.e. a more permanent record, aside
>> >> from the saved document.
>> >> Then, if there is a penalty to creating the index compared to regex,
>> >> StringSearch or so, it is justified on other grounds.
>> >> I think it is an interesting question: when does that requirement emerge?
>> >> There is the size of the document.
>> >> But there would also be field types. I think I have this right. This is
>> >> really a classification system, so more than bare regex.
>> >> There must be other criteria that apply to this use case, too?
>> >>
>> >> Adam
>> >>
>> >> p.s. we (in my work project) are just beginning to use Lucene for geometry
>> >> objects and I am looking forward to understanding its use better,
>> >> including, possibly, expanding it to other use cases apart from geo objects.
>> >>
>> >> On 3 January 2011 15:31, Robert Muir <rcmuir@gmail.com> wrote:
>> >>
>> >> > On Mon, Jan 3, 2011 at 10:16 AM, Grant Ingersoll <gsingers@apache.org>
>> >> > wrote:
>> >> > > There is also the MemoryIndex, which is in contrib and is designed for
>> >> > > one document at a time.  That being said, basic grep/regex is probably
>> >> > > fast enough.
>> >> > >
>> >> >
>> >> > In cases where you are doing a 'find' in a document similar to what a
>> >> > word processor would do (especially if you want to iterate
>> >> > forwards/backwards through matches etc), you might want to consider
>> >> > something like
>> >> > http://icu-project.org/apiref/icu4j/com/ibm/icu/text/StringSearch.html
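
As a rough sketch of the plain-regex option mentioned above, stepping
forwards and backwards through matches in the live text could be done with
java.util.regex alone. This is not ICU StringSearch, which also handles
collation-aware matching; the class and method names LiveFind, findNext and
findPrevious are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiveFind {
    /** Start offset of the first match at or after `from`, or -1 if none. */
    public static int findNext(CharSequence text, Pattern pattern, int from) {
        if (from < 0 || from > text.length()) {
            return -1; // Matcher.find(int) would throw for out-of-range starts
        }
        Matcher m = pattern.matcher(text);
        return m.find(from) ? m.start() : -1;
    }

    /** Start offset of the last match that begins before `before`, or -1 if none. */
    public static int findPrevious(CharSequence text, Pattern pattern, int before) {
        Matcher m = pattern.matcher(text);
        int last = -1;
        // Scan forward from the beginning and remember the latest match start.
        while (m.find() && m.start() < before) {
            last = m.start();
        }
        return last;
    }
}
```

Because the search runs against the current text on every call, there is no
index to go stale while the user types. The backward search rescans from the
start, which is presumably acceptable for a single document.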
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >
>> >> >
>> >>
>> >
>>
>>
>>
>> --
>> ---
>> Thanks & Regards
>> Umesh Prasad
>>
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com


