lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sandeep Mahendru" <mahendru.sand...@gmail.com>
Subject Re: How do we limit the growth of a Lucene Index?
Date Tue, 06 Nov 2007 02:35:33 GMT
Hi Marcus,

  Thanks for providing these suggestions. I will work on these directions
with my team.

Regards,
Sandeep.


On 11/5/07, Marcus Herou <marcus.herou@tailsweep.com> wrote:
>
> As you suggest you could either roll the index on the local machine or
> remote and gzip the content fileds on the archive index and provide a
> GzipReader when you need to search old results.
>
> If money is of the essence then the best solution probably is to have 1
> good
> box with fast SCSI disks which serves the primary index and an "archive"
> slower machine with huge cheap IDE disks.
> You then need to have a servlet or such on the archive server which
> understands searching over http. Anyone said SOLR :)
>
> //Marcus
>
> On 11/5/07, Grant Ingersoll <gsingers@apache.org> wrote:
> >
> > You could search this list about distributing your indexes, etc.
> > RemoteSearchable may be handy, but you will probably have to build
> > some infrastructure around it for handling failover, etc. (would make
> > for a nice contribution)
> >
> > How often do you think archived data will need to be accessed?  And
> > how much data are you talking?  Seems to me like the main issue will
> > be in managing the searchers in light of having a lot of potential
> > indexes.  Just thinking out loud, though.
> >
> > -Grant
> >
> > On Nov 4, 2007, at 1:48 PM, Sandeep Mahendru wrote:
> >
> > > Hi ,
> > >
> > >   We have been developing an enterprise logging service at the
> > > Wachovia
> > > bank. The logs (Busines, application, error) for all the bank related
> > > applications are consolidated
> > > at one single location in an Oracle 10g Database.
> > >
> > > In our second phase, we are now building a high perforinmg report
> > > viewer
> > > over it. So our search algorithm does not go to the Oracle 10g DB. We
> > > therfore avoid network and I/O.
> > > Our serach algorith now goes to a LUCENE index. We have Lucene indexes
> > > created for each application. These indexes are present on the same
> > > machine,
> > > where the search algorithm runs. As more applications at the bank
> > > are now
> > > beginning to consume this service, the Lucene Index is now growing.
> > >
> > > One of my team leads has suggested the following approach to resolve
> > > this
> > > issue:
> > >
> > > *I think the best approach is to restrict the Index size , is to
> > > keep it for
> > > some limited time and then archive the same. In case user wants to
> > > search
> > > against the old files then we might need to provide some
> > > configuration using
> > > which the lucene searcher can point to the achieved file and search
> > > the
> > > content. To implement this we need to rename the Index file with
> > > from and to
> > > date before its archived. While searching against the older files,
> > > user need
> > > to provide the date range and then the app can point to the relevant
> > > archived index files for search. Let me know your thoughts on this. *
> > > **
> > > At present this sounds the most logical to me. But then we begin to
> > > store
> > > the Lucene indexes on a diffferent machine. This might again cause the
> > > search algorithm to make a network trip, if the serach is based on old
> > > archived data.
> > >
> > > Is there a better design to resolve the above concern. Does Lucene
> > > provid
> > > some sort of API to handle the above scenario's?
> > >
> > > Regards,
> > > Sandeep.
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> >
> > Lucene Boot Camp Training:
> > ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!
> http://www.apachecon.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> Marcus Herou Solution Architect & Core Java developer Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message