lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Index: mixing the structure of persistence
Date Mon, 26 Nov 2007 14:39:50 GMT
Unfortunately, there's not much anyone can say. If I can paraphrase
what you're asking, it's "Will a Lucene search be fast enough?" The
answer is "it depends". Asking the question "is part of the index stored
in RAM" isn't really relevant.

Yes, some parts of the index are cached in RAM. Yes, that is handled
transparently under the covers. That information is entirely useless
for answering the question "is the search fast enough".

All you can do is create an index and test. This is especially true since
you've supplied no information that helps. 6M documents is the only hard
figure you've provided. Are you indexing 1 field per doc? 100 fields per
doc?
Do you expect to have to do lots of sorting? Complex queries? What
is acceptable performance? How fast is "fast"? How many requests do
you anticipate per second/minute/hour? And on and on.

But even with those pieces of information, we still can't help much because
there are just so many variables.

My suggestion:
Create a simple Lucene index of your documenta and measure. Read the
FAQ and other documentation on the Lucene website and on the Lucene
Wiki to get some pointers on how to structure your index for speed.
And experiment. Nothing else will give you a feel for the speed. Creating
the index should take you about a day. It'll be time well spent.

Best
Erick

On Nov 26, 2007 9:27 AM, Haroldo Nascimento <haroldo.araras@gmail.com>
wrote:

> Hi,
>
>  I have a very great volume of data (6.000.000 of documents) and I
> need to have a very fast search. I am thinking about using Terracotta
> (with Lucene) for clustering the solution.
>
>  One of the advantages of the Terracotta is that part of the index is
> stored in memory and part is persisted em disk. If not to find in
> memory the application searchs in disk. This is transparent for the
> user. The problem would be in the process of updat of index.
>
>  Another solution would be to persist the index using only Lucene,
> but I believe that the reply time very using disk either bigger that
> the solution in memory.
>
>  You know some document comparative about the solution em memory
> (RAMDirectory) and solution em disk using Lucene?
>
>  Tip: In another application I am using the solution index in memory
> (RAMDirectory), to initiate the process of load of the indice I
> serialized the RAMDirectory object. For it I need  insert "implements
> Serializable" in some classrooms of Lucene.
>
>
>
> On Nov 25, 2007 5:55 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > As I understand, Lucene does a fair amount of caching of terms in
> > memory without you having to specify anything.
> >
> > But it's hard to see how your question relates. Remember that Lucene is
> > finding *all* matching docs. So searching in a RAMdirectory and then
> > searching in the file doesn't really seem possible since Lucene has to
> > search
> > the entire index every time to score the docs. It doesn't stop after
> > the first hit, since the next hit may score higher.
> >
> > But I'm sure Lucene *does* cache portions of the index in RAM when
> > possible, but I've never had occasion to dig into the details.
> >
> > Which leads me to ask "Why do you care?". Is there a specific
> > situation you're trying to get better performance from or is this more
> > of a background question? If you have a specific situation, please
> describe
> > it in some detail so better minds than mine can give you a better
> > response <G>....
> >
> > Best
> > Erick
> >
> > On Nov 24, 2007 10:26 AM, Haroldo Nascimento <haroldo.araras@gmail.com>
> > wrote:
> >
> >
> > > Hi,
> > >
> > >  I have a question ?
> > >
> > >  Lucene offers a mixing structure of storage of index, that is, first
> > > do search in memoria (ARMDirectory) and in case of not found do search
> > > in index file automatically ? For example: Load part of index in
> > > memory for do the search fastest.
> > >
> > >  Thnaks
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message