lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Lucene Performance issue
Date Wed, 21 Jan 2009 16:14:17 GMT
Note that your two queries are different unless you've
changed the default operator.

Also, your bagOfWords query is searching across your
default field for the second two terms.

Your bagOfWords is really something like

bagOfWords:Alexander OR <default field>:history OR <default field>:Macedon.

Best
Erick

On Wed, Jan 21, 2009 at 11:10 AM, Anshul jain <anshulnirvana@gmail.com>wrote:

> Hi,
>
> thanks for the reply.
>
> For the document, in my last mail..
> multifieldQuery:
> name: Alexander AND domain: history AND first_sentence: Macedon
>
> Single field query:
> bagOfWords: Alexander history Macedon
>
> I can for sure say that multiple copies are not index. But the number of
> fields in which text is divided are many. Can that be a reason?
>
> How is field data stored in Index and searched? I read the document on file
> formats on lucene's web site but it was not very clear.
>
> Cheers,
> Anshul
>
> On Wed, Jan 21, 2009 at 4:42 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
> > Hi
> >
> >
> > Space: 700Mb vs 4.5Gb sounds way too big a difference.  Are you sure
> > you aren't loading multiple copies of the data or something like that?
> >
> > Queries: a 20 times slowdown for a multi field query also sounds way
> > too big.  What do the simple and multi field queries look like?
> >
> >
> >
> > --
> > Ian.
> >
> >
> > On Wed, Jan 21, 2009 at 1:39 PM, Anshul jain <anshul.jain@epfl.ch>
> wrote:
> > > Hi,
> > >
> > > I've indexed around half a million XML documents. Here is the document
> > > sample:
> > >
> > >  <a:attribute>
> > >
> > > <a:name>cogito:Name</a:name>
> > >
> > > <a:value>Alexander the Great</a:value>
> > >
> > > </a:attribute>
> > >
> > >
> > >
> > > <a:attribute>
> > >
> > > <a:name>cogito:domain</a:name>
> > >
> > > <a:value>ancient history</a:value>
> > >
> > > </a:attribute>
> > >
> > >
> > >
> > > <a:attribute>
> > >
> > > <a:name>cogito:first_sentence</a:name>
> > >
> > > <a:value>
> > >
> > > Alexander the Great (Greek: or Megas Alexandros; July 20 356 BC June 10
> > 323
> > > BC), also known as Alexander III, was an ancient Greek king (basileus)
> of
> > > Macedon (336-323 BC).
> > >
> > > </a:value>
> > >
> > > </a:attribute>
> > >
> > >
> > > Average size of documents is around 4KB.
> > >
> > > There are a few performance issues I need help with. When I index
> > documents,
> > > in a structured manner, using field information like:
> > > name: alexander the great
> > > domain: ancient history
> > > first_sentence: Alexander the Great (Greek: or Megas Alexandros; July
> 20
> > 356
> > > BC June 10 323 BC), also known as Alexander III, was an ancient Greek
> > king
> > > (basileus) of Macedon (336-323 BC).
> > > bagOfWords: alexander the great ancient history Alexander the Great
> > (Greek:
> > > or Megas Alexandros; July 20 356 BC June 10 323 BC), also known as
> > Alexander
> > > III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
> > >
> > > bagOfWords is the field with all the text appended to it.
> > >
> > > I get the index size of 4.5 GB, but if I just append the text and store
> > in
> > > one field
> > > like:
> > > value: alexander the great ancient history Alexander the Great (Greek:
> or
> > > Megas Alexandros; July 20 356 BC June 10 323 BC), also known as
> Alexander
> > > III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
> > >
> > >  the index size is only 700 MB.. why is this happening?
> > >
> > >
> > >
> > > Also the query execution time of MultiFieldQueries is very slow, it is
> 20
> > > times slower than single field query. Is it normal,  what could be the
> > > reason for that?
> > >
> > > Thanks,
> > > Cheers,
> > > Anshul
> > >
> > > --
> > > Anshul Jain
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> Anshul Jain
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message