lucene-java-user mailing list archives

From Anshul jain <anshulnirv...@gmail.com>
Subject Re: Lucene Performance issue
Date Wed, 21 Jan 2009 16:10:04 GMT
Hi,

thanks for the reply.

For the document in my last mail, the two queries look like this.
Multi-field query:
name: Alexander AND domain: history AND first_sentence: Macedon

Single field query:
bagOfWords: Alexander history Macedon

I can say for sure that multiple copies are not being indexed. But the text is
divided across many fields. Could that be the reason?
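
To make the comparison concrete, here is a rough sketch of the two layouts. It is a toy in-memory model, not Lucene's actual file format: the class ToyFieldIndex, its methods, and the shortened sample text are all made up for illustration. The one thing it assumes about Lucene is that a term is identified by its field name together with its text, so the same word in two fields counts as two distinct terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical names); Lucene's real index is far more compact.
class ToyFieldIndex {
    // Occurrence count per term, where a term is keyed as "field:text".
    final Map<String, Integer> occurrences = new TreeMap<>();

    // Lowercase the text, split on anything that is not a letter or digit,
    // and record each token under the given field.
    void addField(String field, String text) {
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                occurrences.merge(field + ":" + token, 1, Integer::sum);
            }
        }
    }

    int uniqueTerms() {
        return occurrences.size();
    }

    int totalOccurrences() {
        int sum = 0;
        for (int n : occurrences.values()) {
            sum += n;
        }
        return sum;
    }

    // Abbreviated version of the sample document from the earlier mail.
    static Map<String, String> sampleDoc() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "Alexander the Great");
        doc.put("domain", "ancient history");
        doc.put("first_sentence", "Alexander III was an ancient Greek king of Macedon");
        return doc;
    }

    // Structured layout: one field per attribute plus the bagOfWords field.
    static ToyFieldIndex structured(Map<String, String> doc) {
        ToyFieldIndex index = new ToyFieldIndex();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            index.addField(e.getKey(), e.getValue());
        }
        index.addField("bagOfWords", String.join(" ", doc.values()));
        return index;
    }

    // Single-field layout: all text appended into one field called "value".
    static ToyFieldIndex singleField(Map<String, String> doc) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.addField("value", String.join(" ", doc.values()));
        return index;
    }

    public static void main(String[] args) {
        ToyFieldIndex s = structured(sampleDoc());
        ToyFieldIndex v = singleField(sampleDoc());
        System.out.println("structured:   " + s.uniqueTerms() + " terms, "
                + s.totalOccurrences() + " occurrences");
        System.out.println("single field: " + v.uniqueTerms() + " terms, "
                + v.totalOccurrences() + " occurrences");
    }
}
```

In this toy model the structured layout holds every token twice, once in its own field and once in bagOfWords, and it has more distinct terms because each field keeps its own copy of a word. If the model is anything to go by, that alone would suggest roughly a doubling, which is much less than the difference I am seeing.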

How is field data stored in the index and searched? I read the document on file
formats on Lucene's web site, but it was not very clear.

Cheers,
Anshul

On Wed, Jan 21, 2009 at 4:42 PM, Ian Lea <ian.lea@gmail.com> wrote:

> Hi
>
>
> Space: 700Mb vs 4.5Gb sounds way too big a difference.  Are you sure
> you aren't loading multiple copies of the data or something like that?
>
> Queries: a 20 times slowdown for a multi field query also sounds way
> too big.  What do the simple and multi field queries look like?
>
>
>
> --
> Ian.
>
>
> On Wed, Jan 21, 2009 at 1:39 PM, Anshul jain <anshul.jain@epfl.ch> wrote:
> > Hi,
> >
> > I've indexed around half a million XML documents. Here is the document
> > sample:
> >
> > <a:attribute>
> >   <a:name>cogito:Name</a:name>
> >   <a:value>Alexander the Great</a:value>
> > </a:attribute>
> >
> > <a:attribute>
> >   <a:name>cogito:domain</a:name>
> >   <a:value>ancient history</a:value>
> > </a:attribute>
> >
> > <a:attribute>
> >   <a:name>cogito:first_sentence</a:name>
> >   <a:value>
> >     Alexander the Great (Greek: or Megas Alexandros; July 20 356 BC June 10 323
> >     BC), also known as Alexander III, was an ancient Greek king (basileus) of
> >     Macedon (336-323 BC).
> >   </a:value>
> > </a:attribute>
> >
> > Average size of documents is around 4KB.
> >
> > There are a few performance issues I need help with. When I index documents
> > in a structured manner, using field information like:
> > name: alexander the great
> > domain: ancient history
> > first_sentence: Alexander the Great (Greek: or Megas Alexandros; July 20 356
> > BC June 10 323 BC), also known as Alexander III, was an ancient Greek king
> > (basileus) of Macedon (336-323 BC).
> > bagOfWords: alexander the great ancient history Alexander the Great (Greek:
> > or Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander
> > III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
> >
> > bagOfWords is the field with all the text appended to it.
> >
> > I get an index size of 4.5 GB, but if I just append the text and store it in
> > one field, like:
> > value: alexander the great ancient history Alexander the Great (Greek: or
> > Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander
> > III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
> >
> > the index size is only 700 MB. Why is this happening?
> >
> >
> >
> > Also, the query execution time of MultiFieldQueries is very slow: it is 20
> > times slower than a single-field query. Is that normal? What could be the
> > reason for it?
> >
> > Thanks,
> > Cheers,
> > Anshul
> >
> > --
> > Anshul Jain
> >
>


-- 
Anshul Jain
