lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshul jain <anshul.j...@epfl.ch>
Subject Lucene Performance issue
Date Wed, 21 Jan 2009 13:39:39 GMT
Hi,

I've indexed around half a million XML documents. Here is the document
sample:

 <a:attribute>

<a:name>cogito:Name</a:name>

<a:value>Alexander the Great</a:value>

</a:attribute>



<a:attribute>

<a:name>cogito:domain</a:name>

<a:value>ancient history</a:value>

</a:attribute>



<a:attribute>

<a:name>cogito:first_sentence</a:name>

<a:value>

Alexander the Great (Greek: or Megas Alexandros; July 20 356 BC June 10 323
BC), also known as Alexander III, was an ancient Greek king (basileus) of
Macedon (336-323 BC).

</a:value>

</a:attribute>


Average size of documents is around 4KB.

There are a few performance issues I need help with. When I index documents,
in a structured manner, using field information like:
name: alexander the great
domain: ancient history
first_sentence: Alexander the Great (Greek: or Megas Alexandros; July 20 356
BC June 10 323 BC), also known as Alexander III, was an ancient Greek king
(basileus) of Macedon (336-323 BC).
bagOfWords: alexander the great ancient history Alexander the Great (Greek:
or Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander
III, was an ancient Greek king (basileus) of Macedon (336-323 BC).

bagOfWords is the field with all the text appended to it.

I get the index size of 4.5 GB, but if I just append the text and store in
one field
like:
value: alexander the great ancient history Alexander the Great (Greek: or
Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander
III, was an ancient Greek king (basileus) of Macedon (336-323 BC).

 the index size is only 700 MB.. why is this happening?



Also the query execution time of MultiFieldQueries is very slow, it is 20
times slower than single field query. Is it normal,  what could be the
reason for that?

Thanks,
Cheers,
Anshul

-- 
Anshul Jain

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message