lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Fields
Date Mon, 19 Feb 2007 16:05:08 GMT
See below.

On 2/19/07, Kainth, Sachin <Sachin.Kainth@atkinsglobal.com> wrote:
>
> Hi all,
>
> I have a few questions regarding indexing documents.
>
> 1. With my experience of indexing documents with lucene so far I have
> done things like:
>
> Doc.Add(Field.Text("album", Album));
>
> Where Album is a string representing an album name.  Now with this sort
> of indexing what I do is a search such as:
>
> "album:Thriller"
>
> a) Does this mean that I cannot do a search across all fields by
> submitting the query:
>
> "Thriller"?  In other words by submitting this query would my code
> search all fields?


No, that won't search all fields. If you just submit "Thriller", you'll only
search the default field, which is the field name you hand to QueryParser.


> b) Is there a way in which I can index elements of a document without
> naming the field.  What would the impact of such a use of the indexing
> capabilities of Lucene be?


I don't think this makes sense in Lucene terms. Every element in a document
belongs to a named field. If you need an aggregate, you can index everything
into one field, which gives you the same result.

Do note, however, that there's no requirement that all documents have the
same fields.
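
To make the "index everything into one field" idea concrete, here is a minimal sketch in plain Java. It makes no Lucene calls; it only shows how the value of an aggregate field could be assembled before you add it to the document alongside the named fields. The field name "all" and the helper withCatchAll are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Plain-Java illustration (no Lucene calls) of building one aggregate field
 * from a document's named fields. All names here are hypothetical.
 */
final class CatchAllFieldSketch {

    /** Hypothetical name of the aggregate field. */
    static final String ALL_FIELD = "all";

    /**
     * Returns a copy of the given fields plus one extra field whose value is
     * every original value joined by single spaces. The input is not modified.
     */
    static Map<String, String> withCatchAll(Map<String, String> fields) {
        Map<String, String> result = new LinkedHashMap<>(fields);
        result.put(ALL_FIELD, String.join(" ", fields.values()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("album", "Thriller");
        doc.put("artist", "Michael Jackson");
        // Documents need not share the same fields; this one has no "label".
        System.out.println(withCatchAll(doc));
    }
}
```

If the aggregate field is then used as the default field, an unqualified query such as "Thriller" would be matched against the combined text, while "album:Thriller" would still target the named field.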


> 2. Is there a limit to the number of
> a) named fields per document that I can store


I think there is, but it's absurdly high. Don't worry about this.


> b) non-named fields per document that I can store


Zero, since I don't think you can have a field without a name.


> 3.
>
> a) Is it possible in Lucene to conduct searches that are very complex
> such as:
>
> ((album = Thriller AND artist = (Michael OR Jackson)) OR (date between X
> AND Y)) AND (label = sony OR Epic)   etc...


Yes
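
As a rough sketch of what that could look like, the example above can be written in the query parser's text syntax: field:term for a single field, parentheses for grouping, and [X TO Y] for an inclusive range. The Java below only assembles that string; it does not call Lucene. It assumes the date field was indexed as sortable text such as yyyyMMdd, since range matching compares terms as strings. The class and method names are hypothetical.

```java
/**
 * Builds the example query as a query-parser-style string. No Lucene calls
 * are made; the class and method names are hypothetical.
 */
final class ComplexQuerySketch {

    /**
     * Assumes dates are indexed as sortable text (for example yyyyMMdd), so
     * that an inclusive term range [from TO to] behaves like "between".
     */
    static String buildQuery(String fromDate, String toDate) {
        String albumAndArtist = "(album:Thriller AND artist:(Michael OR Jackson))";
        String dateRange = "date:[" + fromDate + " TO " + toDate + "]";
        String labels = "label:(sony OR Epic)";
        return "(" + albumAndArtist + " OR " + dateRange + ") AND " + labels;
    }

    public static void main(String[] args) {
        // The two dates are made-up sample values.
        System.out.println(buildQuery("19820101", "19831231"));
    }
}
```

The resulting string is what you would hand to the query parser. Whether terms like "Epic" match as written depends on the analyzer used at index and query time.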


> b) For such a query what are the performance penalties compared to a
> simple search involving 1 term?


In the immortal words of Mr. Hatcher... it depends. You'll really just have
to experiment and find out. The cost can probably be approximated by taking
the sum of the costs of the individual queries as an upper limit. The real
killer is wildcards. The real question isn't "what is the effect on
performance", it's "is the performance good enough for my application", and
the answer varies as the characteristics of the index change.

I would argue that a 1M index will process arbitrarily complex queries "fast
enough". The same may not be true for a 100G index. So this question is
really unanswerable in the abstract.


> Cheers
>
> Sachin