lucene-java-user mailing list archives

From Mariella Di Giacomo <marie...@lanl.gov>
Subject Question about proximity searching and wildcards
Date Tue, 18 Jan 2005 22:01:48 GMT
Hello,

We are using Lucene to index scientific articles.
We are also using Luke to verify the fields and values we index.

One of the fields we index is the author field that consists of the authors 
that have written the scientific article (an example of such data is shown 
at the bottom of the email).

The most common search on the author field is the following:

"find all the authors whose last name starts with Cole and the first name 
starts with S"

We thought of a proximity search (we want to make sure we match the first 
name and not the middle name/initial), like this:

"Author:cole* S*"~1
"Author:cole* AND Author:S*"~1

What we were expecting was: all the documents that contain authors whose 
last name starts with Cole and whose first name starts with S, where those 
two words are next to each other.
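
To make the expected behaviour concrete, here is a minimal plain-Java sketch of the matching rule we have in mind. It does not use Lucene at all. The class name AuthorProximitySketch, the method names, and the tokenization rule (lowercase, then split on anything that is not a letter or digit) are assumptions made only for illustration and may differ from what our analyzer actually produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: describes the match we expect, not how Lucene evaluates it.
final class AuthorProximitySketch {

    // Assumed tokenization: lowercase, split on any non-letter/non-digit character.
    static List<String> tokenize(String authors) {
        List<String> tokens = new ArrayList<>();
        for (String t : authors.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if some token starting with lastPrefix is immediately followed
    // by a token starting with firstPrefix (in that order).
    static boolean matches(String authors, String lastPrefix, String firstPrefix) {
        List<String> tokens = tokenize(authors);
        String lp = lastPrefix.toLowerCase(Locale.ROOT);
        String fp = firstPrefix.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).startsWith(lp) && tokens.get(i + 1).startsWith(fp)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Last name "Coleman" directly followed by initial "S": should match.
        System.out.println(matches("Coleman, S.S.a", "cole", "S"));
        // "S" is only the middle name here: should not match.
        System.out.println(matches("Cole, J. Steven", "cole", "S"));
    }
}
```

With this rule, "Coleman, S.S.a" is a hit for the prefixes cole and S, while an author such as "Cole, J. Steven" is not, because the word right after the last name does not start with S.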

Unfortunately, when we type that search into the Luke "search interface", 
the wildcard terms are not expanded when the proximity operator (~) is used 
at the same time.

So my questions are:

1) Is Luke unable to deal with this?
2) Is the query not structured properly to get what we expect? If so, what 
would the correct one be?

If Luke cannot deal with this, which query should we provide when writing 
it in our Java application to get the expected result?
Do we need to use a query filter?


Thanks a lot in advance for your help,


Mariella






_____________________________________________________________________________________________
E.g.

Below are three examples of the scientific-article data we index, in 
particular the information related to the Authors field.

1)
The Authors field consists of two authors

Title: Using Document Dimensions for Enhanced Information Retrieval
Authors: Jayasooriya, Thimala(thimal@cs.york.ac.uk); Manandhar, 
Suresha(suresh@cs.york.ac.uk)
Affiliations: a. Department of Computer Science, University of York
Abstract (English): Conventional document search techniques are constrained 
by attempting to match individual keywords or phrases to source documents. 
Thus, these techniques miss out documents that contain semantically similar 
terms, thereby achieving a relatively low degree of recall. At the same 
time, processing capabilities and tools for syntactic and semantic analysis 
of language have advanced to the point where an index-time linguistic 
analysis of source documents is both feasible and realistic. In this paper, 
we introduce document dimensions, a means of classifying or grouping terms 
discovered in documents. Using an enhanced version of Jakarta Lucene[1], we 
demonstrate that supplementing keyword analysis with some syntactic and 
semantic information can indeed enhance the quality of information 
retrieval results.
Publisher: Springer-Verlag
Publication Type: Original Paper
ISSN: 0302-9743
ISBN: 3-540-23659-7
Book DOI: 10.1007/b101591

2)
The Authors field consists of six authors

Title: Multilingual Retrieval Experiments with MIMOR at the University of 
Hildesheim
Authors: Hackl, Renéa; Kölle, Ralpha; Mandl, 
Thomasa(mandl@uni-hildesheim.de); Ploedt, Alexandraa; Scheufen, 
Jan-Hendrika; Womser-Hacker, Christaa
Affiliations: a. University of Hildesheim, Information Science, 
Marienburger Platz 22, D-31141 Hildesheim
Abstract (English): Fusion and optimization based relevance judgements have 
proven to be successful strategies in information retrieval. In this years 
CLEF campaign we applied these strategies to multilingual retrieval with 
four languages. Our fusion experiments were carried out using freely 
available software. We used the snowball stemmers, internet translation 
services and the text retrieval tools in Lucene and the new MySQL.
Publisher: Springer-Verlag
Publication Type: Original Paper
ISSN: 0302-9743
ISBN: 3-540-24017-9
Book DOI: 10.1007/b102261

3)

The Authors field consists of one author, and only the first and middle 
initials are provided

Title: Letter to the editor
Author: Coleman, S.S.a
Affiliations: a. Department of Orthopaedics, The University of Utah School 
of Medicine, 50 North Medical Drive, Salt Lake City, UT 84132, USA US
Abstract: No Abstract
Publisher: Springer-Verlag
Item Identifier: 10.1007/s002640000113
Publication Type: Article
ISSN: 0341-2695


