lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Weird results with appendable fields
Date Sat, 15 Mar 2008 21:21:40 GMT
Are you using  WhitespaceAnalyzer BOTH at index and search time? If not
that is one problem, especially since, for instance, there is a comma after
Timaran,.

The two best things I've found are
1> make sure you have a copy of Luke and use it to examine your
index and see if what you *think* your index contains is actually what
it contains. You can also use Luke to see how queries parse using various
analyzers.

2> use query.toString() to see what your query looks like when you submit
it.

If neither of these help, could you post the briefest possible example that
illustrates the problem?

Best
Erick

On Thu, Mar 13, 2008 at 4:48 AM, Gustavo Corral <gustavo.corral@gmail.com>
wrote:

> Hi list,
>
> I'm new in Lucene and I'm trying to index a set of XML documents
> (document-centric) with the same structure. All this documents have a
> header, a front, and a body (where there's a lot of text).
>
> The problem is that in the header I have two fields author and title, but
> one document can have more than one author, so I tried to index as
> appendable field in this way:
>
> ArrayList <String> authors = front.getAuthors();
>
> for(Iterator <String> it = authors.iterator(); it.hasNext();){ If
>    String out (String) it.next();
>    if((aut != null) && !aut.equals("")){
>         doc.add(new Field("author",aut,Field.StoreYES,
> Field.Index.TOKENIZED
> ));
>    }
> }
>
> and I was searching in my index with Lukeand I obtained rare results. For
> example: There's a document with 3 authors which appears as appendable
> fields in the index this way: Freddy Pantoja Timaran, Ph.D. Gabriel
> Pantoja
> Barrios Jorge Ivan Londoño.
>
> The thing is that when I search in Luke for Freddy, Pantoja, Gabriel,
> Barrios, Iván (all in a different query) i got this document as a Hit,
> that's correct, but when I search for Timaran, Londoño I get no Hits,
> which
> is not correct.
>
> I'm using by now WhiteSpaceAnalyzer. Any idea???
>
> Thanks
> Gustavo
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message