lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Why does this search succeed with web app, but not Luke?
Date Fri, 07 Aug 2009 10:21:55 GMT
It is a good general assumption that Luke is correct.

Can you confirm that you are using StandardAnalyzer everywhere, for
indexing and searching?  This sort of issue is often caused by using
different analyzers.
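
To show why a mismatch matters, here is a minimal sketch in plain Java with
no Lucene dependency. The two tokenizers, splitOnDots and keepDots, are
hypothetical stand-ins that I made up for illustration; they are not
StandardAnalyzer or any other real Lucene analyzer. The point is only that a
term produced at query time by one tokenizer may not exist in an index built
with the other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AnalyzerMismatchSketch {

    // Hypothetical tokenizer A: lowercases, splits on backslash, slash, colon and dot.
    static List<String> splitOnDots(String text) {
        return tokenize(text, "[\\\\/:.]+");
    }

    // Hypothetical tokenizer B: same as A, but dots stay inside terms.
    static List<String> keepDots(String text) {
        return tokenize(text, "[\\\\/:]+");
    }

    static List<String> tokenize(String text, String separatorRegex) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase().split(separatorRegex)) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String path = "C:\\dir1\\dir2\\xxxx.yyy";

        // "Index" the path with tokenizer A.
        Set<String> indexedTerms = new HashSet<>(splitOnDots(path));
        System.out.println("indexed terms: " + indexedTerms);

        // Analyze the same query text with each tokenizer and look the terms up.
        for (String term : keepDots("xxxx.yyy")) {
            System.out.println("tokenizer B query term '" + term + "' in index: "
                    + indexedTerms.contains(term));
        }
        for (String term : splitOnDots("xxxx.yyy")) {
            System.out.println("tokenizer A query term '" + term + "' in index: "
                    + indexedTerms.contains(term));
        }
    }
}
```

With tokenizer A on both sides every query term is found; with B at query
time the single term xxxx.yyy is not in the index at all.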

What does Luke show as the indexed terms for path?  In a little index
I've just created with StandardAnalyzer and file paths, Luke is showing
xxx.yyy as a term and not xxx.  The opposite of what you have.

There was a thread yesterday about acronyms which might be relevant.
As might writing a tiny self-contained program that indexes a few
paths and displays the terms that have been indexed and runs a few
searches.
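
As a rough idea of the shape such a program could take, here is a sketch in
plain Java. It does not use Lucene: the analyze method is a hypothetical
stand-in tokenizer chosen to reproduce the term breakdown quoted below (c,
dir1, dir2, file-1-1, dat), and the class and method names are mine. It
records term positions, lists the indexed terms, and treats a multi-term
query as a phrase whose terms must sit at consecutive positions. A real test
would swap in IndexWriter, the actual analyzer and a searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PathIndexSketch {

    // term -> document id -> positions of that term in the document
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Stand-in tokenizer (an assumption, not StandardAnalyzer):
    // lowercases and splits on backslash, slash, colon and dot.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase().split("[\\\\/:.]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    void add(int docId, String path) {
        List<String> terms = analyze(path);
        for (int pos = 0; pos < terms.size(); pos++) {
            postings.computeIfAbsent(terms.get(pos), t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    Set<String> terms() {
        return postings.keySet();
    }

    // Analyzes the query with the same tokenizer, then requires the
    // resulting terms to occur at consecutive positions in a document.
    List<Integer> search(String query) {
        List<String> terms = analyze(query);
        List<Integer> hits = new ArrayList<>();
        if (terms.isEmpty()) {
            return hits;
        }
        Map<Integer, List<Integer>> first = postings.getOrDefault(terms.get(0), Map.of());
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (restFollows(entry.getKey(), start, terms)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    private boolean restFollows(int docId, int start, List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            List<Integer> positions =
                    postings.getOrDefault(terms.get(i), Map.of()).get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PathIndexSketch index = new PathIndexSketch();
        index.add(0, "C:\\dir1\\dir2\\file-1-1.dat");
        index.add(1, "C:\\dir1\\dir2\\file-1-2.dat");

        System.out.println("terms: " + index.terms());
        for (String query : new String[] {"file-1-2", "file-1-2.dat", "dat.file-1-2"}) {
            System.out.println("path:" + query + " -> docs " + index.search(query));
        }
    }
}
```

Printing the terms next to the search results makes it easy to see whether a
query succeeded because of a single indexed term or because several terms
happened to be adjacent.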


--
Ian.


On Fri, Aug 7, 2009 at 5:36 AM, <ohaya@cox.net> wrote:
> Hi Phil,
>
> Well, kind of... but...
>
> Then, why, when I do the search in Luke, do I get the results I cited:
>
> xxxx  ==> succeeds
>
> xxxx.yyy  ==> fails (no results)
>
> I guess that I've been assuming that the search in Luke is "correct" and I've been using that to "test my understanding", but maybe that's an invalid assumption?
>
> Jim
>
>
>
>
>
> ---- Phil Whelan <phil123@gmail.com> wrote:
>> Hi Jim,
>>
>> > As I said, based on the terms in Luke, I would have expected a web app query on:
>> >
>> > path:file-1-2
>> >
>> > to succeed, and a query on:
>> >
>> > path:file-1-2.dat
>> > to fail.
>> >
>> > But, instead both of those succeed when I do a web query.
>>
>> This query will also pass through the same (hopefully) Analyzer and
>> will be broken into terms. So the query will actually be for
>> "file-1-2" and "dat" where "file-1-2" is followed immediately by
>> "dat".
>>
>> When indexing, each term's position is stored, so
>> "C:\dir1\dir2\file-1-1.dat" becomes...
>> [0] c
>> [1] dir1
>> [2] dir2
>> [3] file-1-1
>> [4] dat
>>
>> "file-1-1" is followed by "dat", so there is a match.
>>
>> Does that make sense?
>>
>> Cheers,
>> Phil
>>
>> >
>> > Jim
>> >
>> >
>> > ---- ohaya@cox.net wrote:
>> >> Phil,
>> >>
>> >> Both my indexer and the webapp are basically from the Lucene demos, the indexer starting with the IndexFiles.java demo code, so I think they're both using the StandardAnalyzer.
>> >>
>> >> What appears in Luke, when I select "path" is just the filename part, without the extension, i.e., the "xxxx" part.
>> >>
>> >> That's why I said in my original post that I was kind of surprised that doing a web query for "path:xxxx.yyy" succeeded, i.e., in the path field in the index, there is no "xxxx.yyy", just "xxxx".
>> >>
>> >> Jim
>> >>
>> >> ---- Phil Whelan <phil123@gmail.com> wrote:
>> >> > Hi Jim,
>> >> >
>> >> > Are you using the same Analyzer for indexing and searching? xxxx.yyy
>> >> > will be seen as a HOSTNAME by StandardAnalyzer, which will keep it as one
>> >> > term, whereas another analyzer might split this into 2 terms. This
>> >> > should not matter either way as long as you are using the same
>> >> > Analyzer for both indexing and searching.
>> >> >
>> >> > I would expect this to pass unless you are using NOT_ANALYZED, or the
>> >> > WhitespaceAnalyzer, or something else that would not split on "/".
>> >> >     path:xxxx.yyy
>> >> >
>> >> > In Luke, do you see 2 terms "xxxx" and "yyy", or just "xxxx.yyy", or
>> >> > something else?
>> >> >
>> >> > Thanks,
>> >> > Phil
>> >> >
>> >> > On Thu, Aug 6, 2009 at 1:03 PM, <ohaya@cox.net> wrote:
>> >> > > Hi,
>> >> > >
>> >> > > In my indexer app (based on the IndexFiles.java demo), I am adding the "path" field:
>> >> > >
>> >> > >    doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED));
>> >> > >
>> >> > > Per Luke, the full path (e.g., "c:\....\xxxx.yyy") gets parsed, and one of the terms (again, per Luke) is "xxxx", i.e., the actual file name, but without the extension.
>> >> > >
>> >> > > Then, when I search with Luke for "path:xxxx", that succeeds, as expected, and when I search with Luke for "path:xxxx.yyy", that fails, as expected.
>> >> > >
>> >> > > But, if I search using the demo web app, for "path:xxxx.yyy", it succeeds.
>> >> > >
>> >> > > Since the Luke search for "path:xxxx.yyy" fails, I don't understand why the web app search for "path:xxxx.yyy" would succeed?
>> >> > >
>> >> > > Thanks,
>> >> > > Jim
>> >> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>


