lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <oh...@cox.net>
Subject Re: Why does this search succeed with web app, but not Luke?
Date Thu, 01 Jan 1970 00:00:00 GMT
Ian,

I just re-confirmed that StandardAnalyzer is used in both my indexer app and in the query/search
web app.

The actual file paths look like:

C:\lucene-devel\dat\xxxxxxxxxxxxxxxx.dat
or
C:\lucene-devel\data\testdir\\xxxxxxxxxxxxxxxx.dat

For field "path", Luke shows:

lucene
data
c
devel
dat
testdir
xxxxxxxxxxxxxxxxxxxxxxxxxxx
.
.
zzzzzzzzzzzzzzzzzzzzzzzzzzzz


where "xxxxxxxxxxxxxxxx" and "zzzzzzzzzzzzzzzzz" are the left part (to the left of the ".")
of filenames.

So, it seems like you're correct, that what you're seeing is the opposite from what I'm seeing
:(??

Again, the actual code in my indexer has:

 doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED));

(and again, the indexer uses StandardAnalyzer).

Is that different from what you did in your "little" index test?

Jim




---- Ian Lea <ian.lea@gmail.com> wrote: 
> It is a good general assumption that Luke is correct.
> 
> Can you confirm that you are using StandardAnalyzer everywhere, for
> indexing and searching?  This sort of issue is often caused by using
> different analyzers.
> 
> What does Luke show as the indexed terms for path?  In a little index
> I've just created with StandardAnalyzer and file paths Luke is showing
> xxx.yyy as a term and not xxx.  The opposite to what you have.
> 
> There was a thread yesterday about acronyms which might be relevant.
> As might writing a tiny self-contained program that indexes a few
> paths and displays the terms that have been indexed and runs a few
> searches.
> 
> 
> --
> Ian.
> 
> 
> On Fri, Aug 7, 2009 at 5:36 AM, <ohaya@cox.net> wrote:
> > Hi Phil,
> >
> > Well, kind of... but...
> >
> > Then, why, when I do the search in Luke, do I get the results I cited:
> >
> > xxxx  ==> succeeds
> >
> > xxxx.yyy  ==> fails (no results)
> >
> > I guess that I've been assuming that the search in Luke is "correct" and I've been
using that to "test my understanding", but maybe that's an invalid assumption?
> >
> > Jim
> >
> >
> >
> >
> >
> > ---- Phil Whelan <phil123@gmail.com> wrote:
> >> Hi Jim,
> >>
> >> > As I said, based on the terms in Luke, I would have expected a web app
query on:
> >> >
> >> > path:file-1-2
> >> >
> >> > to succeed, and a query on:
> >> >
> >> > path:file-1-2.dat
> >> > to fail.
> >> >
> >> > But, instead both of those succeed when I do a web query.
> >>
> >> This query will also pass through the same (hopefully) Analyzer and
> >> will be broken into terms. So the query will actually be for
> >> "file-1-2" and "dat" where "file-1-2" is followed immediately by
> >> "dat".
> >>
> >> In indexing the terms position is stored, so
> >> "C:\dir1\dir2\file-1-1.dat" becomes...
> >> [0] c
> >> [1] dir1
> >> [2] dir2
> >> [3] file-1-1
> >> [4] dat
> >>
> >> "file-1-1" is followed by "dat", so there is a match.
> >>
> >> Does that make sense?
> >>
> >> Cheers,
> >> Phil
> >>
> >> >
> >> > Jim
> >> >
> >> >
> >> > ---- ohaya@cox.net wrote:
> >> >> Phil,
> >> >>
> >> >> Both my indexer and the webapp are basically from the Lucene demos,
the indexer starting with the IndexFiles.java demo code, so I think they're both using the
StandardAnalyzer.
> >> >>
> >> >> What appears in Luke, when I select "path" is just the filename part,
without the extension, i.e., the "xxxx" part.
> >> >>
> >> >> That's why I said in my original post that I was kind of surprised
that doing a web query for "path:xxxx.yyy" succeeded, i.e, in the path field in the index,
there is no "xxxx.yyy", just "xxxx".
> >> >>
> >> >> Jim
> >> >>
> >> >> ---- Phil Whelan <phil123@gmail.com> wrote:
> >> >> > Hi Jim,
> >> >> >
> >> >> > Are you using the same Analyzer for indexing and searching? xxxx.yyy
> >> >> > will be seem as a HOSTNAME by StandardAnalyzer and will keep it
as one
> >> >> > term, whereas another indexer might split this into 2 terms. This
> >> >> > should not matter either way as long as you are using the same
> >> >> > Analyzer for both indexing and searching.
> >> >> >
> >> >> > I would expect this to pass unless you are using NOT_ANALYZED,
or the
> >> >> > WhitespaceAnalyzer, or something else that would not split on
"/".
> >> >> >     path:xxxx.yyy
> >> >> >
> >> >> > In Luke, do you see 2 terms "xxxx" and "yyy", or just "xxxx.yyy",
or
> >> >> > something else?
> >> >> >
> >> >> > Thanks,
> >> >> > Phil
> >> >> >
> >> >> > On Thu, Aug 6, 2009 at 1:03 PM, <ohaya@cox.net> wrote:
> >> >> > > Hi,
> >> >> > >
> >> >> > > In my indexer app (based on the IndexFiles.java demo), I
am adding the "path" field:
> >> >> > >
> >> >> > >    doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.ANALYZED));
> >> >> > >
> >> >> > > Per Luke, the full path (e.g., "c:\....\xxxx.yyy") gets parsed,
and one of the terms (again, per Luke) is "xxxx", i.e., the actual file name, but without
the extension.
> >> >> > >
> >> >> > > Then, when I search with Luke for "path:xxxx", that succeeds,
as expected, and when I search with Luke for "path:xxxx.yyy", that fails, as expected.
> >> >> > >
> >> >> > > But, if I search using the demo web app, for "path:xxxx.yyy",
it succeeds.
> >> >> > >
> >> >> > > Since the Luke search for "path:xxxx.yyy" fails, I don't
understand why the web app search for "path:xxxx.yyy" would succeed?
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Jim
> >> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message