lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <oh...@cox.net>
Subject Re: Why does this search succeed with web app, but not Luke?
Date Fri, 07 Aug 2009 01:27:25 GMT
Phil,

I need to be more precise...

The files that I have are at, say:

C:\dir1\dir2\

so, for example, I have

C:\dir1\dir2\file-1-1.dat
C:\dir1\dir2\file-1-2.dat
C:\dir1\dir2\file-1-3.dat
C:\dir1\dir2\file-1-4.dat
C:\dir1\dir2\file-1-5.dat

After indexing, and, using Luke, I look at the "path" field, and I see terms like (not in
this order):

C
dir1
dir2
dat
file-1-1
file-1-2
file-1-3
file-1-4
file-1-5

Notice that there is no (for example) term like "file-1-2.dat".

I'm assuming this is because the analyzer is breaking that into "file-1-2" and "dat".

As I said, based on the terms in Luke, I would have expected a web app query on:

path:file-1-2

to succeed, and a query on:

path:file-1-2.dat

to fail.

But, instead both of those succeed when I do a web query.

Jim


---- ohaya@cox.net wrote: 
> Phil,
> 
> Both my indexer and the webapp are basically from the Lucene demos, the indexer starting
with the IndexFiles.java demo code, so I think they're both using the StandardAnalyzer.
> 
> What appears in Luke, when I select "path" is just the filename part, without the extension,
i.e., the "xxxx" part.  
> 
> That's why I said in my original post that I was kind of surprised that doing a web query
for "path:xxxx.yyy" succeeded, i.e, in the path field in the index, there is no "xxxx.yyy",
just "xxxx".
> 
> Jim
> 
> ---- Phil Whelan <phil123@gmail.com> wrote: 
> > Hi Jim,
> > 
> > Are you using the same Analyzer for indexing and searching? xxxx.yyy
> > will be seem as a HOSTNAME by StandardAnalyzer and will keep it as one
> > term, whereas another indexer might split this into 2 terms. This
> > should not matter either way as long as you are using the same
> > Analyzer for both indexing and searching.
> > 
> > I would expect this to pass unless you are using NOT_ANALYZED, or the
> > WhitespaceAnalyzer, or something else that would not split on "/".
> >     path:xxxx.yyy
> > 
> > In Luke, do you see 2 terms "xxxx" and "yyy", or just "xxxx.yyy", or
> > something else?
> > 
> > Thanks,
> > Phil
> > 
> > On Thu, Aug 6, 2009 at 1:03 PM, <ohaya@cox.net> wrote:
> > > Hi,
> > >
> > > In my indexer app (based on the IndexFiles.java demo), I am adding the "path"
field:
> > >
> > >    doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED));
> > >
> > > Per Luke, the full path (e.g., "c:\....\xxxx.yyy") gets parsed, and one of
the terms (again, per Luke) is "xxxx", i.e., the actual file name, but without the extension.
> > >
> > > Then, when I search with Luke for "path:xxxx", that succeeds, as expected,
and when I search with Luke for "path:xxxx.yyy", that fails, as expected.
> > >
> > > But, if I search using the demo web app, for "path:xxxx.yyy", it succeeds.
> > >
> > > Since the Luke search for "path:xxxx.yyy" fails, I don't understand why the
web app search for "path:xxxx.yyy" would succeed?
> > >
> > > Thanks,
> > > Jim
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message