lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: wildcard search not working on file paths
Date Mon, 14 Oct 2013 19:40:37 GMT
You seem to be indexing paths delimited by backslashes and then saying
that a search for Samples/* doesn't match anything.  No surprises there,
if I've read your code correctly.  Since you are creating wildcard
queries directly from Terms, I don't think that Lucene escaping is
relevant here.  But the presence of all the backslashes in paths and
Java code doesn't help.  I'd convert them all to standard Unix /a/b/c
format, for searching anyway: you can always store the original if you
want to use that in results.
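Ian's suggestion can be sketched in plain Java. This is a minimal sketch that makes two assumptions. First, the original backslash path is kept in a stored field for display. Second, a separate normalized copy is indexed purely for searching. The class `PathKeys`, the method `toSearchPath`, and the field name `FILE_PATH_SEARCH` in the comments are all hypothetical.

```java
// Hypothetical helper: derives a search key in Unix /a/b/c form from a
// Windows-style path, leaving the original string untouched for display.
//
// Possible indexing use with Lucene 4.x (field names are assumptions):
//   doc.add(new StoredField("FILE_PATH", path));
//   doc.add(new StringField("FILE_PATH_SEARCH", PathKeys.toSearchPath(path), Store.NO));
// A query such as "/Samples/*" on FILE_PATH_SEARCH then contains no backslashes.
final class PathKeys {

    private PathKeys() {
        // static helpers only
    }

    /** Converts "\Samples\Batching\runner.p" to "/Samples/Batching/runner.p". */
    static String toSearchPath(String originalPath) {
        // Replace every backslash separator with a forward slash.
        return originalPath.replace('\\', '/');
    }
}
```

Search results can still show the backslash path from the stored field. All wildcard matching happens on the normalized copy.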

One further small tip: your sample program is good, with no external
dependencies, but would be even better if you used RAMDirectory.  That
way I could run it on my non-Windows system if I wanted to, with the
addition of some imports.


--
Ian.


On Mon, Oct 14, 2013 at 7:55 PM, nischal reddy
<nischal.srinivas@gmail.com> wrote:
> Hi Ian,
>
> Please find below a sample program that better illustrates the scenario.
>
>
> public class TestWriter {
>     public static void main(String[] args) throws IOException {
>         createIndex();
>         searchIndex();
>     }
>
>     public static void createIndex() throws IOException {
>         Directory directory = FSDirectory.open(new File("C:\\temp"));
>
>         IndexWriterConfig iwriter = new IndexWriterConfig(
>                 Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
>
>         IndexWriter iWriter = new IndexWriter(directory, iwriter);
>
>         Document document1 = new Document();
>
>         document1.add(new StringField("FILE_PATH",
>                 "\\Samples\\Batching\\runner.p", Store.YES));
>         document1.add(new StringField("contents", "runnerfile", Store.YES));
>
>         iWriter.addDocument(document1);
>
>         Document document2 = new Document();
>
>         document2.add(new StringField("FILE_PATH",
>                 "\\Samples\\Business\\stopper.p", Store.YES));
>         document2.add(new StringField("contents", "stopperfile", Store.YES));
>
>         iWriter.addDocument(document2);
>         iWriter.commit();
>         iWriter.close();
>     }
>
>     public static void searchIndex() throws IOException {
>
>         Directory directory = FSDirectory.open(new File("C:\\temp"));
>         IndexReader indexReader = DirectoryReader.open(directory);
>         IndexSearcher indexSearcher = new IndexSearcher(indexReader);
>
>         // Create a wildcard query to get all file paths
>         // This query works fine and returns all the docs in index
>         Query query1 = new WildcardQuery(new Term("FILE_PATH", "*"));
>         TopDocs topDocs = indexSearcher.search(query1, 100);
>         System.out.println("total no of docs " + topDocs.totalHits);
>
>         // Create a wildcard query to search for paths starting with /Samples
>         // This query doesn't work and returns zero docs;
>         // it doesn't work with "*Samples//*" either,
>         // but it works with "*Samples*"
>         Query query2 = new WildcardQuery(new Term("FILE_PATH", "*Samples/*"));
>         TopDocs topDocs2 = indexSearcher.search(query2, 100);
>         System.out.println("total no of docs " + topDocs2.totalHits);
>
>         // Create a wildcard query to search for paths ending with runner.p
>         // This query works and returns 1 doc
>         Query query3 = new WildcardQuery(new Term("FILE_PATH", "*runner.p"));
>         TopDocs topDocs3 = indexSearcher.search(query3, 100);
>         System.out.println("total no of docs " + topDocs3.totalHits);
>
>         // Queries to search in "contents" field
>
>         // Create a wildcard query to search for contents starting with runner
>         // This query works and returns one doc
>         Query query4 = new WildcardQuery(new Term("contents", "runner*"));
>         TopDocs topDocs4 = indexSearcher.search(query4, 100);
>         System.out.println("total no of docs " + topDocs4.totalHits);
>
>         // Create a wildcard query to search for contents ending with file
>         // This query works and returns two docs
>         Query query5 = new WildcardQuery(new Term("contents", "*file"));
>         TopDocs topDocs5 = indexSearcher.search(query5, 100);
>         System.out.println("total no of docs " + topDocs5.totalHits);
>
>     }
>
> }
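The code comments above report certain hit counts. Those counts are what you would expect if two things hold. First, each StringField value is indexed as a single, untokenized term. Second, the pattern must match that whole term.

Below is a simplified, hypothetical model of that matching. The class `WildcardModel` is not part of Lucene; Lucene's WildcardQuery compiles the pattern into an automaton instead. In the model:
- `*` matches any run of characters.
- `?` matches exactly one character.
- A backslash escapes the character that follows it. This rule is an assumption based on Lucene 4.x WildcardQuery behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of matching one wildcard pattern against
// one untokenized term. Not Lucene's implementation.
final class WildcardModel {

    private WildcardModel() {
        // static helpers only
    }

    static boolean matches(String pattern, String term) {
        // Parse the pattern into tokens: kind 0 = literal char, 1 = '*', 2 = '?'.
        List<int[]> toks = new ArrayList<>();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                toks.add(new int[] {0, pattern.charAt(++i)}); // escaped: always literal
            } else if (c == '*') {
                toks.add(new int[] {1, 0});
            } else if (c == '?') {
                toks.add(new int[] {2, 0});
            } else {
                toks.add(new int[] {0, c}); // includes a trailing lone backslash
            }
        }
        // dp[j] is true when the tokens processed so far can match term[0..j).
        int n = term.length();
        boolean[] dp = new boolean[n + 1];
        dp[0] = true;
        for (int[] t : toks) {
            boolean[] next = new boolean[n + 1];
            if (t[0] == 1) {
                // '*' matches any run: once a prefix matched, every longer one does.
                boolean seen = false;
                for (int j = 0; j <= n; j++) {
                    seen |= dp[j];
                    next[j] = seen;
                }
            } else {
                // '?' or a literal consumes exactly one character.
                for (int j = 1; j <= n; j++) {
                    boolean charOk = t[0] == 2 || term.charAt(j - 1) == (char) t[1];
                    next[j] = dp[j - 1] && charOk;
                }
            }
            dp = next;
        }
        return dp[n];
    }
}
```

Under this model, `*Samples/*` fails simply because the indexed term contains `\` rather than `/`. The escaping attempt fails for a related reason. The Java literal `"*Samples\\*"` denotes the pattern `*Samples\*`, and its `\*` is an escaped, literal asterisk. To match a literal backslash, the term text needs `\\`, which is `"*Samples\\\\*"` in Java source.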
>
>
> I observed that the file path separator I am using in the field and the
> Lucene escape character seem to be the same, so whenever I use an escape
> character in the query the search fails; if I don't use the escape
> sequence, it returns the results properly.
>
> Although I am escaping "\" by writing "\\", the query still fails.
>
> One way to solve this problem is to replace all "\" with "/" while
> indexing, and then use "/" as the file path separator while searching.
>
> But I would prefer not to meddle with the file path. Is there any
> alternative way to solve this problem without replacing the file path?
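One alternative leaves the stored path unchanged: escape the literal path fragment before building the WildcardQuery term. This sketch assumes that WildcardQuery treats `*`, `?` and the backslash escape character specially, as in Lucene 4.x. The class `WildcardEscaper` and its methods are hypothetical.

```java
// Hypothetical helper: escapes text so a WildcardQuery pattern matches it
// literally, assuming '\' is WildcardQuery's escape character (Lucene 4.x).
final class WildcardEscaper {

    private WildcardEscaper() {
        // static helpers only
    }

    /** Prefixes '\', '*' and '?' with a backslash so they match literally. */
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' || c == '*' || c == '?') {
                sb.append('\\'); // the escape character itself
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a pattern for every term that starts with the given literal prefix. */
    static String underDirectory(String directoryPrefix) {
        return escape(directoryPrefix) + "*";
    }
}
```

Here is a possible use. `new WildcardQuery(new Term("FILE_PATH", WildcardEscaper.underDirectory("\\Samples\\")))` would look for terms that begin with `\Samples\`. When only a literal prefix is needed, a `PrefixQuery` avoids the escaping question entirely, because it does not interpret wildcard characters.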
>
> TIA,
> Nischal Y
>
>
>
> On Mon, Oct 14, 2013 at 10:31 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> Seems to me that it should work.  I suggest you show us a complete
>> self-contained example program that demonstrates the problem.
>>
>>
>> --
>> Ian.
>>
>>
>> On Mon, Oct 14, 2013 at 12:42 PM, nischal reddy
>> <nischal.srinivas@gmail.com> wrote:
>> > Hi Ian,
>> >
>> > Actually I'm able to do wildcard searches on all the fields except the
>> > "filePath" field. I am able to do both leading and trailing wildcard
>> > searches on all the other fields, but when I do a wildcard search on the
>> > filePath field it somehow doesn't work. An example file path looks
>> > something like "\Samples\F1.cls"; I think the "\" in the field is why it
>> > fails. When I do a wildcard search with the query "filePath : *", it does
>> > return all the docs in the index, but any other wildcard search (leading
>> > or trailing) doesn't work. Any clues why it works on the other fields
>> > and not on the "filePath" field?
>> >
>> > TIA,
>> > Nischal Y
>> >
>> >
>> > On Mon, Oct 14, 2013 at 4:55 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> >
>> >> Do some googling on leading wildcards and read things like
>> >> http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick
>> >> an option you like.
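One commonly used option for leading wildcards is to index a reversed copy of the term. This may not be the option recommended in the linked thread. With a reversed copy, a leading-wildcard query on the original becomes a trailing-wildcard query on the copy.

This is a minimal sketch. It assumes a second, hypothetical field (for example `filePath_rev`) that holds `reverse(path)`. The class `ReversedPathKeys` is also hypothetical.

```java
// Hypothetical helper for a "reversed term" field: a suffix search on the
// original value becomes a prefix search on the reversed value.
final class ReversedPathKeys {

    private ReversedPathKeys() {
        // static helpers only
    }

    /** Reverses a term so that a suffix of the original becomes a prefix. */
    static String reverse(String term) {
        // StringBuilder.reverse keeps surrogate pairs in the right order.
        return new StringBuilder(term).reverse().toString();
    }

    /**
     * Rewrites "*suffix" into "xiffus*" for the reversed field.
     * Only a single leading '*' followed by plain text is supported.
     */
    static String toReversedPattern(String leadingWildcardPattern) {
        if (!leadingWildcardPattern.startsWith("*")) {
            throw new IllegalArgumentException("pattern must start with '*'");
        }
        String suffix = leadingWildcardPattern.substring(1);
        if (suffix.indexOf('*') >= 0 || suffix.indexOf('?') >= 0 || suffix.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("suffix must not contain *, ? or \\");
        }
        return reverse(suffix) + "*";
    }
}
```

For instance, `*SuperClass.cls` becomes `slc.ssalCrepuS*`. That pattern matches the reversed term of any path ending in `SuperClass.cls`. A trailing wildcard can typically be evaluated more cheaply than a leading one, because it restricts the search to one region of the sorted term dictionary.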
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >> On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy
>> >> <nischal.srinivas@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I have a problem doing wildcard searches on file path fields.
>> >> >
>> >> > I have a field "filePath" where I store the complete path of files.
>> >> >
>> >> > I have used StringField to store the field (I assume StringField is
>> >> > not tokenized by default).
>> >> >
>> >> > doc.add(new StringField(FIELD_FILE_PATH, resourcePath, Store.YES));
>> >> >
>> >> > I am using StandardAnalyzer for the IndexWriter, but since I am using
>> >> > a StringField, the field is not analyzed.
>> >> >
>> >> > After the files were indexed, I checked them with Luke and the paths
>> >> > look fine. When I do wildcard searches in Luke, I get the desired
>> >> > results.
>> >> >
>> >> > But when I do the same search in my code with IndexSearcher, I get
>> >> > zero docs.
>> >> >
>> >> > My searching code looks something like this:
>> >> >
>> >> > indexSearcher.search(new WildcardQuery(new Term("filePath", "*SuperClass.cls")), 100);
>> >> >
>> >> > This returns zero documents.
>> >> >
>> >> > But when I use just "*" in the query, it returns all the documents:
>> >> >
>> >> > indexSearcher.search(new WildcardQuery(new Term("filePath", "*")), 100);
>> >> >
>> >> > It only fails when I use queries such as a leading (prefix) wildcard.
>> >> >
>> >> > What could be going wrong?
>> >> >
>> >> > Thanks,
>> >> > Nischal Y
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>>
>>
