lucene-java-user mailing list archives

From nischal reddy <nischal.srini...@gmail.com>
Subject Re: wildcard search not working on file paths
Date Mon, 14 Oct 2013 18:55:16 GMT
Hi Ian,

Please find below a sample program that better illustrates the scenario:


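import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TestWriter {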
    public static void main(String[] args) throws IOException {
        createIndex();
        searchIndex();
    }

    public static void createIndex() throws IOException {
            Directory directory = FSDirectory.open(new File("C:\\temp"));

            IndexWriterConfig iwriter = new IndexWriterConfig(
                    Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));

            IndexWriter iWriter = new IndexWriter(directory, iwriter);

            Document document1 = new Document();

            document1.add(new StringField("FILE_PATH",
                    "\\Samples\\Batching\\runner.p", Store.YES));
            document1.add(new StringField("contents", "runnerfile",
                    Store.YES));

            iWriter.addDocument(document1);

            Document document2 = new Document();

            document2.add(new StringField("FILE_PATH",
                    "\\Samples\\Business\\stopper.p", Store.YES));
            document2.add(new StringField("contents", "stopperfile",
                    Store.YES));

            iWriter.addDocument(document2);
            iWriter.commit();
            iWriter.close();


    }

    public static void searchIndex() throws IOException {

        Directory directory = FSDirectory.open(new File("C:\\temp"));
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        // Create a wildcard query to get all file paths
        // This query works fine and returns all the docs in index
        Query query1 = new WildcardQuery(new Term("FILE_PATH", "*"));
        TopDocs topDocs = indexSearcher.search(query1, 100);
        System.out.println("total no of docs " + topDocs.totalHits);

        // Create a wildcard query to search for paths starting with /Samples
        // This query doesn't work and returns zero docs
        // It doesn't work with "*Samples//*" either,
        // but it works with "*Samples*"
        Query query2 = new WildcardQuery(new Term("FILE_PATH", "*Samples/*"));
        TopDocs topDocs2 = indexSearcher.search(query2, 100);
        System.out.println("total no of docs " + topDocs2.totalHits);

        // Create a wildcard query to search for paths ending with runner.p
        // This query works and returns 1 doc
        Query query3 = new WildcardQuery(new Term("FILE_PATH", "*runner.p"));
        TopDocs topDocs3 = indexSearcher.search(query3, 100);
        System.out.println("total no of docs " + topDocs3.totalHits);

        // Queries to search in "contents" field

        // Create a wildcard query to search for contents starting with runner
        // This query works and returns one doc
        Query query4 = new WildcardQuery(new Term("contents", "runner*"));
        TopDocs topDocs4 = indexSearcher.search(query4, 100);
        System.out.println("total no of docs " + topDocs4.totalHits);

        // Create a wildcard query to search for contents ending with file
        // This query works and returns two  docs
        Query query5 = new WildcardQuery(new Term("contents", "*file"));
        TopDocs topDocs5 = indexSearcher.search(query5, 100);
        System.out.println("total no of docs " + topDocs5.totalHits);

    }

}


I observed that the file path separator I am using in the field and the
Lucene escape character seem to be the same. So whenever I use an escape
character in the query the search fails; if I don't use the escape
sequence, it returns the results properly.

Though I am escaping "\" by writing it twice ("\\"), the query still fails.

One way to solve this problem is to replace all "\" with "/" while
indexing, and subsequently use "/" as the file path separator while searching.

But I would prefer not to meddle with the file path. So is there any
alternative that solves this problem without replacing the path separator?
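
A possible alternative, sketched here on the assumption that WildcardQuery in
this Lucene version treats "\" as its own escape character
(WildcardQuery.WILDCARD_ESCAPE): leave the indexed path untouched and double
every backslash in the query pattern, so that each separator ends up as four
backslashes in a Java string literal. The class and method names below are
hypothetical and only illustrate the string handling; the Lucene call appears
as a comment.

```java
final class WildcardPathEscaper {

    // Doubles every backslash in the pattern. The wildcard characters '*'
    // and '?' are left alone, so they keep their wildcard meaning.
    static String escapeBackslashes(String pattern) {
        // String.replace does literal (non-regex) replacement:
        // "\\" is one backslash, "\\\\" is two.
        return pattern.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String raw = "*Samples\\*";              // actual text: *Samples\*
        String escaped = escapeBackslashes(raw); // actual text: *Samples\\*
        System.out.println(escaped);

        // Assumed use with the sample program's field (not executed here):
        // Query q = new WildcardQuery(new Term("FILE_PATH", escaped));
    }
}
```

With the sample program above, the escaped pattern would be passed as
new Term("FILE_PATH", escaped); whether it then matches both documents would
still need to be checked against the index.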

TIA,
Nischal Y



On Mon, Oct 14, 2013 at 10:31 PM, Ian Lea <ian.lea@gmail.com> wrote:

> Seems to me that it should work.  I suggest you show us a complete
> self-contained example program that demonstrates the problem.
>
>
> --
> Ian.
>
>
> On Mon, Oct 14, 2013 at 12:42 PM, nischal reddy
> <nischal.srinivas@gmail.com> wrote:
> > Hi Ian,
> >
> > Actually I'm able to do wildcard searches on all the fields except the
> > "filePath" field. I am able to do both leading and trailing wildcard
> > searches on all the other fields,
> > but when I do a wildcard search on the filePath field it somehow does not
> > work. An example file path would look something like this:
> > "\Samples\F1.cls"
> > I think it is failing because of the "\" present in the field. When I do a
> > wildcard search with the query "filePath : *" it does indeed return all
> > the docs in the index. But when I do any other wildcard search (leading
> > or trailing) it does not work. Any clues why it works on other fields
> > and not on the "filePath" field?
> >
> > TIA,
> > Nischal Y
> >
> >
> > On Mon, Oct 14, 2013 at 4:55 PM, Ian Lea <ian.lea@gmail.com> wrote:
> >
> >> Do some googling on leading wildcards and read things like
> >> http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick
> >> an option you like.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy
> >> <nischal.srinivas@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I have a problem with doing wildcard searches on file path fields.
> >> >
> >> > I have a field "filePath" where I store the complete path of files.
> >> >
> >> > I have used StringField to store the field (I assume that by default
> >> > StringField will not be tokenized).
> >> >
> >> > doc.add(new StringField(FIELD_FILE_PATH,resourcePath, Store.YES));
> >> >
> >> > I am using StandardAnalyzer for IndexWriter,
> >> >
> >> > but since I am using a StringField the fields are not analyzed.
> >> >
> >> > After the files were indexed I checked the index with Luke and the path
> >> > seems fine. And when I do wildcard searches with Luke I get the
> >> > desired results.
> >> >
> >> > But when I do the same search in my code with IndexSearcher I get
> >> > zero docs.
> >> >
> >> > My searching code looks something like this:
> >> >
> >> > indexSearcher.search(new WildcardQuery(new
> >> > Term("filePath","*SuperClass.cls")),100);
> >> >
> >> > This returns zero documents.
> >> >
> >> > But when I just use "*" in the query it returns all the documents:
> >> >
> >> > indexSearcher.search(new WildcardQuery(new Term("filePath","*")),100);
> >> >
> >> > Only when I use queries such as a prefix wildcard does it not work.
> >> >
> >> > What could possibly be going wrong?
> >> >
> >> > Thanks,
> >> > Nischal Y
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
