lucene-java-user mailing list archives

From Rohit Banga <iamrohitba...@gmail.com>
Subject Re: hit highlighting in lucene
Date Sun, 07 Feb 2010 12:12:17 GMT
It works!!! :)

Could you also offer a suggestion for the following?

Please have a look at the code quoted below. It contains a list of cities that
have been added to the index.
// this is the code for indexing
    void indexCities() throws Exception {

        // create the index (true = overwrite any existing one), analyzed with StandardAnalyzer
        IndexWriter writer = new IndexWriter(FSDirectory.open(index_directory),
                new StandardAnalyzer(Version.LUCENE_CURRENT), true,
                IndexWriter.MaxFieldLength.LIMITED);

        // one document per city; the "name" field is stored and tokenized
        for (int i = 0; i < names.length; ++i) {
            Document doc = new Document();
            doc.add(new Field("name", names[i], Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc);
        }

        writer.optimize();
        writer.close();
    }

If I try

    Query tq = new FuzzyQuery(new Term("name", "new delhi"));

I get null, because "new" and "delhi" are considered separately.
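To illustrate why a single term "new delhi" finds nothing: StandardAnalyzer splits "New Delhi" into the separate indexed terms "new" and "delhi", while the Term handed directly to a TermQuery or FuzzyQuery is not analyzed at all. Below is a minimal pure-Java sketch of an inverted index showing this. It is not Lucene code; the class TinyIndex is hypothetical, and its tokenizer (lowercase, split on anything that is not a letter or digit) only approximates StandardAnalyzer, which also drops English stop words and handles e-mail addresses, acronyms and the like specially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code: a term -> document-id map built
// with a simplified tokenizer, to show which terms actually end up indexed.
class TinyIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Rough approximation of StandardAnalyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index every token of the text under the given document id.
    void addDocument(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Exact lookup, like a TermQuery: the term itself is NOT tokenized.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        String[] names = {"New Delhi", "Mumbai", "Chennai"};
        for (int i = 0; i < names.length; i++) {
            index.addDocument(i, names[i]);
        }
        System.out.println(index.lookup("new delhi")); // prints []
        System.out.println(index.lookup("delhi"));     // prints [0]
    }
}
```

The indexed terms are "new" and "delhi"; the string "new delhi" with its space never exists as one term, so any single-term query for it has nothing to match.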

How should I change the analyzer so that "new delhi" is treated as a single
term? Basically, I am using Lucene to find the names of all cities in a
string. Because there may be spelling mistakes, I use fuzzy matching, and it
works: the document with the closest matching city name comes back as the top
hit. But since I also want to identify where in the query string the match
occurred, I am using a hit highlighter. Do I need to modify my analyzer to
group "new" and "delhi" into a phrase?

Sorry for the noob question :(
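One possible approach, sketched under assumptions: leave the analyzer tokenizing as it is and match the city name word by word instead, requiring each word of the name to fuzzily match consecutive words of the sentence. This also yields the character offsets of the match in the sentence. The code is plain Java, not a Lucene API; FuzzyPhraseFinder, the word pattern and the similarity rule (1 - edit distance / length of the shorter word, accepted at 0.5 or more, meant to mirror the Lucene 2.9/3.x FuzzyQuery defaults) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not a Lucene API): finds a multi-word name such as
// "New Delhi" in a sentence, tolerating spelling mistakes in each word, and
// reports the character offsets of the match in the original sentence.
class FuzzyPhraseFinder {
    static final float MIN_SIMILARITY = 0.5f; // assumed threshold

    // A lowercased word plus its start/end offsets in the original text.
    static final class Token {
        final String text;
        final int start, end;
        Token(String text, int start, int end) {
            this.text = text; this.start = start; this.end = end;
        }
    }

    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            tokens.add(new Token(m.group().toLowerCase(), m.start(), m.end()));
        }
        return tokens;
    }

    // Levenshtein distance with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // 1 - distance / length of the shorter word.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.equals(b) ? 1f : 0f;
        return 1f - (float) editDistance(a, b) / shorter;
    }

    // Returns {start, end} offsets of the first match, or null if none.
    static int[] find(String sentence, String name) {
        List<Token> s = tokenize(sentence);
        List<Token> n = tokenize(name);
        if (n.isEmpty()) return null;
        for (int i = 0; i + n.size() <= s.size(); i++) {
            boolean all = true;
            for (int k = 0; k < n.size() && all; k++) {
                all = similarity(s.get(i + k).text, n.get(k).text) >= MIN_SIMILARITY;
            }
            if (all) {
                return new int[] {s.get(i).start, s.get(i + n.size() - 1).end};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sentence = "flights from Neew Dilhi to Mumbai";
        int[] span = find(sentence, "New Delhi");
        if (span != null) {
            System.out.println(sentence.substring(span[0], span[1])); // prints Neew Dilhi
        }
    }
}
```

Matching word by word keeps "New Delhi" a two-token name in the index while still allowing a typo in each word; the trade-off is that a short word such as "new" tolerates at most one edit at this threshold.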



On Sun, Feb 7, 2010 at 4:44 PM, Simon Willnauer <simon.willnauer@googlemail.com> wrote:

> Try
>      Query tq = new FuzzyQuery(new Term("name","mumbai"));
> instead of
>       TermQuery tq = new TermQuery(new Term("name","mumbai"));
>
> simon
>
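Why the FuzzyQuery suggested above can match "mumbhai" when a TermQuery cannot, worked through for the terms in this thread. This assumes the Lucene 2.9/3.0 FuzzyQuery behaviour with its defaults (prefix length 0, minimum similarity 0.5), under which an indexed term t matches a query term q when the similarity below is at least 0.5; lev is the Levenshtein edit distance. Treat the formula as a sketch of that behaviour rather than a quote of the source.

```latex
\begin{aligned}
\mathrm{sim}(q, t) &= 1 - \frac{\mathrm{lev}(q, t)}{\min(|q|, |t|)} \\
\mathrm{sim}(\text{mumbhai}, \text{mumbai}) &= 1 - \tfrac{1}{6} \approx 0.83 \ge 0.5 \\
\mathrm{sim}(\text{new delhi}, \text{delhi}) &= 1 - \tfrac{4}{5} = 0.2 < 0.5
\end{aligned}
```

The second line is why a single fuzzy term "new delhi" still cannot reach the indexed term "delhi": deleting "new " costs four edits against a five-letter word.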
> On Sun, Feb 7, 2010 at 11:58 AM, Rohit Banga <iamrohitbanga@gmail.com> wrote:
> >
> >     // list of cities that have been indexed;
> >     // each city name is a separate document
> >     public static final String[] names = {
> >         "New Delhi", "Bangalore", "Hyderabad", "Mumbai", "Chennai",
> >         "Kolkata", "Ahmedabad", "Kanpur", "Guwahati", "Roorkee",
> >         "Dehradun", "Lucknow", "Bhopal", "Jaipur", "Jodhpur",
> >         "Thiruvanthapuram", "Jammu", "Srinagar", "Raipur", "Pathankot",
> >         "Meerut", "Muzaffarnagar", "Agra", "Jhansi", "Gandhinagar",
> >         "Nasik", "Nagpur", "Calicut", "Trichi", "Bharatpur", "Nainital"
> >     };
> >
> >     // i am using the standard analyzer
> >     void highLightWords(String qStr) throws Exception {
> >
> >         Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
> >         TokenStream stream = analyzer.tokenStream("name", new StringReader(qStr));
> >
> >         TermQuery tq = new TermQuery(new Term("name","mumbai"));
> >         QueryScorer scorer = new QueryScorer(tq);
> >         Highlighter highlighter = new Highlighter(scorer);
> >
> >         String fragment = highlighter.getBestFragment(stream, qStr);
> >         System.out.println("\nfragment found: " + fragment);
> >     }
> >
> >
> > // invoking the above function
> > luceneTest.highLightWords("some unimportant text here Mumbai some unimportant text there~");
> > fragment found: some unimportant text here <B>Mumbai</B> some unimportant text there~
> >
> > But when I change Mumbai to Mumbhai, the search still returns the correct
> > document as a hit, yet the function above finds no fragment:
> >
> > luceneTest.highLightWords("some unimportant text here Mumbhai some unimportant text there~");
> > fragment is null.
> >
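Roughly speaking, the contrib Highlighter only marks tokens that its scorer recognises as query terms, which is why a TermQuery for "mumbai" produced no fragment for "Mumbhai", and why passing a FuzzyQuery to QueryScorer instead was the fix. Below is a minimal plain-Java sketch of the same idea with fuzzy matching built in: it wraps each word close enough to the term in <B>...</B>, copies everything else unchanged, and returns null when nothing matches. FuzzyHighlight and its 0.5 threshold are hypothetical and not part of Lucene.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the contrib Highlighter API: wraps each word
// of the text that fuzzily matches the term in <B>...</B> and returns null
// when no word matches, mirroring the null fragment described above.
class FuzzyHighlight {
    // Assumed threshold, chosen to mirror FuzzyQuery's default minimum similarity.
    static final float MIN_SIMILARITY = 0.5f;
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // Levenshtein distance with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // 1 - distance / length of the shorter word.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.equals(b) ? 1f : 0f;
        return 1f - (float) editDistance(a, b) / shorter;
    }

    static String highlight(String text, String term) {
        String target = term.toLowerCase();
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        boolean found = false;
        while (m.find()) {
            out.append(text, last, m.start()); // copy separators unchanged
            String word = m.group();
            if (similarity(word.toLowerCase(), target) >= MIN_SIMILARITY) {
                out.append("<B>").append(word).append("</B>");
                found = true;
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text, last, text.length()); // trailing text such as "~"
        return found ? out.toString() : null;
    }

    public static void main(String[] args) {
        // prints: some unimportant text here <B>Mumbhai</B> some unimportant text there~
        System.out.println(highlight(
                "some unimportant text here Mumbhai some unimportant text there~", "mumbai"));
    }
}
```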
> > On Sun, Feb 7, 2010 at 4:22 PM, Simon Willnauer <simon.willnauer@googlemail.com> wrote:
> >>
> >> Rohit,
> >> what kind of problems are you facing with using fuzzy query and
> >> highlighting? Could you give us more details and maybe a small code
> >> snippet which isolates your problem?
> >>
> >> simon
> >>
> >> On Sun, Feb 7, 2010 at 11:32 AM, Rohit Banga <iamrohitbanga@gmail.com> wrote:
> >> > But what about the case in which I am using fuzzy query matching? Then
> >> > the highlighter package does not work.
> >> >
> >> > On Sat, Feb 6, 2010 at 8:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >> >
> >> >> There are two contrib packages for highlighting in the Lucene
> >> >> distribution: highlighter and fast-vector-highlighter.
> >> >>
> >> >> -----
> >> >> Uwe Schindler
> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> >> http://www.thetaphi.de
> >> >> eMail: uwe@thetaphi.de
> >> >>
> >> >>
> >> >> > -----Original Message-----
> >> >> > From: Rohit Banga [mailto:iamrohitbanga@gmail.com]
> >> >> > Sent: Saturday, February 06, 2010 2:27 PM
> >> >> > To: java-user@lucene.apache.org
> >> >> > Subject: hit highlighting in lucene
> >> >> >
> >> >> > Hi friends
> >> >> >
> >> >> > I have just started using Lucene, and the way I want to use it is
> >> >> > the following:
> >> >> >
> >> >> > I have documents consisting of names of users as one field.
> >> >> > I have a sentence that may contain the name of some user.
> >> >> > I perform a search for the sentence in the index using the searcher.
> >> >> > If it contains the name of the user, then that user's document is
> >> >> > listed on top by Lucene.
> >> >> >
> >> >> > Now I want to determine the position in the sentence where the
> >> >> > string has been found.
> >> >> >
> >> >> > I am using fuzzy query matching by adding the character '~' to the
> >> >> > sentence I am searching, so I cannot use the indexOf method of the
> >> >> > String class as is to get the position of the match.
> >> >> >
> >> >> > Thanks in advance
> >> >> >
> >> >> > --
> >> >> > Rohit Banga
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Rohit Banga
> >> >
> >>
> >
> >
> >
> > --
> > Rohit Banga
> >
>



-- 
Rohit Banga
