lucene-java-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Need help in alphanumeric search
Date Fri, 02 Oct 2015 03:48:50 GMT
Phrase query for a tokenized text field should do it.

-- Jack Krupansky
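
To illustrate the suggestion above: in the classic query parser syntax, wrapping the input in double quotes ("SD RAM Bhaskar") asks for a phrase query instead of an OR over the individual terms. The sketch below is plain Java and does not call Lucene; it only mimics the two matching rules so the difference is visible on the example strings from this thread. The class name PhraseMatchSketch, its method names, and the whitespace-plus-lowercase tokenization are assumptions made for illustration, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseMatchSketch {

    // Assumed tokenization for this sketch: split on whitespace, lowercase.
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // OR semantics: the field matches if it contains at least one query term.
    public static boolean matchesAnyTerm(String field, String query) {
        List<String> fieldTokens = tokens(field);
        for (String q : tokens(query)) {
            if (fieldTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // Phrase semantics: all terms must appear adjacent and in the same order.
    public static boolean matchesPhrase(String field, String phrase) {
        List<String> phraseTokens = tokens(phrase);
        if (phraseTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokens(field), phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        String query = "SD RAM Bhaskar";
        String[] fields = {
            "SD lib", "RAM hello", "hi Bhaskar ", "Bhaskar hai SD",
            "SD RAM Bhaskar", "SD RAM Bhaskar hello"
        };
        for (String field : fields) {
            System.out.println("\"" + field + "\""
                    + " any-term=" + matchesAnyTerm(field, query)
                    + " phrase=" + matchesPhrase(field, query));
        }
    }
}
```

Only the last two strings satisfy the phrase rule, while all six satisfy the any-term rule. Note that a phrase match can occur anywhere in the field; if the phrase must sit at the very start of the field, a span query (for example SpanFirstQuery) is the direction to look at.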

On Thu, Oct 1, 2015 at 10:04 PM, Bhaskar <bhaskar1484@gmail.com> wrote:

> Hi Jack,
>
> My search currently works like this.
>
> If I give the input "SD RAM Bhaskar", then every string containing
> "SD", "RAM", or "Bhaskar" is returned, e.g.:
>
>       "SD lib"
>
>       "RAM hello"
>
>       "hi Bhaskar "
>
>       "Bhaskar hai SD"
>
> But I want the output below:
>
>        "SD RAM Bhaskar"
>
>        "SD RAM Bhaskar hello"
>
> i.e. the string must begin with "SD RAM Bhaskar", and whatever follows can
> be anything.
>
> But my current application returns every string in which it finds "SD",
> "RAM", or "Bhaskar".
>
> Regards,
> Bhaskar
> On Oct 2, 2015 1:19 AM, "Jack Krupansky" <jack.krupansky@gmail.com> wrote:
>
> > Technically, there is no such thing as a "sentence search" in Lucene.
> > Please provide an example of how you wish to search, and then we can
> > determine whether a phrase query or a span query might accomplish the
> > task.
> >
> > -- Jack Krupansky
> >
> > On Thu, Oct 1, 2015 at 11:53 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> >
> > > Hi,
> > > I am looking for sentence search rather than word search.
> > > Regards,
> > > Bhaskar
> > > On Oct 1, 2015 7:07 PM, "Ian Lea" <ian.lea@gmail.com> wrote:
> > >
> > > > Take a look at
> > > > http://lucene.apache.org/core/5_3_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description
> > > > .
> > > > Sounds like you want an AND, or a +, or both. You may also want to
> > > > take a look at phrase queries and/or span queries.
> > > >
> > > >
> > > > --
> > > > Ian.
> > > >
> > > >
> > > > On Thu, Oct 1, 2015 at 1:52 PM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > > Hi Uwe,
> > > > > My search currently works like this.
> > > > > If I give the input "SD RAM Bhaskar", then every string containing
> > > > > "SD", "RAM", or "Bhaskar" is returned, e.g.:
> > > > >       "SD lib"
> > > > >       "RAM hello"
> > > > >       "hi Bhaskar "
> > > > >       "Bhaskar hai SD"
> > > > >
> > > > >
> > > > > But I want the output below:
> > > > >        "SD RAM Bhaskar"
> > > > >        "SD RAM Bhaskar hello"
> > > > > i.e. the string must begin with "SD RAM Bhaskar", and whatever follows
> > > > > can be anything.
> > > > >
> > > > >
> > > > > But my current application returns every string in which it finds "SD",
> > > > > "RAM", or "Bhaskar".
> > > > >
> > > > >
> > > > > Can you please advise?
> > > > > Thanks a lot in advance.
> > > > >
> > > > > Regards,
> > > > > Bhaskar
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Sep 30, 2015 at 12:23 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > > >
> > > > >> Hi Bhaskar,
> > > > >>
> > > > >> the answer is very simple: your analysis is not suited to the type of
> > > > >> queries and data you are using. You are using SimpleAnalyzer in your
> > > > >> search/indexing code:
> > > > >>
> > > > >> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
> > > > >> "An Analyzer that filters LetterTokenizer with LowerCaseFilter"
> > > > >>
> > > > >> And LetterTokenizer does the following:
> > > > >>
> > > > >> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
> > > > >> "A LetterTokenizer is a tokenizer that divides text at non-letters. That's
> > > > >> to say, it defines tokens as maximal strings of adjacent letters, as
> > > > >> defined by java.lang.Character.isLetter() predicate."
> > > > >>
> > > > >> So it creates a new token at every non-letter boundary. All non-letters
> > > > >> (digits and hyphens included) are discarded, because they are treated as
> > > > >> token boundaries. So your queries for values containing digits can never
> > > > >> match.
> > > > >>
> > > > >> I'd suggest that you first read up on analysis and choose a better
> > > > >> analyzer that suits your underlying data and the queries you want to run.
> > > > >> Maybe use WhitespaceAnalyzer or, better, StandardAnalyzer as a first step.
> > > > >> Be sure to reindex your data before querying. The Analyzer used on the
> > > > >> index side must be the same as the one used on the query side. If you
> > > > >> want to use wildcards, you have to take more care, because wildcards are
> > > > >> not really natural for a "full text search engine" and may cause
> > > > >> inconsistent results.
> > > > >>
> > > > >> Uwe
> > > > >>
> > > > >> -----
> > > > >> Uwe Schindler
> > > > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > >> http://www.thetaphi.de
> > > > >> eMail: uwe@thetaphi.de
> > > > >>
> > > > >> > -----Original Message-----
> > > > >> > From: Bhaskar [mailto:bhaskar1484@gmail.com]
> > > > >> > Sent: Wednesday, September 30, 2015 4:28 AM
> > > > >> > To: java-user@lucene.apache.org
> > > > >> > Subject: Re: Need help in alphanumeric search
> > > > >> >
> > > > >> > Hi Uwe,
> > > > >> >
> > > > >> > Below is my indexing code:
> > > > >> >
> > > > >> > public static final String INDEX_DIR = "c:/DBIndexAll/";
> > > > >> >
> > > > >> > public static void main(String[] args) throws Exception {
> > > > >> >    // Path indexDir = new Path(INDEX_DIR);
> > > > >> >    final Path indexDir = Paths.get(INDEX_DIR);
> > > > >> >    SimpleDBIndexer indexer = new SimpleDBIndexer();
> > > > >> >    try {
> > > > >> >       Class.forName(JDBC_DRIVER).newInstance();
> > > > >> >       Connection conn = DriverManager.getConnection(
> > > > >> >             CONNECTION_URL + DBNAME, USER_NAME, PASSWORD);
> > > > >> >       SimpleAnalyzer analyzer = new SimpleAnalyzer();
> > > > >> >       IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
> > > > >> >       IndexWriter indexWriter =
> > > > >> >             new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
> > > > >> >       System.out.println("Indexing to directory '" + indexDir + "'...");
> > > > >> >       int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
> > > > >> >       indexWriter.close();
> > > > >> >       System.out.println(indexedDocumentCount
> > > > >> >             + " records have been indexed successfully");
> > > > >> >    } catch (Exception e) {
> > > > >> >       e.printStackTrace();
> > > > >> >    }
> > > > >> > }
> > > > >> >
> > > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> > > > >> >    String sql = QUERY1;
> > > > >> >    Statement stmt = conn.createStatement();
> > > > >> >    ResultSet rs = stmt.executeQuery(sql);
> > > > >> >    int i = 0;
> > > > >> >    while (rs.next()) {
> > > > >> >       Document d = new Document();
> > > > >> >       d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> > > > >> >
> > > > >> >       writer.addDocument(d);
> > > > >> >       i++;
> > > > >> >    }
> > > > >> >    rs.close();
> > > > >> >    stmt.close();
> > > > >> >
> > > > >> >    return i;
> > > > >> > }
> > > > >> >
> > > > >> >
> > > > >> > Searching code:
> > > > >> >
> > > > >> > public class SimpleDBSearcher {
> > > > >> >    // PLASTRON
> > > > >> >    private static final String LUCENE_QUERY = "SD*";
> > > > >> >    private static final int MAX_HITS = 500;
> > > > >> >    private static final String INDEX_DIR = "C:/DBIndexAll/";
> > > > >> >
> > > > >> >    public static void main(String[] args) throws Exception {
> > > > >> >       // File indexDir = new File(SimpleDBIndexer.INDEX_DIR);
> > > > >> >       final Path indexDir = Paths.get(SimpleDBIndexer.INDEX_DIR);
> > > > >> >       String query = LUCENE_QUERY;
> > > > >> >       SimpleDBSearcher searcher = new SimpleDBSearcher();
> > > > >> >       searcher.searchIndex(indexDir, query);
> > > > >> >    }
> > > > >> >
> > > > >> >    private void searchIndex(Path indexDir, String queryStr) throws Exception {
> > > > >> >       Directory directory = FSDirectory.open(indexDir);
> > > > >> >       System.out.println("The query string is " + queryStr);
> > > > >> >       MultiFieldQueryParser queryParser =
> > > > >> >             new MultiFieldQueryParser(new String[] { "cpn" }, new StandardAnalyzer());
> > > > >> >       IndexReader reader = DirectoryReader.open(directory);
> > > > >> >       IndexSearcher searcher = new IndexSearcher(reader);
> > > > >> >       queryParser.getAllowLeadingWildcard();
> > > > >> >
> > > > >> >       Query query = queryParser.parse(queryStr);
> > > > >> >       TopDocs topDocs = searcher.search(query, MAX_HITS);
> > > > >> >
> > > > >> >       ScoreDoc[] hits = topDocs.scoreDocs;
> > > > >> >       System.out.println(hits.length + " Record(s) Found");
> > > > >> >       for (int i = 0; i < hits.length; i++) {
> > > > >> >          int docId = hits[i].doc;
> > > > >> >          Document d = searcher.doc(docId);
> > > > >> >          System.out.println("\"cpn value is:\" " + d.get("cpn"));
> > > > >> >       }
> > > > >> >       if (hits.length == 0) {
> > > > >> >          System.out.println("No Data Founds ");
> > > > >> >       }
> > > > >> >    }
> > > > >> > }
> > > > >> >
> > > > >> >
> > > > >> > Please help here, thanks in advance.
> > > > >> >
> > > > >> > Regards,
> > > > >> > Bhaskar
> > > > >> >
> > > > >> > On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > > >> >
> > > > >> > > Hi Erick,
> > > > >> > >
> > > > >> > > This mail was on Lucene's user mailing list. This is not about Solr,
> > > > >> > > so the user cannot provide his Solr config! :-) In any case, it would
> > > > >> > > be good to get the Analyzer + code you use while indexing and also the
> > > > >> > > code (+ Analyzer) that creates the query while searching.
> > > > >> > >
> > > > >> > > Uwe
> > > > >> > >
> > > > >> > > -----
> > > > >> > > Uwe Schindler
> > > > >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > >> > > http://www.thetaphi.de
> > > > >> > > eMail: uwe@thetaphi.de
> > > > >> > >
> > > > >> > >
> > > > >> > > > -----Original Message-----
> > > > >> > > > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > > > >> > > > Sent: Monday, September 28, 2015 6:01 PM
> > > > >> > > > To: java-user
> > > > >> > > > Subject: Re: Need help in alphanumeric search
> > > > >> > > >
> > > > >> > > > You need to supply the definitions of this field from your
> > > > >> > > > schema.xml file, both the <field> and <fieldType>.
> > > > >> > > >
> > > > >> > > > Additionally, please provide the results of the query you're trying
> > > > >> > > > with &debug=true appended.
> > > > >> > > >
> > > > >> > > > The adminUI/analysis page is very helpful in these situations as well.
> > > > >> > > > Select the appropriate core from the drop-down on the left and you'll
> > > > >> > > > see an "analysis" section appear that shows you exactly what happens
> > > > >> > > > when the field is analyzed.
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Erick
> > > > >> > > >
> > > > >> > > > On Mon, Sep 28, 2015 at 5:01 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > >> > > > > Thanks, Ian, for the reply.
> > > > >> > > > >
> > > > >> > > > > cpn values are like 123-0049, 342-043, ab23-090, hedwsdg
> > > > >> > > > >
> > > > >> > > > > My application works when I search with the inputs below:
> > > > >> > > > > 1) ab*
> > > > >> > > > > 2) hedwsdg
> > > > >> > > > > 3) hed*
> > > > >> > > > >
> > > > >> > > > > but it does not work for:
> > > > >> > > > > 1) 123*
> > > > >> > > > > 2) 123-0049
> > > > >> > > > > 3) ab23*
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > Note: if the search input contains a number, it does not work.
> > > > >> > > > >
> > > > >> > > > > Thanks in advance.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Mon, Sep 28, 2015 at 3:49 PM, Ian Lea <ian.lea@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > >> Hi
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >> Can you provide a few examples of values of cpn that a) are and
> > > > >> > > > >> b) are not being found, for indexing and searching.
> > > > >> > > > >>
> > > > >> > > > >> You may also find some of the tips at
> > > > >> > > > >>
> > > > >> > > > >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> > > > >> > > > >> useful.
> > > > >> > > > >>
> > > > >> > > > >> You haven't shown the code that created the IndexWriter so the
> > > > >> > > > >> tip about using the same analyzer at index and search time might
> > > > >> > > > >> be relevant.
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >> --
> > > > >> > > > >> Ian.
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >> On Mon, Sep 28, 2015 at 10:49 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > >> > > > >> > Hi,
> > > > >> > > > >> > I am a beginner in Apache Lucene; I am using 5.3.1.
> > > > >> > > > >> > I have created the index from a database result. The indexed
> > > > >> > > > >> > values include alphanumeric and plain string values. I am able
> > > > >> > > > >> > to search the strings, but I am not able to search the
> > > > >> > > > >> > alphanumeric values.
> > > > >> > > > >> >
> > > > >> > > > >> > Can someone help me here?
> > > > >> > > > >> >
> > > > >> > > > >> > Below is the indexing code...
> > > > >> > > > >> >
> > > > >> > > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> > > > >> > > > >> >   Statement stmt = conn.createStatement();
> > > > >> > > > >> >   ResultSet rs = stmt.executeQuery(sql);
> > > > >> > > > >> >   int i = 0;
> > > > >> > > > >> >   while (rs.next()) {
> > > > >> > > > >> >      Document d = new Document();
> > > > >> > > > >> >      // System.out.println("cpn is" + rs.getString("cpn"));
> > > > >> > > > >> >      // System.out.println("mpn is" + rs.getString("mpn"));
> > > > >> > > > >> >
> > > > >> > > > >> >      d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> > > > >> > > > >> >
> > > > >> > > > >> >      writer.addDocument(d);
> > > > >> > > > >> >      i++;
> > > > >> > > > >> >   }
> > > > >> > > > >> > }
> > > > >> > > > >> >
> > > > >> > > > >> > Searching code:
> > > > >> > > > >> >
> > > > >> > > > >> > private void searchIndex(Path indexDir, String queryStr) throws Exception {
> > > > >> > > > >> >   Directory directory = FSDirectory.open(indexDir);
> > > > >> > > > >> >   System.out.println("The query string is " + queryStr);
> > > > >> > > > >> >   // MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
> > > > >> > > > >> >   //       new String[] {"mpn"}, new StandardAnalyzer());
> > > > >> > > > >> >   // IndexReader reader = IndexReader.open(directory);
> > > > >> > > > >> >   IndexReader reader = DirectoryReader.open(directory);
> > > > >> > > > >> >   IndexSearcher searcher = new IndexSearcher(reader);
> > > > >> > > > >> >   Analyzer analyzer = new StandardAnalyzer();
> > > > >> > > > >> >   analyzer.tokenStream("cpn", queryStr);
> > > > >> > > > >> >   QueryParser parser = new QueryParser("cpn", analyzer);
> > > > >> > > > >> >   parser.setDefaultOperator(Operator.OR);
> > > > >> > > > >> >   parser.getAllowLeadingWildcard();
> > > > >> > > > >> >   parser.setAutoGeneratePhraseQueries(true);
> > > > >> > > > >> >   Query query = parser.parse(queryStr);
> > > > >> > > > >> >   searcher.search(query, 100);
> > > > >> > > > >> >   TopDocs topDocs = searcher.search(query, MAX_HITS);
> > > > >> > > > >> >
> > > > >> > > > >> >   ScoreDoc[] hits = topDocs.scoreDocs;
> > > > >> > > > >> >   System.out.println(hits.length + " Record(s) Found");
> > > > >> > > > >> >   for (int i = 0; i < hits.length; i++) {
> > > > >> > > > >> >      int docId = hits[i].doc;
> > > > >> > > > >> >      Document d = searcher.doc(docId);
> > > > >> > > > >> >      System.out.println("\"value is:\" " + d.get("cpn"));
> > > > >> > > > >> >   }
> > > > >> > > > >> >   if (hits.length == 0) {
> > > > >> > > > >> >      System.out.println("No Data Founds ");
> > > > >> > > > >> >   }
> > > > >> > > > >> > }
> > > > >> > > > >> >
> > > > >> > > > >> >
> > > > >> > > > >> > Thanks in advance.
> > > > >> > > > >> >
> > > > >> > > > >> > --
> > > > >> > > > >> > Keep Smiling....
> > > > >> > > > >> > Thanks & Regards
> > > > >> > > > >> > Bhaskar.
> > > > >> > > > >> > Mobile:9866724142
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >> ---------------------------------------------------------------------
> > > > >> > > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > >> > > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Keep Smiling....
> > > > >> > > > > Thanks & Regards
> > > > >> > > > > Bhaskar.
> > > > >> > > > > Mobile:9866724142
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Keep Smiling....
> > > > >> > Thanks & Regards
> > > > >> > Bhaskar.
> > > > >> > Mobile:9866724142
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Keep Smiling....
> > > > > Thanks & Regards
> > > > > Bhaskar.
> > > > > Mobile:9866724142
> > > >
> > > >
> > > >
> > >
> >
>
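
The diagnosis quoted in this thread (SimpleAnalyzer keeps only runs of letters, lowercased) can be checked against the reported cpn values with a small simulation. The sketch below is plain Java and does not use Lucene; it only re-implements the quoted LetterTokenizer description ("maximal strings of adjacent letters, as defined by java.lang.Character.isLetter()") followed by lowercasing. The class name LetterTokenSketch and the helper anyTokenStartsWith are hypothetical names introduced here, and only the index side is modeled; how the query string itself is analyzed is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LetterTokenSketch {

    // Tokens are maximal runs of letters (Character.isLetter), lowercased.
    // Everything else (digits, hyphens, spaces) only separates tokens.
    public static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Rough stand-in for a prefix search such as "ab*": does any token
    // produced for the value start with the (lowercased) prefix?
    public static boolean anyTokenStartsWith(String value, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String token : letterTokens(value)) {
            if (token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] cpnValues = {"123-0049", "342-043", "ab23-090", "hedwsdg"};
        for (String value : cpnValues) {
            System.out.println(value + " -> " + letterTokens(value));
        }
        System.out.println("ab*   on ab23-090: " + anyTokenStartsWith("ab23-090", "ab"));
        System.out.println("ab23* on ab23-090: " + anyTokenStartsWith("ab23-090", "ab23"));
        System.out.println("123*  on 123-0049: " + anyTokenStartsWith("123-0049", "123"));
    }
}
```

Under this rule "123-0049" and "342-043" produce no tokens at all and "ab23-090" produces only "ab", which is consistent with the report that ab* and hed* find results while 123*, 123-0049 and ab23* do not.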
