lucene-java-user mailing list archives

From Bhaskar <bhaskar1...@gmail.com>
Subject Re: Need help in alphanumeric search
Date Fri, 02 Oct 2015 02:04:45 GMT
Hi Jack,

My search currently works like this.

If I give the input "SD RAM Bhaskar", every string that contains "SD",
"RAM", or "Bhaskar" comes back as a result, for example:

      "SD lib"

      "RAM hello"

      "hi Bhaskar "

      "Bhaskar hai SD"

But I want the output below:

       "SD RAM Bhaskar"

       "SD RAM Bhaskar hello"

That is, the string must begin with "SD RAM Bhaskar", and whatever follows
can be anything.

But my current application returns every string in which it finds "SD",
"RAM", or "Bhaskar".

Regards,
Bhaskar
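
The matching rule requested above (the value must begin with the query words, in order, and may continue with anything) can be modelled in plain Java without Lucene. This is only a sketch of the desired behaviour, not Lucene code: the class StartsWithPhrase and its helpers are hypothetical names, and splitting on whitespace plus lower-casing is an assumption about how the values would be tokenized. In Lucene itself, a positional rule like this is usually expressed with the phrase or span queries that Ian and Jack mention in the quoted replies (for instance a SpanFirstQuery wrapping an in-order SpanNearQuery); that mapping is a suggestion and is not confirmed anywhere in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class StartsWithPhrase {

    // Hypothetical helper: lower-case the text and split it on whitespace.
    static List<String> tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // True when the value's leading tokens equal the query's tokens, in order.
    static boolean startsWithPhrase(String value, String query) {
        List<String> v = tokens(value);
        List<String> q = tokens(query);
        return !q.isEmpty()
                && v.size() >= q.size()
                && v.subList(0, q.size()).equals(q);
    }

    public static void main(String[] args) {
        String query = "SD RAM Bhaskar";
        String[] values = {
            "SD lib", "RAM hello", "hi Bhaskar ", "Bhaskar hai SD",
            "SD RAM Bhaskar", "SD RAM Bhaskar hello"
        };
        // Only the last two values begin with the full phrase.
        for (String value : values) {
            System.out.println("\"" + value + "\" -> " + startsWithPhrase(value, query));
        }
    }
}
```

With a default OR operator, each of the six sample values matches because each contains at least one of the three words; the rule sketched here accepts only the two values that start with the whole phrase.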
On Oct 2, 2015 1:19 AM, "Jack Krupansky" <jack.krupansky@gmail.com> wrote:

> Technically, there is no such thing as a "sentence search" in Lucene.
> Please provide an example of how you wish to search, and then we can
> determine whether a phrase query or a span query might accomplish the task.
>
> -- Jack Krupansky
>
> On Thu, Oct 1, 2015 at 11:53 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
>
> > Hi,
> > I am looking for sentence search rather than word search.
> > Regards,
> > Bhaskar
> > On Oct 1, 2015 7:07 PM, "Ian Lea" <ian.lea@gmail.com> wrote:
> >
> > > Take a look at
> > >
> > > http://lucene.apache.org/core/5_3_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description
> > > .
> > > Sounds like you want an AND, or a +, or both. You may also want to
> > > take a look at phrase queries and/or span queries.
> > >
> > >
> > > --
> > > Ian.
> > >
> > >
> > > On Thu, Oct 1, 2015 at 1:52 PM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > Hi Uwe,
> > > > My search currently works like this.
> > > > If I give the input "SD RAM Bhaskar", every string that contains
> > > > "SD", "RAM", or "Bhaskar" comes back as a result, for example:
> > > >       "SD lib"
> > > >       "RAM hello"
> > > >       "hi Bhaskar "
> > > >       "Bhaskar hai SD"
> > > >
> > > >
> > > > But I want the output below:
> > > >        "SD RAM Bhaskar"
> > > >        "SD RAM Bhaskar hello"
> > > > That is, the string must begin with "SD RAM Bhaskar", and whatever
> > > > follows can be anything.
> > > >
> > > >
> > > > But my current application returns every string in which it finds
> > > > "SD", "RAM", or "Bhaskar".
> > > >
> > > >
> > > > Can you please advise?
> > > > Thanks a lot in advance.
> > > >
> > > > Regards,
> > > > Bhaskar
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Sep 30, 2015 at 12:23 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >
> > > >> Hi Bhaskar,
> > > >>
> > > >> the answer is very simple: Your analysis is not useful for the type of
> > > >> queries and data you are using. You are using SimpleAnalyzer in your
> > > >> search/indexing code:
> > > >>
> > > >> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
> > > >> "An Analyzer that filters LetterTokenizer with LowerCaseFilter"
> > > >>
> > > >> And LetterTokenizer does the following:
> > > >>
> > > >> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
> > > >> "A LetterTokenizer is a tokenizer that divides text at non-letters.
> > > >> That's to say, it defines tokens as maximal strings of adjacent letters,
> > > >> as defined by java.lang.Character.isLetter() predicate."
> > > >>
> > > >> So it creates a new token at every non-letter boundary. All non-letters
> > > >> are discarded (because they are treated as token boundary). So your
> > > >> queries can never match.
> > > >>
> > > >> I'd suggest first reading up on analysis and then choosing a better
> > > >> analyzer that suits your underlying data and the queries you want to
> > > >> run. Maybe use WhitespaceAnalyzer or, better, StandardAnalyzer as a
> > > >> first step. Be sure to reindex your data before querying. The Analyzer
> > > >> used on the indexing side must be the same as on the query side. If you
> > > >> want to use wildcards, you have to take more care, because wildcards
> > > >> are not really natural for a "full text search engine" and may cause
> > > >> inconsistent results.
> > > >>
> > > >> Uwe
> > > >>
> > > >> -----
> > > >> Uwe Schindler
> > > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >> http://www.thetaphi.de
> > > >> eMail: uwe@thetaphi.de
> > > >>
> > > >> > -----Original Message-----
> > > >> > From: Bhaskar [mailto:bhaskar1484@gmail.com]
> > > >> > Sent: Wednesday, September 30, 2015 4:28 AM
> > > >> > To: java-user@lucene.apache.org
> > > >> > Subject: Re: Need help in alphanumeric search
> > > >> >
> > > >> > Hi Uwe,
> > > >> >
> > > >> > Below is my indexing code:
> > > >> >
> > > >> > public static final String INDEX_DIR = "c:/DBIndexAll/";
> > > >> >
> > > >> > public static void main(String[] args) throws Exception {
> > > >> >    // Path indexDir = new Path(INDEX_DIR);
> > > >> >    final Path indexDir = Paths.get(INDEX_DIR);
> > > >> >    SimpleDBIndexer indexer = new SimpleDBIndexer();
> > > >> >    try {
> > > >> >       Class.forName(JDBC_DRIVER).newInstance();
> > > >> >       Connection conn = DriverManager.getConnection(
> > > >> >             CONNECTION_URL + DBNAME, USER_NAME, PASSWORD);
> > > >> >       SimpleAnalyzer analyzer = new SimpleAnalyzer();
> > > >> >       IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
> > > >> >       IndexWriter indexWriter =
> > > >> >             new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
> > > >> >       System.out.println("Indexing to directory '" + indexDir + "'...");
> > > >> >       int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
> > > >> >       indexWriter.close();
> > > >> >       System.out.println(indexedDocumentCount
> > > >> >             + " records have been indexed successfully");
> > > >> >    } catch (Exception e) {
> > > >> >       e.printStackTrace();
> > > >> >    }
> > > >> > }
> > > >> >
> > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> > > >> >   String sql = QUERY1;
> > > >> >   Statement stmt = conn.createStatement();
> > > >> >   ResultSet rs = stmt.executeQuery(sql);
> > > >> >   int i = 0;
> > > >> >   while (rs.next()) {
> > > >> >      Document d = new Document();
> > > >> >      d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> > > >> >
> > > >> >      writer.addDocument(d);
> > > >> >      i++;
> > > >> >   }
> > > >> >   rs.close();
> > > >> >   stmt.close();
> > > >> >
> > > >> >   return i;
> > > >> > }
> > > >> >
> > > >> >
> > > >> > Searching code:
> > > >> >
> > > >> > public class SimpleDBSearcher {
> > > >> >    // PLASTRON
> > > >> >    private static final String LUCENE_QUERY = "SD*";
> > > >> >    private static final int MAX_HITS = 500;
> > > >> >    private static final String INDEX_DIR = "C:/DBIndexAll/";
> > > >> >
> > > >> >    public static void main(String[] args) throws Exception {
> > > >> >       // File indexDir = new File(SimpleDBIndexer.INDEX_DIR);
> > > >> >       final Path indexDir = Paths.get(SimpleDBIndexer.INDEX_DIR);
> > > >> >       String query = LUCENE_QUERY;
> > > >> >       SimpleDBSearcher searcher = new SimpleDBSearcher();
> > > >> >       searcher.searchIndex(indexDir, query);
> > > >> >    }
> > > >> >
> > > >> >    private void searchIndex(Path indexDir, String queryStr) throws Exception {
> > > >> >       Directory directory = FSDirectory.open(indexDir);
> > > >> >       System.out.println("The query string is " + queryStr);
> > > >> >       MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
> > > >> >             new String[] { "cpn" }, new StandardAnalyzer());
> > > >> >       IndexReader reader = DirectoryReader.open(directory);
> > > >> >       IndexSearcher searcher = new IndexSearcher(reader);
> > > >> >       queryParser.getAllowLeadingWildcard();
> > > >> >
> > > >> >       Query query = queryParser.parse(queryStr);
> > > >> >       TopDocs topDocs = searcher.search(query, MAX_HITS);
> > > >> >
> > > >> >       ScoreDoc[] hits = topDocs.scoreDocs;
> > > >> >       System.out.println(hits.length + " Record(s) Found");
> > > >> >       for (int i = 0; i < hits.length; i++) {
> > > >> >          int docId = hits[i].doc;
> > > >> >          Document d = searcher.doc(docId);
> > > >> >          System.out.println("\"cpn value is:\" " + d.get("cpn"));
> > > >> >       }
> > > >> >       if (hits.length == 0) {
> > > >> >          System.out.println("No Data Founds ");
> > > >> >       }
> > > >> >    }
> > > >> > }
> > > >> >
> > > >> >
> > > >> > Please help here, thanks in advance.
> > > >> >
> > > >> > Regards,
> > > >> > Bhaskar
> > > >> >
> > > >> > On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >> >
> > > >> > > Hi Erick,
> > > >> > >
> > > >> > > This mail was in Lucene's user mailing list. This is not about Solr,
> > > >> > > so user cannot provide his Solr config! :-) In any case, it would be
> > > >> > > good to get the Analyzer + code you use while indexing and also the
> > > >> > > code (+ Analyzer) that creates the query while searching.
> > > >> > >
> > > >> > > Uwe
> > > >> > >
> > > >> > > -----
> > > >> > > Uwe Schindler
> > > >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >> > > http://www.thetaphi.de
> > > >> > > eMail: uwe@thetaphi.de
> > > >> > >
> > > >> > >
> > > >> > > > -----Original Message-----
> > > >> > > > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > > >> > > > Sent: Monday, September 28, 2015 6:01 PM
> > > >> > > > To: java-user
> > > >> > > > Subject: Re: Need help in alphanumeric search
> > > >> > > >
> > > >> > > > You need to supply the definitions of this field from your
> > > >> > > > schema.xml file, both the <field> and <fieldType>
> > > >> > > >
> > > >> > > > Additionally, please provide the results of the query you're trying
> > > >> > > > with &debug=true appended.
> > > >> > > >
> > > >> > > > The adminUI/analysis page is very helpful in these situations as
> > > >> > > > well. Select the appropriate core from the drop-down on the left and
> > > >> > > > you'll see an "analysis" section appear that shows you exactly what
> > > >> > > > happens when the field is analyzed.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Erick
> > > >> > > >
> > > >> > > > On Mon, Sep 28, 2015 at 5:01 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > >> > > > > Thanks, Ian, for the reply.
> > > >> > > > >
> > > >> > > > > cpn values are like 123-0049, 342-043, ab23-090, hedwsdg
> > > >> > > > >
> > > >> > > > > My application works when I search for the inputs below:
> > > >> > > > > 1) ab*
> > > >> > > > > 2) hedwsdg
> > > >> > > > > 3) hed*
> > > >> > > > >
> > > >> > > > > but it does not work for
> > > >> > > > > 1) 123*
> > > >> > > > > 2) 123-0049
> > > >> > > > > 3) ab23*
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > Note: if the search input contains a number, then it does not work.
> > > >> > > > >
> > > >> > > > > Thanks in advance.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Sep 28, 2015 at 3:49 PM, Ian Lea <ian.lea@gmail.com> wrote:
> > > >> > > > >
> > > >> > > > >> Hi
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >> Can you provide a few examples of values of cpn that a) are and
> > > >> > > > >> b) are not being found, for indexing and searching.
> > > >> > > > >>
> > > >> > > > >> You may also find some of the tips at
> > > >> > > > >>
> > > >> > > > >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> > > >> > > > >> useful.
> > > >> > > > >>
> > > >> > > > >> You haven't shown the code that created the IndexWriter so the
> > > >> > > > >> tip about using the same analyzer at index and search time might
> > > >> > > > >> be relevant.
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >> --
> > > >> > > > >> Ian.
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >> On Mon, Sep 28, 2015 at 10:49 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > >> > > > >> > Hi,
> > > >> > > > >> > I am a beginner in Apache Lucene; I am using 5.3.1.
> > > >> > > > >> > I have created the index from a database result. The index
> > > >> > > > >> > values include both alphanumeric values and plain strings. I am
> > > >> > > > >> > able to search the strings, but I am not able to search the
> > > >> > > > >> > alphanumeric values.
> > > >> > > > >> >
> > > >> > > > >> > Can someone help me here?
> > > >> > > > >> >
> > > >> > > > >> > Below is indexing code...
> > > >> > > > >> >
> > > >> > > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> > > >> > > > >> >   Statement stmt = conn.createStatement();
> > > >> > > > >> >   ResultSet rs = stmt.executeQuery(sql);
> > > >> > > > >> >   int i=0;
> > > >> > > > >> >   while (rs.next()) {
> > > >> > > > >> >      Document d = new Document();
> > > >> > > > >> >      // System.out.println("cpn is" + rs.getString("cpn"));
> > > >> > > > >> >      // System.out.println("mpn is" + rs.getString("mpn"));
> > > >> > > > >> >
> > > >> > > > >> >      d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> > > >> > > > >> >
> > > >> > > > >> >      writer.addDocument(d);
> > > >> > > > >> >      i++;
> > > >> > > > >> >   }
> > > >> > > > >> > }
> > > >> > > > >> >
> > > >> > > > >> > Searching code:
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> > private void searchIndex(Path indexDir, String queryStr) throws Exception {
> > > >> > > > >> >    Directory directory = FSDirectory.open(indexDir);
> > > >> > > > >> >    System.out.println("The query string is " + queryStr);
> > > >> > > > >> >    // MultiFieldQueryParser queryParser = new MultiFieldQueryParser(new
> > > >> > > > >> >    // String[] {"mpn"}, new StandardAnalyzer());
> > > >> > > > >> >    // IndexReader reader = IndexReader.open(directory);
> > > >> > > > >> >    IndexReader reader = DirectoryReader.open(directory);
> > > >> > > > >> >    IndexSearcher searcher = new IndexSearcher(reader);
> > > >> > > > >> >    Analyzer analyzer = new StandardAnalyzer();
> > > >> > > > >> >    analyzer.tokenStream("cpn", queryStr);
> > > >> > > > >> >    QueryParser parser = new QueryParser("cpn", analyzer);
> > > >> > > > >> >    parser.setDefaultOperator(Operator.OR);
> > > >> > > > >> >    parser.getAllowLeadingWildcard();
> > > >> > > > >> >    parser.setAutoGeneratePhraseQueries(true);
> > > >> > > > >> >    Query query = parser.parse(queryStr);
> > > >> > > > >> >    searcher.search(query, 100);
> > > >> > > > >> >    TopDocs topDocs = searcher.search(query, MAX_HITS);
> > > >> > > > >> >
> > > >> > > > >> >    ScoreDoc[] hits = topDocs.scoreDocs;
> > > >> > > > >> >    System.out.println(hits.length + " Record(s) Found");
> > > >> > > > >> >    for (int i = 0; i < hits.length; i++) {
> > > >> > > > >> >       int docId = hits[i].doc;
> > > >> > > > >> >       Document d = searcher.doc(docId);
> > > >> > > > >> >       System.out.println("\"value is:\" " + d.get("cpn"));
> > > >> > > > >> >    }
> > > >> > > > >> >    if (hits.length == 0) {
> > > >> > > > >> >       System.out.println("No Data Founds ");
> > > >> > > > >> >    }
> > > >> > > > >> > }
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> > Thanks in advance.
> > > >> > > > >> >
> > > >> > > > >> > --
> > > >> > > > >> > Keep Smiling....
> > > >> > > > >> > Thanks & Regards
> > > >> > > > >> > Bhaskar.
> > > >> > > > >> > Mobile:9866724142
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >> ---------------------------------------------------------------------
> > > >> > > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> > > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Keep Smiling....
> > > >> > > > > Thanks & Regards
> > > >> > > > > Bhaskar.
> > > >> > > > > Mobile:9866724142
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Keep Smiling....
> > > >> > Thanks & Regards
> > > >> > Bhaskar.
> > > >> > Mobile:9866724142
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > Keep Smiling....
> > > > Thanks & Regards
> > > > Bhaskar.
> > > > Mobile:9866724142
> > >
> > >
> > >
> >
>
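
Uwe's point in the quoted reply, that SimpleAnalyzer throws away every non-letter, can be checked against the sample cpn values without Lucene at all. The sketch below only emulates the documented behaviour of LetterTokenizer followed by LowerCaseFilter (maximal runs of characters for which java.lang.Character.isLetter() is true, then lower-cased); the class LetterTokenizerSketch is a hypothetical name and this is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class LetterTokenizerSketch {

    // Emulates LetterTokenizer + LowerCaseFilter: a token is a maximal run of
    // letters (Character.isLetter), lower-cased; every other character is dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample cpn values taken from the thread.
        for (String cpn : new String[] {"123-0049", "342-043", "ab23-090", "hedwsdg"}) {
            System.out.println(cpn + " -> " + letterTokens(cpn));
        }
    }
}
```

Under this model "123-0049" and "342-043" produce no tokens, "ab23-090" produces only "ab", and "hedwsdg" survives whole. That is consistent with the report that ab*, hed* and hedwsdg find results while 123*, 123-0049 and ab23* do not: the digits were never written to the index.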
