lucene-java-user mailing list archives

From Paul Bell <arach...@gmail.com>
Subject Re: Beginner's questions
Date Wed, 27 Mar 2013 20:04:59 GMT
Thanks Adrien.

I've scraped together a simple program in the Lucene 4.2 idiom (see below).
Does this illustrate what you meant by your last sentence?

The code adds/indexes 5 documents all of whose content is identical, but
whose 'id' field is unique ("v1" through "v5"). It then queries the 'id'
field for the pattern "v*".

While we're at it, what method should I be using to obtain merely the
original document itself after a query? My println of "Document=" +
doc.toString() shows this:


Document=Document<stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id:v2> stored,indexed,tokenized<content:This is a test; for the next 60 seconds...>>

But if I'm interested in just obtaining the "content" field ("This is a
test; for the next..."), what should I do?

-Paul


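import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class TestLucene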
{
    public static int id = 0;

    public static void main(String[] args)
    {
        RAMDirectory idx = new RAMDirectory();

        try
        {
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_42,
                    new StandardAnalyzer(Version.LUCENE_42));

            IndexWriter writer = new IndexWriter(idx, conf);

            for ( int n=0; n < 5; n++ )
            {
                writer.addDocument(createDocument("This is a test; for the next 60 seconds..."));
            }

            writer.close();

            IndexReader reader = DirectoryReader.open(idx);
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);

            // searching on the id field
            QueryParser parser = new QueryParser(Version.LUCENE_42, "id", analyzer);
            // look for "v*" within the id field
            Query query = parser.parse("v*");
            TopDocs hits = searcher.search(query, null, 100);

            for ( ScoreDoc scoreDoc : hits.scoreDocs )
            {
                 Document doc = searcher.doc(scoreDoc.doc);
                 String idField =  doc.get("id");
                 System.out.println("ID=" + idField);
                 System.out.println("Document=" + doc.toString());
            }

            System.out.println("Total hits=" + hits.totalHits);
            reader.close();    // release the index reader once the results are printed
        }
        catch(Exception e)
        {
            System.out.println("CAUGHT EXCEPTION=" + e.getLocalizedMessage());
        }
    }


    private static Document createDocument(String content)
    {
        Document doc = new Document();

        id++;

        // give the document a unique identifier
        StringField strField = new StringField("id", "v" + id, Field.Store.YES);
        doc.add(strField);

        TextField txtField = new TextField("content", content, Field.Store.YES);
        doc.add(txtField);

        return doc;
    }
}
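
On the question above about obtaining only the "content" field: a minimal sketch, assuming the field was added with Field.Store.YES as in the program above, so that its original text is kept in the index. Document.get(String) returns the stored string value of the first field with that name, or null if the field was not stored. The class and method names here (StoredFieldSketch, fetchContent) are made up for illustration and are not part of Lucene.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class StoredFieldSketch
{
    // Hypothetical helper: indexes one document, looks it up by its id,
    // and returns only the stored value of the "content" field.
    static String fetchContent() throws Exception
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42)));

        Document doc = new Document();
        doc.add(new StringField("id", "v1", Field.Store.YES));
        doc.add(new TextField("content", "This is a test; for the next 60 seconds...", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        try
        {
            IndexSearcher searcher = new IndexSearcher(reader);
            // StringField is indexed as a single un-analyzed term, so an exact TermQuery finds it.
            TopDocs hits = searcher.search(new TermQuery(new Term("id", "v1")), 1);
            Document found = searcher.doc(hits.scoreDocs[0].doc);
            return found.get("content");    // just the stored text, not the whole Document
        }
        finally
        {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("content=" + fetchContent());
    }
}
```

In the loop of the program above, the equivalent would be calling doc.get("content") next to the existing doc.get("id").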



On Wed, Mar 27, 2013 at 11:37 AM, Adrien Grand <jpountz@gmail.com> wrote:

> Hi Paul,
>
> On Wed, Mar 27, 2013 at 1:58 PM, Paul Bell <arachweb@gmail.com> wrote:
> > As to the ideas raised in the links you pointed me to: the first link shows
> > the instantiation of a Term object via
> >
> >    writer.UpdateDocument(new Term("IDField", *id*), doc);
> >
> > yet in the 4.2.0 docs I see no Term constructor that allows this "id"
> > field.
>
> I think this is this one:
>
> http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/Term.html#Term(java.lang.String, java.lang.String)
>
> > But this raises an interesting question: is it possible to tell Lucene that
> > the Document I've given it to index has a specific identifier? Here's an
> > example of what I mean. Suppose that the DB in question is a NoSQL type of
> > the graph flavor. I add a vertex to that graph. The vertex contains some
> > properties, e.g., name and type, whose values are text strings. I want
> > Lucene to index these data AND I want to know some kind of identifier for
> > that vertex Document. I would prefer to give Lucene that ID, though I might
> > be able to tolerate it giving it to me.
>
> Lucene has no schema that would allow you to specify a primary key,
> but there is the IndexWriter.updateDocument method that allows for
> atomic updates of documents:
>
> http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/IndexWriter.html#updateDocument(org.apache.lucene.index.Term, java.lang.Iterable)
>
> You just need to pass a term where the field name is the name of your
> primary key field and the value is the actual ID.
>
> --
> Adrien
>
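
The updateDocument approach described in the quoted reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the field name "id", the value "vertex-1", and the class and helper names (UpdateByIdSketch, vertexDoc, liveDocsAfterUpdates) are all made up for the example. It assumes the key field is a StringField, so that it is indexed as one exact term that the Term passed to updateDocument can match.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class UpdateByIdSketch
{
    // Hypothetical helper: builds a document for a graph vertex with a caller-supplied id.
    static Document vertexDoc(String id, String name)
    {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));    // un-analyzed key field
        doc.add(new TextField("name", name, Field.Store.YES));  // analyzed, searchable text
        return doc;
    }

    // Writes the same id twice via updateDocument and returns the number of live documents.
    static int liveDocsAfterUpdates() throws Exception
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42)));

        // updateDocument deletes any documents containing the term, then adds the new one.
        writer.updateDocument(new Term("id", "vertex-1"), vertexDoc("vertex-1", "first name"));
        writer.updateDocument(new Term("id", "vertex-1"), vertexDoc("vertex-1", "second name"));
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        try
        {
            return reader.numDocs();    // counts live (non-deleted) documents only
        }
        finally
        {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("Live documents=" + liveDocsAfterUpdates());
    }
}
```

Because the second call replaces the document written by the first, only one live document with that id should remain. Using addDocument for both calls would leave two.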
