lucene-java-user mailing list archives

From vsevel <v.se...@lombardodier.com>
Subject indexing xml messages
Date Tue, 03 Nov 2009 07:40:44 GMT

Hi, the following JUnit test fails on 3 out of the 6 searches:

    @Test
    public void indexXML() throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        Document doc = new Document();
        String xml = FileHelper.readFileContent("lucene_work/myxml.xml");
        doc.add(new Field("myxml", xml, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // only searching, so read-only=true
        IndexReader reader = IndexReader.open(dir, true);
        Searcher searcher = new IndexSearcher(reader);
        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "123AB")), 1).totalHits);
        Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "reference")), 1).totalHits);
        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "operationImpact")), 1).totalHits);
        Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "data")), 1).totalHits);
        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "EFG")), 1).totalHits);
        searcher.close();
        reader.close();
    }

Given this XML message:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<operationImpact>
        <reference value="123AB"/>
        <data>EFG</data>
</operationImpact>

How do I get this to work? My goal is to be able to do full text search on
XML documents. This includes tags, attribute values and tag values.
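
One observation that may be relevant: StandardAnalyzer lowercases tokens at index time, while TermQuery looks up its term exactly as given, without analysis. The three commented-out searches all use upper- or mixed-case terms (123AB, operationImpact, EFG), so lowercasing the query terms is a likely fix. The sketch below is not Lucene code; it only uses the JDK's DOM parser to list the lowercase terms one would expect for element names, attribute values and element text. The class name XmlTerms and the method extractTerms are hypothetical, introduced here for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects element names, attribute values and
// element text from an XML string as lowercase, whitespace-separated terms.
final class XmlTerms {

    static List<String> extractTerms(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> terms = new ArrayList<>();
        collect(dom.getDocumentElement(), terms);
        return terms;
    }

    // Depth-first walk in document order.
    private static void collect(Node node, List<String> terms) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            addTerms(node.getNodeName(), terms);
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                addTerms(attrs.item(i).getNodeValue(), terms);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            addTerms(node.getNodeValue(), terms);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), terms);
        }
    }

    // Splits on whitespace, skips empty pieces, and lowercases each term.
    private static void addTerms(String raw, List<String> terms) {
        for (String piece : raw.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<operationImpact>"
                + "<reference value=\"123AB\"/>"
                + "<data>EFG</data>"
                + "</operationImpact>";
        // prints [operationimpact, reference, 123ab, data, efg]
        System.out.println(extractTerms(xml));
    }
}
```

If the index really holds lowercase tokens, searching the "myxml" field for "123ab", "operationimpact" and "efg" instead of the original spellings would be the first thing to try. Running user input through the same analyzer used for indexing is another option.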

Thanks,
vince