lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: indexing xml messages
Date Tue, 03 Nov 2009 09:08:33 GMT
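StandardAnalyzer will, amongst other things, convert everything to
lowercase at index time. A TermQuery is not analyzed, so term queries on
mixed or upper case text ("123AB", "operationImpact", "EFG") will fail to
match. Either lowercase the terms yourself or build the queries with
QueryParser and the same analyzer.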
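
To illustrate the mismatch without depending on a particular Lucene version, here is a toy model in plain Java. It is only a sketch: analyze() is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, which approximates but does not reproduce StandardAnalyzer, and termMatches() mimics an exact, unanalyzed term lookup.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LowercaseTermDemo {

    // Hypothetical stand-in for an analyzer: split on anything that is not
    // a letter or digit, then lowercase each token. Not the Lucene API.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact lookup with no analysis of the query term, like a term query.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String xml = "<operationImpact><reference value=\"123AB\"/>"
                + "<data>EFG</data></operationImpact>";
        Set<String> indexed = analyze(xml);

        System.out.println(termMatches(indexed, "reference"));       // true
        System.out.println(termMatches(indexed, "123AB"));           // false
        System.out.println(termMatches(indexed, "123ab"));           // true
        System.out.println(termMatches(indexed, "operationImpact")); // false
        System.out.println(termMatches(indexed, "operationimpact")); // true
    }
}
```

The all-lowercase terms match because they are already in the form the index holds; the mixed-case ones only match once the query side is lowercased as well.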

There is some info on indexing XML docs in the FAQ
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_XML_documents.3F
and I'm sure that Google would find loads more stuff.
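
One common approach is to parse the XML first and index the extracted pieces rather than the raw markup. The sketch below uses only the JDK's DOM parser to pull out element names, attribute values and text content; the class name XmlTextExtractor is made up for illustration, and how the resulting strings are mapped onto Lucene fields is left open.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlTextExtractor {

    // Returns element names, attribute values and non-blank text nodes
    // in document order. Hypothetical helper, not part of Lucene.
    static List<String> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void walk(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                out.add(attrs.item(i).getNodeValue());
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<operationImpact><reference value=\"123AB\"/>"
                + "<data>EFG</data></operationImpact>";
        // prints [operationImpact, reference, 123AB, data, EFG]
        System.out.println(extract(xml));
    }
}
```

Each extracted string could then go into a field of its own, analyzed or not, depending on how you want to search it.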

And Luke is invaluable for seeing what your index really holds.


--
Ian.


On Tue, Nov 3, 2009 at 7:40 AM, vsevel <v.sevel@lombardodier.com> wrote:
>
> Hi, the following junit test fails on 3 out of the 6 searches:
>
>    @Test
>    public void indexXML() throws Exception {
>        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
>        RAMDirectory dir = new RAMDirectory();
>        IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
>        Document doc = new Document();
>        String xml = FileHelper.readFileContent("lucene_work/myxml.xml");
>        doc.add(new Field("myxml", xml, Field.Store.YES, Field.Index.ANALYZED));
>        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
>        writer.addDocument(doc);
>        writer.close();
>
>        IndexReader reader = IndexReader.open(dir, true); // only searching, so read-only=true
>        Searcher searcher = new IndexSearcher(reader);
>        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "123AB")), 1).totalHits);
>        Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "reference")), 1).totalHits);
>        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "operationImpact")), 1).totalHits);
>        Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "data")), 1).totalHits);
>        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "EFG")), 1).totalHits);
>        searcher.close();
>        reader.close();
>    }
>
> given this xml message:
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <operationImpact>
>        <reference value="123AB"/>
>        <data>EFG</data>
> </operationImpact>
>
> How do I get this to work? My goal is to be able to do full text search on
> XML documents. This includes tags, attribute values and tag values.
>
> Thanks,
> vince
> --
> View this message in context: http://old.nabble.com/indexing-xml-messages-tp26160016p26160016.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
