lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@syr.edu>
Subject Re: Lucene and SAX
Date Tue, 25 Oct 2005 19:26:39 GMT
 From what I can see, you are only passing volume.xml to your parser.  
If I understand your code and questions correctly, the Volume file 
simply points to the actual articles that you want to parse.  Seems like 
you need to parse the Volume file, get the name/location of the article 
file and then parse that for entry in Lucene.  Or am I mistaken?  What 
does the volume file look like?

Malcolm wrote:

>It's XML like this. It has 120-ish volumes with references to 12,107 articles which are
like this below:
><article>
><fno>A1003</fno>
><doi>10.1041/A1003s-1995</doi>
><fm><hdr><hdr1><ti>IEEE Annals of the History of Computing</ti>
><crt><issn>1058-6180</issn>/95/$4.00 <cci><onm>&copy;
1995 IEEE</onm></cci></crt></hdr1>
><hdr2><obi><volno>Vol. 17</volno>, <issno>No. 1</issno></obi>
><pdt><mo>Spring</mo><yr>1995</yr></pdt>
><pp>pp. 3-3</pp></hdr2></hdr>
><tig>
><atl>About this Issue</atl><pn>pp. 3-3</pn></tig>
><au sequence="first"><fnm>J.A.N.</fnm><snm>Lee</snm><role>Editor&hyphen;in&hyphen;Chief</role></au>
></fm>
><bdy>
><p>The first issue of our 17th volume is as diverse in topics as any nontheme issue
that we have tried to present over the past many years. However, it still represents the work
of the English&hyphen;speaking world of the North Atlantic rather than a broader picture
of computing in the whole world. The Editorial Board and the article editors of the <it>Annals</it>
>are doing their best to bring the history of the whole world of computing to our readers,
but it does require authors in other countries to offer their manuscripts for our consideration.
Please take this as an open invitation to authors in other parts of the world to submit papers
to the <it>Annals</it>
>for review and help us to follow the lead of our parent organization in being the &ldquo;The
World&rsquo;s Computer Society.&rdquo;</p>
><p>The five major articles in this issue represent several manuscripts that have
been in our files for some time, and we are grateful to the authors for having &ldquo;stuck
with us&rdquo; while we reviewed, re&hyphen;reviewed, and reworked their papers. Articles
in the field of history do not always present the work of the authors themselves (though we
welcome pioneers to give us their own stories, as in the case of the 1935 article by John
McPherson in this issue); thus, answering the question &ldquo;is it accurate?&rdquo;
is not always easy. In fact, we ask our referees to answer the following questions about each
manuscript, and their responses determine whether we accept the manuscript &ldquo;as is&rdquo;
or whether we ask the author(s) to revise the material:</p>
><l2>
><li>
><p>Are the issues addressed in the paper stated clearly enough?</p></li>
><li>
></bdy></article>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message