This tutorial shows how to retrieve an XML document from a URL and parse it using Java's SAXParser. The XML comes from a zekkocho exclusive API that returns a random set of Japanese words along with their possible English definitions.
http://www.zekkocho.com/api/japanese/wordset.php?n=5
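...
<word>
    <kanji>効力</kanji>
    <hiragana>こうりょく</hiragana>
    <definition>effect</definition>
    <definition>efficacy</definition>
    <definition>validity</definition>
    <definition>potency</definition>
</word>
...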
Before we build the parser, we will need a simple class that represents one word, or entry. This class will include getters and setters for each child element of the XML word we intend to save.
Word.java
package com.matt.test;

import java.util.*;

public class Word {
    private String kanji;
    private String hiragana;
    //A word can have several definitions, so they are kept in a list
    private List<String> definitions;

    public Word() {
        this.definitions = new ArrayList<String>();
    }

    public void setKanji(String kanji) {
        this.kanji = kanji;
    }

    public String getKanji() {
        return this.kanji;
    }

    public void setHiragana(String hiragana) {
        this.hiragana = hiragana;
    }

    public String getHiragana() {
        return this.hiragana;
    }

    public void addDefinition(String definition) {
        //Add to list of possible definitions
        this.definitions.add(definition);
    }

    public List<String> getDefinitions() {
        return this.definitions;
    }
}
Notice that definitions is an ArrayList. Since there can be more than one definition per word, we will record the multiple definitions in this list.
Next we create the FeedParser class. This class is responsible for instantiating the SAXParser, connecting to the URL that provides the XML, and passing the SAXParser an XML handler, which we will create last.
FeedParser.java
package com.matt.test;

import java.io.*;
import java.net.*;
import java.util.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class FeedParser {
    //Constant definitions for our XML element names
    static final String KANJI = "kanji";
    static final String HIRAGANA = "hiragana";
    static final String DEFINITION = "definition";
    //Parent element that contains the three elements above
    static final String WORD = "word";

    //Create URL object in constructor
    final URL feedUrl;

    protected FeedParser(String feedUrl) {
        try {
            this.feedUrl = new URL(feedUrl);
        } catch (MalformedURLException e) {
            throw new RuntimeException(e);
        }
    }

    //Parse the XML from feedUrl and return it as a list of Word objects
    public List<Word> parse() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            //Instantiate our XML handler and pass it into the SAXParser
            XMLHandler handler = new XMLHandler();
            parser.parse(this.getInputStream(), handler);
            return handler.getWords();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    //Connects to feedUrl and returns the response body as a stream
    protected InputStream getInputStream() {
        try {
            return feedUrl.openConnection().getInputStream();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
The SAXParser object is created in our parse() method. The SAXParser's own parse() method takes two parameters. The first is an InputStream which feeds us the XML. The second is a DefaultHandler, or in this case, a custom class called XMLHandler that extends DefaultHandler.
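To see the two-argument parse() call in isolation, without any network access, here is a minimal sketch that feeds the parser an in-memory string instead of a URL stream. The class name InMemorySaxDemo and the root element name "words" are assumptions made for illustration; the real feed's root element is not shown in this tutorial.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the tutorial's code
class InMemorySaxDemo {

    // Parses the XML text and returns element names in the order they open
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        // Anonymous DefaultHandler that only records start tags
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try (InputStream in = new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8))) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // First argument supplies the XML, second receives the callbacks
            parser.parse(in, handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed sample document; "words" as the root name is a guess
        String xml = "<words><word><kanji>効力</kanji>"
                + "<hiragana>こうりょく</hiragana>"
                + "<definition>effect</definition></word></words>";
        // prints [words, word, kanji, hiragana, definition]
        System.out.println(elementNames(xml));
    }
}
```

Because parse() only needs an InputStream, the same handler can be exercised against a file, a string, or a live connection without changes.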
XMLHandler.java
package com.matt.test;

import static com.matt.test.FeedParser.*;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

//Custom XML handler. This will be passed to the SAXParser's parse method
public class XMLHandler extends DefaultHandler {
    private List<Word> words; //collects Word objects
    private Word currentWord; //current word being parsed from XML
    private StringBuilder builder; //holds the text of the current XML element

    public List<Word> getWords() {
        return this.words;
    }

    //Override DefaultHandler's characters method so we can add
    //the chars to our own StringBuilder
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        builder.append(ch, start, length);
    }

    //Override DefaultHandler's endElement callback so we can
    //record the XML element's data
    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        //If we are inside a 'word' element, record the value of the child
        //element that just ended. Our StringBuilder holds its text
        if (this.currentWord != null) {
            //use name instead of localName; localName is only set for namespaces
            if (name.equalsIgnoreCase(KANJI)) {
                currentWord.setKanji(builder.toString());
            } else if (name.equalsIgnoreCase(HIRAGANA)) {
                currentWord.setHiragana(builder.toString());
            } else if (name.equalsIgnoreCase(DEFINITION)) {
                currentWord.addDefinition(builder.toString());
            } else if (name.equalsIgnoreCase(WORD)) {
                words.add(currentWord);
            }
            //Clear the buffer for the next element
            builder.setLength(0);
        }
    }

    //Override DefaultHandler's startDocument so we can
    //instantiate our own objects prior to parsing
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        words = new ArrayList<Word>();
        builder = new StringBuilder();
    }

    //Override DefaultHandler's startElement callback so we can
    //test for the 'word' element
    @Override
    public void startElement(String uri, String localName,
            String name, Attributes attributes)
            throws SAXException {
        super.startElement(uri, localName, name, attributes);
        //use name instead of localName; localName is only set for namespaces
        if (name.equalsIgnoreCase(WORD)) {
            this.currentWord = new Word();
        }
    }
}
In this class we extend DefaultHandler and override several of its methods to get it to work the way we need. When the parser reaches a 'word' element in the XML feed, we instantiate a Word object and set its members to the values of that element's children.
The getWords method returns a List (backed by an ArrayList) of Word objects, each matching a 'word' element in the XML.
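One detail worth keeping in mind when buffering text in characters(): the parser may deliver an element's text in several chunks, and when no DTD is present it also reports the whitespace between tags through the same callback. If the feed happens to be pretty-printed, that whitespace can end up in front of a value. The following is a minimal sketch of one way to guard against this, by clearing the buffer at every start tag and trimming at the end tag. It is a variation for illustration, not the handler used in this tutorial, and the class name TextBufferDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the tutorial's code
class TextBufferDemo {

    // Returns the trimmed text of every <definition> element in the XML
    static List<String> definitions(String xml) {
        final List<String> result = new ArrayList<>();
        final StringBuilder buffer = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Discard whitespace collected since the previous tag
                buffer.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called more than once per element, so append
                buffer.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("definition".equals(qName)) {
                    result.add(buffer.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Pretty-printed sample with newlines and indentation between tags
        String xml = "<word>\n"
                + "  <definition> effect </definition>\n"
                + "  <definition>efficacy</definition>\n"
                + "</word>";
        // prints [effect, efficacy]
        System.out.println(definitions(xml));
    }
}
```

Resetting at the start tag works here because every value of interest sits in a leaf element; a document with mixed content would need a different strategy.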
Now to test these classes. The following simple class instantiates our FeedParser, parses the XML, and writes the results to a file. See the comments in the class for line-by-line details.
TestXml.java
package com.matt.test;

import java.util.*; //for List
import java.io.*; //for PrintStream

public class TestXml {
    public static void main(String... args) throws IOException {
        //Create file to write parsed results to
        FileOutputStream fo = new FileOutputStream("results.txt");
        //Wrap OutputStream in PrintStream
        //Using PrintStream instead of PrintWriter so we can set the encoding
        //PrintStream writes bytes rather than chars
        PrintStream out = new PrintStream(fo, true, "UTF8");
        //Get XML data
        FeedParser fp = new FeedParser(
                "http://www.zekkocho.com/api/japanese/wordset.php?n=10");
        //Parse the XML
        List<Word> words = fp.parse();
        //Loop through each word and print the results
        for (Word w : words) {
            //System.out.println("Kanji: " + w.getKanji());
            out.println("Kanji: " + w.getKanji());
            //System.out.println("Hiragana: " + w.getHiragana());
            out.println("Hiragana: " + w.getHiragana());
            //Output each possible definition
            for (String s : w.getDefinitions()) {
                //System.out.println("Definition: " + s);
                out.println("Definition: " + s);
            }
        }
        //Flushing the output stream manually is not necessary.
        //The second parameter in our PrintStream is set to true,
        //which enables automatic flushing on every println call
        //Close the stream to release the file
        out.close();
    }
}
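The encoding argument is what keeps the Japanese text intact in the output file. As a rough illustration, the sketch below writes through a PrintStream into an in-memory buffer rather than a file, so the resulting bytes can be inspected and decoded again. The class name Utf8PrintDemo and the render method are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the tutorial's code
class Utf8PrintDemo {

    // Writes one line of output as UTF-8 and returns the raw bytes
    static byte[] render(String kanji, String hiragana) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes (and flushes) the PrintStream for us
        try (PrintStream out = new PrintStream(bytes, true, "UTF-8")) {
            out.print("Kanji: " + kanji + " Hiragana: " + hiragana);
        } catch (UnsupportedEncodingException e) {
            // Should not happen in practice: every JVM must support UTF-8
            throw new RuntimeException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = render("論理", "ろんり");
        // Each of these characters occupies three bytes in UTF-8,
        // so the byte count is larger than the character count
        System.out.println(data.length + " bytes");
        // Decoding with the same charset restores the original text
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

Whether the second line displays correctly depends on the console's own encoding, which is one reason the tutorial writes to a file instead.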
This API returns a random word set, but the results should look something like:
Kanji: 夕
Hiragana: ゆう
Definition: evening
Kanji: 美術展
Hiragana: びじゅつてん
Definition: art exhibition
Kanji: 論理
Hiragana: ろんり
Definition: logic
Definition: logical
Kanji: 王冠
Hiragana: おうかん
Definition: crown
Definition: diadem
Definition: bottle cap
Kanji: 受け入れる
Hiragana: うけいれる
Definition: to accept
Definition: to receive
Kanji: 峡谷
Hiragana: きょうこく
Definition: glen
Definition: ravine
Definition: gorge
Definition: canyon
Kanji: 自社
Hiragana: じしゃ
Definition: one's company
Definition: company one works for
Definition: in-house
Definition: belonging to the company
Kanji: 煽る
Hiragana: あおる
Definition: to fan
Definition: to agitate
Definition: to stir up
Kanji: 誓い
Hiragana: ちかい
Definition: oath
Definition: vow
Kanji: 敏腕
Hiragana: びんわん
Definition: ability