This tutorial shows how to retrieve an XML document from a URL and parse it using Java's SAXParser. The XML comes from an API exclusive to zekkocho that returns a random set of Japanese words along with their possible English definitions.

http://www.zekkocho.com/api/japanese/wordset.php?n=5


<words>
<word>
<kanji>効力</kanji>
<hiragana>こうりょく</hiragana>
<definition>effect</definition>
<definition>efficacy</definition>
<definition>validity</definition>
<definition>potency</definition>
</word>
...
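SAX is event-driven: rather than building a tree, the parser reads the document once and calls methods on a handler each time it meets a start tag, a run of text, or an end tag. The handler built later in this tutorial relies on that ordering. The sketch below (the class SaxEventTrace is a hypothetical name, not part of the tutorial) parses a cut-down copy of the sample from a string, so it runs without the zekkocho API, and records each callback in order. Note that the parser is allowed to split one run of text across several characters() calls.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

    // Parses the XML string and records one entry per SAX callback, in the order they fire.
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("text " + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Cut-down copy of the sample response, without indentation
        String xml = "<words><word><kanji>効力</kanji><hiragana>こうりょく</hiragana>"
                   + "<definition>effect</definition></word></words>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```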

Before we build the parser, we need a simple class that represents one word, or entry. This class includes a getter and a setter (or, for definitions, an add method) for each child element of the XML word element that we intend to save.

Word.java

package com.matt.test;

import java.util.ArrayList;
import java.util.List;

public class Word {
    private String kanji;
    private String hiragana;
    private List<String> definitions;

    public Word() {
        this.definitions = new ArrayList<String>();
    }

    public void setKanji(String kanji) {
        this.kanji = kanji;
    }

    public String getKanji() {
        return this.kanji;
    }

    public void setHiragana(String hiragana) {
        this.hiragana = hiragana;
    }

    public String getHiragana() {
        return this.hiragana;
    }

    public void addDefinition(String definition) {
        //Add to the list of possible definitions
        this.definitions.add(definition);
    }

    //Typed as List<String> so callers can loop with for (String s : ...)
    public List<String> getDefinitions() {
        return this.definitions;
    }
}

Notice that definitions is a List backed by an ArrayList. Because a word can have more than one definition, we record every definition in this list.

Next we create the FeedParser class. This class is responsible for instantiating the SAXParser, connecting to the URL that provides the XML, and passing the SAXParser an XML handler, which we will create last.

FeedParser.java

package com.matt.test;
import java.io.*;
import java.net.*;
import java.util.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class FeedParser {

    //Constant definitions for our XML element names
    static final String KANJI = "kanji";
    static final String HIRAGANA = "hiragana";
    static final String DEFINITION = "definition";

    //Contains the elements above in the XML
    static final String WORD = "word";

    //Create the URL object in the constructor
    final URL feedUrl;
    protected FeedParser(String feedUrl) {
        try {
            this.feedUrl = new URL(feedUrl);
        } catch (MalformedURLException e) {
            throw new RuntimeException(e);
        }
    }

    //Parse the XML from feedUrl and return it as a list of Word objects
    public List<Word> parse() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        //try-with-resources closes the network stream once parsing finishes
        try (InputStream in = this.getInputStream()) {
            SAXParser parser = factory.newSAXParser();
            //Instantiate our XML handler and pass it to the SAXParser
            XMLHandler handler = new XMLHandler();
            parser.parse(in, handler);
            return handler.getWords();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    //Connects to feedUrl and returns the response body as a stream
    protected InputStream getInputStream() {
        try {
            return feedUrl.openConnection().getInputStream();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

The SAXParser object is created in the parse() method. The SAXParser's own parse method takes two parameters. The first is an InputStream that feeds it the XML. The second is a DefaultHandler, in this case a custom class called XMLHandler that extends DefaultHandler.
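Because the parser only needs an InputStream, and FeedParser obtains that stream through the protected getInputStream() method, a subclass can override the hook to serve canned XML, for example to exercise the parsing code when the zekkocho API is unreachable. The sketch below mirrors that design with hypothetical classes (WordCounter and CannedWordCounter) that only count word elements; it is an illustration of the pattern, not the tutorial's FeedParser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for FeedParser: same shape, but it only counts <word> elements.
class WordCounter {
    // Subclasses decide where the XML comes from (a URL, a file, a string...)
    protected InputStream getInputStream() throws IOException {
        throw new IOException("no XML source configured");
    }

    public int countWords() {
        // Array so the anonymous handler can update the count
        final int[] count = {0};
        try (InputStream in = getInputStream()) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (qName.equalsIgnoreCase("word")) {
                        count[0]++;
                    }
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return count[0];
    }
}

// Overrides the protected hook to serve canned XML instead of opening a URL.
class CannedWordCounter extends WordCounter {
    private final String xml;

    CannedWordCounter(String xml) {
        this.xml = xml;
    }

    @Override
    protected InputStream getInputStream() {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }
}

public class OfflineParseDemo {
    public static void main(String[] args) {
        String xml = "<words><word><kanji>効力</kanji></word><word><kanji>論理</kanji></word></words>";
        System.out.println(new CannedWordCounter(xml).countWords()); // prints 2
    }
}
```

The try-with-resources block closes whatever stream the hook returns, so the same countWords() code works for network and in-memory sources alike.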

XMLHandler.java

package com.matt.test;
import static com.matt.test.FeedParser.*;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

//Custom XML handler. This will be passed to the SAXParser's parse method

public class XMLHandler extends DefaultHandler {
    private List<Word> words;        //collects Word objects
    private Word currentWord;        //current word being parsed from XML
    private StringBuilder builder;   //holds the current element's text

    public List<Word> getWords() {
        return this.words;
    }

    //Override DefaultHandler's characters method so we can add
    //the chars to our own StringBuilder. The parser may deliver one
    //element's text in several calls, so we append rather than assign.
    @Override
    public void characters(char[] ch, int start, int length)
    throws SAXException {
        super.characters(ch, start, length);
        builder.append(ch, start, length);
    }

    //Override DefaultHandler's endElement callback so we can
    //record the XML element's data
    @Override
    public void endElement(String uri, String localName, String name)
    throws SAXException {
        super.endElement(uri, localName, name);
        //If we are inside a 'word' element, record the value of the child
        //element that just closed. Our StringBuilder holds that element's text.
        if (this.currentWord != null) {
            //Use name (the qualified name): localName is empty unless
            //the parser is namespace-aware
            String text = builder.toString().trim();
            if (name.equalsIgnoreCase(KANJI)) {
                currentWord.setKanji(text);
            } else if (name.equalsIgnoreCase(HIRAGANA)) {
                currentWord.setHiragana(text);
            } else if (name.equalsIgnoreCase(DEFINITION)) {
                currentWord.addDefinition(text);
            } else if (name.equalsIgnoreCase(WORD)) {
                words.add(currentWord);
                //Finished with this word; wait for the next <word>
                currentWord = null;
            }
        }
        builder.setLength(0);
    }

    //Override DefaultHandler's startDocument so we can
    //instantiate our own objects prior to parsing
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        words = new ArrayList<Word>();
        builder = new StringBuilder();
    }

    //Override DefaultHandler's startElement callback so we can
    //test for the 'word' element
    @Override
    public void startElement(String uri, String localName,
    String name, Attributes attributes)
    throws SAXException {
        super.startElement(uri, localName, name, attributes);
        //Discard whitespace collected between tags before this element's text
        builder.setLength(0);
        //Use name (the qualified name): localName is empty unless
        //the parser is namespace-aware
        if (name.equalsIgnoreCase(WORD)) {
            this.currentWord = new Word();
        }
    }
}

In this class we extend DefaultHandler and override several of its methods to make it work the way we need. When the parser reaches a 'word' element, we instantiate a Word object and set its members to the values of the 'word' element's children.
The getWords() method returns an ArrayList of Word objects, one for each 'word' element in the XML.
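Two SAX details matter when the feed contains indentation or line breaks like the sample at the top: the parser may split an element's text across several characters() calls, and whitespace between tags is reported as text too. The sketch below, using a hypothetical DefinitionCollector that gathers only definition text, shows the pattern: append in characters(), clear the buffer at every start tag, and trim at the end tag.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of <definition> elements only.
class DefinitionCollector extends DefaultHandler {
    private final List<String> definitions = new ArrayList<>();
    private final StringBuilder builder = new StringBuilder();

    List<String> getDefinitions() {
        return definitions;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Drop whitespace collected between tags before this element's text arrives
        builder.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so append rather than assign
        builder.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("definition")) {
            definitions.add(builder.toString().trim());
        }
        builder.setLength(0);
    }

    // Parses an XML string and returns the collected definitions.
    static List<String> collect(String xml) {
        DefinitionCollector handler = new DefinitionCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.getDefinitions();
    }
}

public class DefinitionDemo {
    public static void main(String[] args) {
        String xml = "<word>\n  <kanji>効力</kanji>\n  <definition>effect</definition>\n"
                   + "  <definition>efficacy</definition>\n</word>";
        System.out.println(DefinitionCollector.collect(xml)); // prints [effect, efficacy]
    }
}
```

Without the reset in startElement, the newline and spaces between <kanji> and the preceding tag would be prepended to the element's text.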

Now to test these classes. The following simple class instantiates our FeedParser, parses the XML, and writes the results to a file. See the class's comments for line-by-line details.

package com.matt.test;
import java.util.*;	//for List
import java.io.*;	//for FileOutputStream, PrintStream, IOException

public class TestXml {
	public static void main(String... args) throws IOException {
		//Create the file to write parsed results to and wrap the
		//OutputStream in a PrintStream. PrintStream writes bytes rather
		//than chars and lets us pass the encoding (UTF-8) directly.
		//try-with-resources closes (and flushes) the stream at the end.
		try (PrintStream out = new PrintStream(
				new FileOutputStream("results.txt"), true, "UTF-8")) {

			//Get XML data
			FeedParser fp = new FeedParser(
					"http://www.zekkocho.com/api/japanese/wordset.php?n=10");

			//Parse the XML
			List<Word> words = fp.parse();

			//Loop through each word and print the results
			//(use System.out instead of out to print to the console)
			for (Word w : words) {
				out.println("Kanji: " + w.getKanji());
				out.println("Hiragana: " + w.getHiragana());
				//Output each possible definition
				for (String s : w.getDefinitions()) {
					out.println("Definition: " + s);
				}
				//Blank line between words
				out.println();
			}
		}
		//The third PrintStream argument (true) enables automatic flushing,
		//so the stream is flushed on every println; closing it above
		//flushes anything that remains.
	}
}
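PrintStream is not the only way to control the output encoding: a PrintWriter can wrap a writer opened with an explicit charset. The sketch below (Utf8WriterDemo, writeLines and roundTrip are hypothetical names) opens the file with Files.newBufferedWriter and StandardCharsets.UTF_8 inside try-with-resources. Like PrintStream, PrintWriter never throws IOException from println, so the sketch calls checkError() before closing.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class Utf8WriterDemo {

    // Writes each line to the file as UTF-8; the writer is closed automatically.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.println(line);
            }
            // PrintWriter swallows IOExceptions; checkError() flushes and reports them
            if (out.checkError()) {
                throw new IOException("failed to write " + file);
            }
        }
    }

    // Writes the lines to a temporary file and reads them back as UTF-8.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path file = Files.createTempFile("results", ".txt");
            try {
                writeLines(file, lines);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        writeLines(Paths.get("results.txt"),
                   Arrays.asList("Kanji: 夕", "Hiragana: ゆう", "Definition: evening"));
    }
}
```

Either approach produces a UTF-8 file; the PrintWriter version simply keeps the code in the character (Writer) world instead of the byte (OutputStream) world.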

Because the API returns a random word set, your results will differ, but they should look something like this:

Kanji: 夕
Hiragana: ゆう
Definition: evening

Kanji: 美術展
Hiragana: びじゅつてん
Definition: art exhibition

Kanji: 論理
Hiragana: ろんり
Definition: logic
Definition: logical

Kanji: 王冠
Hiragana: おうかん
Definition: crown
Definition: diadem
Definition: bottle cap

Kanji: 受け入れる
Hiragana: うけいれる
Definition: to accept
Definition: to receive

Kanji: 峡谷
Hiragana: きょうこく
Definition: glen
Definition: ravine
Definition: gorge
Definition: canyon

Kanji: 自社
Hiragana: じしゃ
Definition: one's company
Definition: company one works for
Definition: in-house
Definition: belonging to the company

Kanji: 煽る
Hiragana: あおる
Definition: to fan
Definition: to agitate
Definition: to stir up

Kanji: 誓い
Hiragana: ちかい
Definition: oath
Definition: vow

Kanji: 敏腕
Hiragana: びんわん
Definition: ability