Ruby Tutorial Part 05 Ruby and XML
Repository
What Will I Learn?
- You will learn the difference between HTML and XML.
- You will learn the " DOM " technique.
- You will learn the " SAX " technique.
Requirements
- You need to have Ruby installed on your computer
Difficulty
- Basic
Tutorial Contents
In this tutorial we will learn how to use Ruby with XML to manipulate data (store and display it).
XML belongs to the family of "markup languages", which also includes HTML.
This family descends from SGML, whose roots go back to the 1960s, decades before the Web appeared, and which was created to help mark up and format documents.
1- HTML and XML
XML and HTML are both based on "tags", but there are many differences between them:
- HTML tags are predefined, whereas XML tags are defined by the developer.
- HTML tags are used to display data, whereas XML tags are used to describe and store data.
- Many technologies are built on XML: graphical interfaces, for example, often store their data in XML files, and Java and .NET rely on XML for many things.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : products.xml
Created on : May 27, 2019, 08:20 AM
Author : Alex-Harry
Description:
Purpose of the document follows.
-->
<products>
<product>
<id>1</id>
<name>Product1</name>
<price>20</price>
</product>
<product>
<id>2</id>
<name>Product2</name>
<price>40</price>
</product>
<product>
<id>3</id>
<name>Product3</name>
<price>70</price>
</product>
<product>
<id>4</id>
<name>Product4</name>
<price>30</price>
</product>
</products>
This is an example of a product list: there are several products, and each product contains three XML tags, "id", "name" and "price". A tag has the form "<tag></tag>". The example is like a table of products in which each product is a row holding an "id", a "name" and a "price".
If this example were written in HTML, the same data would be expressed with HTML table tags:
<table>
<tr>
<th>id</th>
<th>name</th>
<th>price</th>
</tr>
<tr>
<td>1</td>
<td>Product1</td>
<td>20</td>
</tr>
<tr>
<td>2</td>
<td>Product2</td>
<td>40</td>
</tr>
<tr>
<td>3</td>
<td>Product3</td>
<td>70</td>
</tr>
<tr>
<td>4</td>
<td>Product4</td>
<td>30</td>
</tr>
</table>
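To make the relationship between the two forms concrete, here is a minimal sketch that reads the XML form and prints the HTML form. It uses REXML, the library introduced in the next section; the shortened two-product document and the variable names (fields, rows, html_table) are assumptions made for the illustration, and the sketch does no HTML escaping of the values.

```ruby
require "rexml/document"

# In-memory copy of two products from products.xml (shortened for the sketch).
xml = <<~XML
  <products>
    <product><id>1</id><name>Product1</name><price>20</price></product>
    <product><id>2</id><name>Product2</name><price>40</price></product>
  </products>
XML

doc = REXML::Document.new(xml)
fields = %w[id name price]

# Header row: one <th> per field name.
rows = ["<tr>" + fields.map { |f| "<th>#{f}</th>" }.join + "</tr>"]

# One <tr> per <product>, one <td> per field value.
doc.elements.each("products/product") do |product|
  cells = fields.map { |f| "<td>#{product.elements[f].text}</td>" }
  rows << "<tr>" + cells.join + "</tr>"
end

html_table = "<table>\n" + rows.join("\n") + "\n</table>"
puts html_table
```

The XML keeps only the data and its meaning; the HTML adds the presentation, which is why the same values can be generated from it.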
2- DOM
a- Import REXML
The first step is to import "REXML", an XML library for Ruby that was inspired by the Electric XML library for Java.
#STEP 1 (import rexml module)
require "rexml/document"
include REXML
Importing REXML takes two statements: "require" loads the library file "rexml/document", and "include" mixes in the REXML module so that its classes (such as "Document") can be used without the "REXML::" prefix.
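The effect of the include can be seen in a small sketch. The tutorial includes REXML at the top level of the script; here, as an assumption made to keep the example contained, the include is placed inside a hypothetical module named Catalog so that the short name only resolves there.

```ruby
require "rexml/document"   # loads the library files; no names are mixed in yet

# Without an include, REXML classes are reached through the module name.
doc_a = REXML::Document.new("<products/>")

# Hypothetical wrapper module: including REXML here makes the short name
# `Document` resolve inside this module only, not in the whole program.
module Catalog
  include REXML

  def self.parse(xml)
    Document.new(xml)   # same class as REXML::Document
  end
end

doc_b = Catalog.parse("<products/>")

puts doc_a.root.name   # name of the root element of the first document
puts doc_b.class       # class of the object built through the short name
```

Both objects are instances of the same class; the include only changes how the class name is written.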
b- Creation Object
The XML file that contains the products is named "products.xml". To open it, create a "Document" object from the "Document" class.
#STEP 2 (load the xml document)
xmlDOC=Document.new(File.new("products.xml"))
A new File object is created and passed to the new "Document" object, which is stored in the "xmlDOC" variable.
Another way is to pass the XML as a "heredoc" string, but this hard-codes the data in the script, so it is usually not the preferred way.
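For completeness, this is roughly what the heredoc variant looks like. It is a minimal sketch: the two-product document and the names xml_text, heredoc_doc and product_count are assumptions chosen for the example.

```ruby
require "rexml/document"

# The XML lives inside the script as a heredoc string (hard-coded data).
xml_text = <<~XML
  <products>
    <product><id>1</id><name>Product1</name><price>20</price></product>
    <product><id>2</id><name>Product2</name><price>40</price></product>
  </products>
XML

# Document.new accepts a String as well as an IO object such as a File.
heredoc_doc = REXML::Document.new(xml_text)

# Count the <product> elements found under the root.
product_count = heredoc_doc.elements.to_a("products/product").size
puts product_count
```

This form is handy for quick experiments and tests, since it needs no file on disk.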
c- Get Content
With the previous code, the developer can open the "products.xml" file and load its data. How do we get the content of the file? By using a loop.
xmlDOC.elements.each("products/product/name") do |element|
puts element.text
end
#Output =>
Product1
Product2
Product3
Product4
The "xmlDOC" object represents the file and contains elements. The "elements" method gives access to them, and the "each" method visits them one by one; it is given a path made of the root tag name, the object tag name and the target tag name ("products/product/name").
The loop yields each "name" element; to get its value (its text) we call the "text" method.
The result is the names of all the products, because the "name" tag contains the name of the product. If the call to "text" is removed, the result will be:
<name>Product1</name>
<name>Product2</name>
<name>Product3</name>
<name>Product4</name>
To get the ids instead, replace "products/product/name" with "products/product/id"; the result will be the list of identifiers of all the products.
ids=[]
xmlDOC.elements.each("products/product/id") do |element|
ids.push(element.text)
end
p ids
#output =>
["1", "2", "3", "4"]
This example gets the id of each product and puts it in an array. Because "text" returns strings, the array holds strings. Such an array can then be reused in many places in the application.
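As one possible use of such an array, the sketch below converts the ids to integers and builds a lookup table from id to product name. The shortened document and the names ids and names_by_id are assumptions for the illustration.

```ruby
require "rexml/document"

xml = <<~XML
  <products>
    <product><id>1</id><name>Product1</name><price>20</price></product>
    <product><id>2</id><name>Product2</name><price>40</price></product>
  </products>
XML

doc = REXML::Document.new(xml)

# Collect the ids as integers (text is a String, so convert with to_i).
ids = []
doc.elements.each("products/product/id") { |element| ids << element.text.to_i }

# Build an id => name lookup table by visiting each <product> once.
names_by_id = {}
doc.elements.each("products/product") do |product|
  id = product.elements["id"].text.to_i
  names_by_id[id] = product.elements["name"].text
end

puts ids.inspect
puts names_by_id[2]
```

Visiting each "product" element and reading its children keeps the id and the name of the same product together.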
There are many ways to get the sum ("SUM") of the prices, for example using the "+" operator.
sum=0
xmlDOC.elements.each("products/product/price") do |element|
sum += element.text.to_i
end
puts "Total: "+ sum.to_s
#output => Total: 160
For each product the "sum" variable is increased: it starts at 0, and each price element adds the price of its product. Because the "text" method returns a string, the "to_i" (to integer) method converts it to an integer before it is added to "sum".
The result can be converted to a string for printing, as above, or kept as an Integer; here it is "20+40+70+30 = 160".
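Since there are several ways to compute the sum, here is one alternative sketched with REXML's XPath helper and Array#sum (available in Ruby 2.4 and later). The in-memory copy of the four products and the names prices and total are assumptions for the example.

```ruby
require "rexml/document"

xml = <<~XML
  <products>
    <product><id>1</id><name>Product1</name><price>20</price></product>
    <product><id>2</id><name>Product2</name><price>40</price></product>
    <product><id>3</id><name>Product3</name><price>70</price></product>
    <product><id>4</id><name>Product4</name><price>30</price></product>
  </products>
XML

doc = REXML::Document.new(xml)

# XPath.match returns an array of the matching <price> elements.
prices = REXML::XPath.match(doc, "products/product/price").map do |element|
  element.text.to_i
end

# Array#sum adds the integers without an explicit accumulator variable.
total = prices.sum
puts "Total: #{total}"
```

Keeping the prices in an array first also makes it easy to compute other values, such as the minimum or the average.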
This technique is called "DOM": it is based on a "tree", and it keeps the whole tree structure of the document in memory.
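Because the whole tree is in memory, it can be walked in any direction. The short sketch below moves from the root to a child, then back to a parent and a sibling; the one-product document and the variable names are assumptions for the illustration.

```ruby
require "rexml/document"

doc = REXML::Document.new(
  "<products><product><id>1</id><name>Product1</name></product></products>"
)

root = doc.root                              # the <products> element
first_product = root.elements[1]             # integer indexes start at 1
name_node = first_product.elements["name"]   # child looked up by tag name

puts root.name
puts first_product.elements.size             # number of child elements
puts name_node.parent.name                   # walk back up the tree
puts name_node.previous_element.name         # walk sideways to a sibling
```

This freedom of movement is what DOM pays for by holding the full document in memory.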
3- SAX
A technique that many developers prefer is "SAX". It is based on "events" that are triggered as the parser meets each tag, and it deals with tags and their attributes.
The parser reports the start of a tag, its data and the end of the tag, and passes along the attributes. The previous example used "DOM"; switching to "SAX" takes a few modifications.
<tag attr1="val1" attr2="val2"> DATA </tag>
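To see these events in order, the sketch below records every callback the parser makes for a single tag with one attribute. The class name EventLogger, the "currency" attribute and the event format are assumptions invented for the illustration; only the callback names come from REXML's StreamListener.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener that only records the events it receives.
class EventLogger
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called for an opening tag; attrs holds its attribute name/value pairs.
  def tag_start(name, attrs)
    @events << [:start, name, attrs.to_h]
  end

  # Called for the data between the opening and closing tags.
  def text(data)
    @events << [:text, data]
  end

  # Called for the closing tag.
  def tag_end(name)
    @events << [:end, name]
  end
end

logger = EventLogger.new
REXML::Document.parse_stream('<price currency="USD">20</price>', logger)
logger.events.each { |event| p event }
```

The events arrive in document order (start, text, end), and nothing is kept in memory except what the listener chooses to store.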
a- Load REXML
The previous example loaded REXML with two statements, a require of "rexml/document" and an include of REXML. This example needs one more: a require of "rexml/streamlistener".
require "rexml/document"
require "rexml/streamlistener"
include REXML
b- Creation of Streamer
The "StreamListener" module provides the developer with several methods to override; the parser calls them back as it reads the document, which is the notion of a "callback".
class ProductsStreamer
  include StreamListener
  def initialize
    @bPrice=false
    @sum=0
  end
  def tag_start(tag_name, attrs)
    if (tag_name=="price") then
      @bPrice=true
    end
  end
  def tag_end(tag_name)
    @bPrice=false
  end
  def text(data)
    if @bPrice then
      @sum += data.to_i
    end
  end
  def get_total_sum
    return @sum
  end
end
The class is "ProductsStreamer". Including "StreamListener" gives the new class default (empty) versions of all the callback methods, so it only has to define the ones it needs.
Two instance variables are defined: "@sum", initialized to 0, and "@bPrice", initialized to false.
Three callbacks are defined. "tag_start" accepts two parameters and sets "@bPrice" to true when the tag name is "price"; "tag_end" sets "@bPrice" back to false; "text" adds the data to "@sum" while "@bPrice" is true.
The last method, "get_total_sum", returns the value of "@sum".
<start attr1="val1" attr2="val2" ....> DATA </start>
def tag_start(tag_name, attrs)
if (tag_name=="price") then
end
end
The method "tag_start" is the callback that the parser calls when it meets an opening tag; it receives the tag name and its attributes as parameters.
def tag_start(tag_name, attrs)
  if (tag_name=="price") then
    @bPrice=true
  end
end
To apply this to the previous example and get only the prices, the listener must notice when the parser reaches a "price" tag. Hence the condition: if tag_name is "price", the variable "@bPrice" is set to true.
def tag_end(tag_name)
  @bPrice=false
end
Logically, once "tag_start" has run for a price tag, "@bPrice" is true. What happens when the parser reaches the closing tag and calls "tag_end"? "@bPrice" goes back to false, because that element is finished.
The tag "<price>20</price>" contains data, here "20". To get the sum of prices, first read the data, then convert it to an integer with "to_i" and add it to the running sum.
def text(data)
  if @bPrice then
    @sum += data.to_i
  end
end
To use this class to get the sum of prices, create an object of the "ProductsStreamer" class:
ps=ProductsStreamer.new
The method "Document.parse_stream" is then called with two arguments, the file and the "ps" object:
Document.parse_stream(File.new("products.xml"), ps)
The parser has now gone through all the tags and passed them to the "ps" object; the final step is to call "get_total_sum":
puts ps.get_total_sum
#output => 160
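As a closing check, the sketch below runs both techniques on the same data and prints the two totals side by side. It is self-contained, so it uses an in-memory copy of the four products instead of the file; the class name PriceSummer and its compact style are assumptions for the example, not the ProductsStreamer class itself.

```ruby
require "rexml/document"
require "rexml/streamlistener"

xml = <<~XML
  <products>
    <product><id>1</id><name>Product1</name><price>20</price></product>
    <product><id>2</id><name>Product2</name><price>40</price></product>
    <product><id>3</id><name>Product3</name><price>70</price></product>
    <product><id>4</id><name>Product4</name><price>30</price></product>
  </products>
XML

# DOM: build the whole tree in memory, then query it.
dom_total = 0
REXML::Document.new(xml).elements.each("products/product/price") do |price|
  dom_total += price.text.to_i
end

# SAX: hypothetical compact listener that sums prices as events arrive.
class PriceSummer
  include REXML::StreamListener

  attr_reader :total

  def initialize
    @in_price = false
    @total = 0
  end

  def tag_start(name, _attrs)
    @in_price = (name == "price")   # true only while inside <price>
  end

  def tag_end(_name)
    @in_price = false
  end

  def text(data)
    @total += data.to_i if @in_price
  end
end

summer = PriceSummer.new
REXML::Document.parse_stream(xml, summer)

puts "DOM total: #{dom_total}"
puts "SAX total: #{summer.total}"
```

Both approaches reach the same total; the difference is that DOM keeps the whole document available for later queries, while SAX only keeps the running sum.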
Curriculum
Proof of Work Done
https://github.com/alex-harry/rubyTutorial/blob/master/ruby5.rb