Working with XML Files in R

Extensible Markup Language (XML) is a text format built from markup tags. Each tag names the piece of data it encloses, so an XML file describes its own content.

R’s XML package provides functions for reading and processing XML files.

The package is not part of the base R installation, so it must be installed explicitly with the following command:

install.packages("XML")

Making an XML file

An XML file is a plain-text file in which each piece of data is wrapped in a descriptive tag and whose name ends in “.xml”. It can be created in any text editor.

To demonstrate the operations that can be performed on such a file, we will use the XML file “sample.xml” as an example:

<RECORDS>
  <STUDENT>
      <ID>1</ID>
      <NAME>Alias</NAME>
      <MARKS>20</MARKS>
      <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
      <ID>2</ID>
      <NAME>Biji</NAME>
      <MARKS>40</MARKS>
      <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
      <ID>3</ID>
      <NAME>Yahood</NAME>
      <MARKS>60</MARKS>
      <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
      <ID>4</ID>
      <NAME>Mally</NAME>
      <MARKS>66</MARKS>
      <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
      <ID>5</ID>
      <NAME>Zayna</NAME>
      <MARKS>50</MARKS>
      <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>
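
If you would rather create the file from R than type it in an editor, the sketch below is one possible way to do it using only base R. It assumes the student values shown above and writes “sample.xml” to the current working directory. The data frame name students is only illustrative, and the approach relies on the values containing no characters that need XML escaping (such as & or <).

```r
# Student records matching the sample file shown above.
students <- data.frame(
  ID = 1:5,
  NAME = c("Alias", "Biji", "Yahood", "Mally", "Zayna"),
  MARKS = c(20L, 40L, 60L, 66L, 50L),
  BRANCH = c("IT", "Commerce", "Humanities", "IT", "IT"),
  stringsAsFactors = FALSE
)

# Build one <STUDENT> block per row; sprintf() is vectorised over the columns.
records <- sprintf(
  paste(
    "  <STUDENT>",
    "    <ID>%d</ID>",
    "    <NAME>%s</NAME>",
    "    <MARKS>%d</MARKS>",
    "    <BRANCH>%s</BRANCH>",
    "  </STUDENT>",
    sep = "\n"
  ),
  students$ID, students$NAME, students$MARKS, students$BRANCH
)

# Wrap the records in the root element and save the file.
writeLines(c("<RECORDS>", records, "</RECORDS>"), "sample.xml")

# Show what was written.
cat(readLines("sample.xml"), sep = "\n")
```

Pasting text together like this is fine for a small, clean example; for arbitrary data, a proper XML writer that escapes special characters is the safer choice.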

Reading an XML file

Once the package has been installed, the XML file can be read with the xmlParse() function, which accepts the XML file name as input and returns the parsed document as an object of class XMLInternalDocument (a tree of nodes rather than a plain R list).

The file should be located in the current working directory; otherwise, pass its full path.

The code below also loads the “methods” package. It ships with R, so it does not need to be installed separately. The contents of the file “sample.xml” can be read using the code that follows.

# loading the library and other important packages
library("XML")
library("methods")
 
# the contents of sample.xml are parsed
data <- xmlParse(file = "sample.xml")
 
print(data)
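
To see what xmlParse() actually returns, the short sketch below parses a one-record XML string instead of a file (using the asText argument), inspects the class of the result, and converts it to an ordinary R list with xmlToList(). The variable names xml_text, doc and as_list are only illustrative.

```r
library("XML")

# A one-record document, supplied as a string rather than a file.
xml_text <- paste0(
  "<RECORDS><STUDENT>",
  "<ID>1</ID><NAME>Alias</NAME><MARKS>20</MARKS><BRANCH>IT</BRANCH>",
  "</STUDENT></RECORDS>"
)

# asText = TRUE tells xmlParse() that the input is XML content, not a file name.
doc <- xmlParse(xml_text, asText = TRUE)

# The result is a parsed document object, not a plain list.
print(class(doc))

# The root node carries the name of the outermost tag.
root <- xmlRoot(doc)
print(xmlName(root))

# xmlToList() converts the document into nested R lists.
as_list <- xmlToList(doc)
print(as_list$STUDENT$NAME)
```

Converting to a list is convenient for small documents, while the parsed document object is what the node-access functions of the package operate on.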

Obtaining details from the XML file

Once an XML file has been parsed, its individual parts can be accessed and processed.

The XML package provides functions for retrieving the root node of a document, the number of child nodes it contains, and individual records and their fields.

# Give the input file name to the function.
res <- xmlParse(file = "sample.xml")
 
# Extract the root node.
rootnode <- xmlRoot(res)
 
# Number of child nodes (records) in the root.
nodes <- xmlSize(rootnode)
 
# Get the entire contents of the second record.
second_node <- rootnode[2]
 
# Get the 3rd child element (MARKS) of the 4th record.
attri <- rootnode[[4]][[3]]
 
cat('number of nodes: ', nodes, '\n')
print('details of 2nd record: ')
print(second_node)
 
# Print the marks of the fourth record.
cat('3rd element of 4th record: ', xmlValue(attri), '\n')
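
Indexing by position works when the layout of the file is known in advance. As an alternative, the package’s xpathSApply() function selects nodes with an XPath expression. The sketch below is a minimal illustration that parses a three-record subset of the sample data from a string, so it runs without “sample.xml”; the variable names are only illustrative.

```r
library("XML")

# Three records taken from the sample data, supplied as a string.
xml_text <- paste0(
  "<RECORDS>",
  "<STUDENT><ID>1</ID><NAME>Alias</NAME><MARKS>20</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "<STUDENT><ID>2</ID><NAME>Biji</NAME><MARKS>40</MARKS><BRANCH>Commerce</BRANCH></STUDENT>",
  "<STUDENT><ID>4</ID><NAME>Mally</NAME><MARKS>66</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Names of all students whose BRANCH element equals "IT".
it_names <- xpathSApply(doc, "//STUDENT[BRANCH='IT']/NAME", xmlValue)
print(it_names)

# Marks of every student, converted from text to numbers.
marks <- as.numeric(xpathSApply(doc, "//STUDENT/MARKS", xmlValue))
print(mean(marks))
```

Because xmlValue() always returns text, numeric fields such as MARKS have to be converted before any arithmetic is done on them.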

XML to data frame conversion

Converting the XML data into a data frame with rows and columns makes it easier to read and analyze.

The xmlToDataFrame() function from the XML package takes an XML file as input and returns its records as a data frame, with one row per record and one column per field. This makes large volumes of data much easier to handle and process.

# Convert the input xml file to a data frame.
dataframe <- xmlToDataFrame("sample.xml")
print(dataframe)
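
Every value in an XML file is text, so the columns of the resulting data frame generally need to be converted before they can be used in calculations. The sketch below shows one way to do that, assuming a three-record subset of the sample data parsed from a string; the names df and it_students are only illustrative.

```r
library("XML")

# Three records taken from the sample data, supplied as a string.
xml_text <- paste0(
  "<RECORDS>",
  "<STUDENT><ID>1</ID><NAME>Alias</NAME><MARKS>20</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "<STUDENT><ID>2</ID><NAME>Biji</NAME><MARKS>40</MARKS><BRANCH>Commerce</BRANCH></STUDENT>",
  "<STUDENT><ID>4</ID><NAME>Mally</NAME><MARKS>66</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Keep text columns as character vectors rather than factors.
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the numeric fields explicitly.
df$ID <- as.integer(df$ID)
df$MARKS <- as.numeric(df$MARKS)
str(df)

# Ordinary data frame operations now work as expected.
it_students <- df[df$BRANCH == "IT", ]
print(it_students)
print(mean(df$MARKS))
```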
