Monday, April 1, 2013

Plots in R and Python

It makes sense for any analyst to learn R, if only for making graphs. Making graphs in R is very natural. Below we'll make the same simple plot from the same data, first in R and then in Python using Matplotlib. The data plotted is just the y = x + noise points generated by the script shown in the post on prettifying code and stored as sampleData.csv.txt. The idea here is just to show how easy it is to make plots.
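For readers who do not have the data file at hand, here is a minimal sketch of how a comparable file could be produced. It is not the script from the earlier post. The function name write_sample_data, the point count, the noise level and the seed are all assumptions made for illustration. The only details taken from the scripts in this post are the file name, the comma delimiter and a header row naming the columns x and y.

```python
import csv
import random


def write_sample_data(path, n=50, noise_sd=1.0, seed=0):
    """Write n rows of y = x + Gaussian noise to path (hypothetical helper).

    n, noise_sd and seed are illustrative assumptions, not values
    from the original post.
    """
    rng = random.Random(seed)  # seeded so the file is reproducible
    rows = []
    for i in range(n):
        x = float(i)
        y = x + rng.gauss(0.0, noise_sd)  # y = x + noise
        rows.append((x, y))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])  # header row; both plotting scripts expect it
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    write_sample_data("sampleData.csv.txt")
```

Running it once in the working directory should leave a file that both the R script and the Python script can read.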

First, the R script. 

# import a csv file to a data frame and plot
# (setwd() may be needed first, to point at where the data is)
sdf <- read.csv("sampleData.csv.txt") # load data frame
summary(sdf) # check that file has been read
# type="quartz" is macOS-only; drop it on other platforms
png("simpleRPlot.png", type="quartz") # prepare png
# with() lets plot() use the column names, so the axes are labeled x and y
with(sdf, plot(x, y, type="p", main="sampleData y by x R plot"))
dev.off() # close the device so the file is written

And the Python script. 

"""
Import a csv file of x and y, and plot y by x.
"""
import numpy
import matplotlib.pyplot as plt

# load the sample data from a csv file into parallel
# arrays x and y
infn = "sampleData.csv.txt" 
x, y = numpy.loadtxt(infn, delimiter=",",
                     skiprows=1, unpack=True)

fig = plt.figure() # open up a figure
fig.set_size_inches(5, 5) # set the size of the figure 
ax = fig.add_subplot(111) # put one plot on the figure
ax.plot(x, y, 'ko') # y by x, black (k) circles (o) 
ax.set_xlabel('x') # label the x axis 'x'
ax.set_ylabel('y') # label y axis 'y'
ax.set_title("sampleData y by x Python plot") 
# The following font size commands are to make the
# chart look like a chart made with the default
# settings in R.
ax.title.set_fontsize(11) 
for item in (ax.xaxis.label, ax.yaxis.label):
    item.set_fontsize(9)
for labels in (ax.get_xticklabels(), ax.get_yticklabels()):
    for item in labels:
        item.set_fontsize(9)

fig.savefig("simplePyPlot.png") # save the figure
plt.close(fig) # cleanup; important in interactive sessions

Here are the resulting plots, shown together.


While the charts themselves are very similar, the R script is shorter than the Python script. Part of that is because the Python script includes font changes to make its chart look like the R chart. But part of it is because R makes good assumptions about what a chart should include, such as how to label the x and y axes, whereas Matplotlib needs those set explicitly.
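As an aside, the font changes could probably be expressed more compactly as Matplotlib rc settings instead of per-item calls. The following is a sketch under assumptions, not the script used for the chart above. The helper name plot_y_by_x, the R_LIKE_FONTS dictionary, the inline data points and the output file name are all invented for illustration. The rc keys themselves are standard Matplotlib settings, and the sizes are the ones used in the Python script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Font sizes copied from the script above; the dictionary name is hypothetical.
R_LIKE_FONTS = {
    "axes.titlesize": 11,
    "axes.labelsize": 9,
    "xtick.labelsize": 9,
    "ytick.labelsize": 9,
}


def plot_y_by_x(x, y, out_path):
    """Plot y by x as black circles and save to out_path (hypothetical helper)."""
    # rc_context applies the settings only inside the with block, so the
    # figure is created and saved there.
    with plt.rc_context(R_LIKE_FONTS):
        fig, ax = plt.subplots(figsize=(5, 5))
        ax.plot(x, y, "ko")  # black (k) circles (o)
        ax.set_xlabel("x")   # axis labels still have to be set by hand
        ax.set_ylabel("y")
        ax.set_title("sampleData y by x Python plot")
        fig.savefig(out_path)
        plt.close(fig)
    return ax


if __name__ == "__main__":
    # A few made-up points, only to exercise the helper.
    plot_y_by_x([1, 2, 3, 4], [1.2, 1.9, 3.3, 3.8], "simplePyPlotRc.png")
```

This trims the font handling to one dictionary, but the axis labels still need explicit calls, which is the difference from R noted above.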
