Tuesday, March 19, 2013

Using Pygments to prettify code for online posts

It's much easier to write code that's correct the first time using a modern editor (emacs, vim) or an IDE that provides syntax highlighting. Syntax highlighting also makes code more readable in discussions. Because whitespace is meaningful in Python, it's especially important for Python code to look right in a blog post, and simply pasting code into html paragraphs will not render correctly. Many blog engines and wikis have tools to add syntax highlighting to articles, but they are all different. An alternative to learning several tools is to learn one tool that creates syntax-highlighted html representations of the code, which can be placed anywhere html can be placed. Below we discuss the highlighter Pygments.

Pygments is a code highlighter that handles a large variety of languages (including Bash, HTML, Java, Matlab, Python, and S, but unfortunately not SAS) and generates output in a variety of formats (including HTML, RTF, and LaTeX). It can be used in three ways:

  1. A Python script can import the pygments library and use its functions to format text. This is the most practical way to use Pygments when generating html from a script.
  2. A script, pygmentize, makes the functionality of Pygments available from the command line. This is the most natural way to add highlighting to one or a few files at a time, and it's what we'll show below.
  3. The Pygments homepage has a form at the bottom that allows users to enter code, select a language, and see what Pygments highlighting looks like. Users can elect for their examples to be stored in a database of examples, and also browse examples from other users. 
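
The first approach can be sketched in a few lines of Python. This is only an illustration, not code from the original scripts: the helper name to_html and the SAMPLE snippet are made up for the example, and the plain <pre> fallback is an assumption added so the sketch still runs where Pygments is not installed. The noclasses option asks the formatter for inline styles, so the fragment can be pasted without a separate stylesheet.

```python
import html

# Hypothetical snippet to highlight (not one of the scripts from this post)
SAMPLE = "def gety(x, b=1, m=1):\n    return m*x + b\n"

def to_html(code):
    """Return an html fragment for a Python snippet."""
    try:
        from pygments import highlight
        from pygments.lexers import PythonLexer
        from pygments.formatters import HtmlFormatter
    except ImportError:
        # Assumed fallback, no highlighting: escaped text inside <pre>
        # at least preserves the indentation.
        return "<pre>" + html.escape(code) + "</pre>"
    # style picks the color scheme; noclasses=True writes inline styles
    formatter = HtmlFormatter(style="colorful", noclasses=True)
    return highlight(code, PythonLexer(), formatter)

fragment = to_html(SAMPLE)
print(fragment)
```

The returned string is a fragment rather than a complete document, so it can be dropped into an existing page.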
To see what pygmentize does, consider a simple Python script, genSample.py, that writes a csv file read by a simple R script, importData.r. Using the original Blogger editor and converting the end-of-line marks to <p> would result in a total mess, because Python code depends so heavily on whitespace. The new Compose mode of editing allows an author to simply paste the Python code in, and it generates the &nbsp; entities needed to make the indentation look correct. But we can get a highlighted html file from the shell with the following command:

bash$ pygmentize -O full,style=colorful,linenos=1 -f html genSample.py > genSample.py.html

The resulting file is a complete html document, from which I've inserted below the text from the first <head> to the </body>. The arguments and options for pygmentize are explained when it's run with the --help option.

"""
Generate a csv sample file of x and y, where y = b + m*x + uniform noise
The name of the file, number of lines of output, intercept, slope, and
magnitude of noise are set in constants. 
"""
import random

# CONSTANTS
OUTF = open("sampleData.csv.txt", "w")
LINES = 20
OFFSET = 1 
SLOPE = 1
NOISE = .3

def gety(x, b=OFFSET, m=SLOPE, e=NOISE):
    """
    Return y for a given x
    """
    eps = random.uniform(-e, e) 
    y = m*x + b + eps
    return y

# print() with file= is the Python 3 form of the old "print >> OUTF" statement
print("x,y", file=OUTF) # labels record
for i in range(LINES):
    x = i/10. # x takes on 0, 0.1, ... 1.9
    y = gety(x)
    print("%f,%f" % (x, y), file=OUTF)

OUTF.close()
Similarly, running

bash$ pygmentize -O full,style=colorful,linenos=1 -f html -l r importData.r

provides the highlighted text below:

# import a csv file to a data frame and summarize it
# the user might need to setwd to the directory where the data is
# setwd("/pathToDirectoryWithData") 
read.csv("sampleData.csv.txt") -> sdf
summary(sdf)
The line numbers and highlighting make Python and R snippets easier to understand and discuss.
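
When several files need the same treatment, the pygmentize call can be assembled from a script instead of being retyped. The sketch below is a hypothetical helper, not part of Pygments: the names build_cmd and highlight_file are invented here, and it assumes html output with the colorful style is wanted for every file and that pygmentize is on the PATH.

```python
import shutil
import subprocess

def build_cmd(source, lexer=None, style="colorful"):
    """Build a pygmentize argument list with the options used in this post."""
    cmd = ["pygmentize", "-O", "full,style=%s,linenos=1" % style, "-f", "html"]
    if lexer is not None:
        # -l names the lexer explicitly, e.g. "r" for an R script
        cmd += ["-l", lexer]
    return cmd + [source]

def highlight_file(source, lexer=None):
    """Write source + '.html'; return its path, or None if pygmentize is missing."""
    if shutil.which("pygmentize") is None:
        return None
    out_path = source + ".html"
    with open(out_path, "w") as out:
        # Redirect standard output to the html file, as "> file" does in the shell
        subprocess.run(build_cmd(source, lexer), stdout=out, check=True)
    return out_path

# Show the commands without running them
print(" ".join(build_cmd("genSample.py")))
print(" ".join(build_cmd("importData.r", lexer="r")))
```

Calling highlight_file("genSample.py") would then write genSample.py.html next to the source file.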
