Tuesday, March 19, 2013

Free Software for Data Analysis

Data analysis was one of the first fields to embrace computing. In the 1960s, the first commercial statistical packages were developed, giving analysts access to a wide set of robust statistical procedures. Two of the most popular packages, SAS and SPSS, are still widely used today. The popularity of these packages facilitates collaboration, since it is easy to find other users with whom to discuss or share work.

But you can't collaborate with just anyone, because these are commercial packages and every collaborator needs a license. Free software statistical packages allow for even greater collaboration, because you can give a script to anyone and they can run it without having to obtain a license. The packages we will focus on are R, a free implementation of the S language (developed at Bell Labs and sold commercially as S-PLUS), and the Python libraries (including SciPy, NumPy, and matplotlib). Both R and SciPy are available for Windows, Linux, and OS X.
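To make the "give a script to anyone" point concrete, here is a minimal sketch of the kind of script you might share. It assumes only that the recipient has Python and NumPy installed. The function name `summarize` and the dose/response numbers are made up for illustration and do not come from any real data set.

```python
import numpy as np


def summarize(x, y):
    """Return descriptive statistics for y and a least-squares line of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Fit a degree-1 polynomial (a straight line) by least squares.
    slope, intercept = np.polyfit(x, y, 1)
    return {
        "mean_y": float(np.mean(y)),
        "sd_y": float(np.std(y, ddof=1)),  # sample standard deviation
        "slope": float(slope),
        "intercept": float(intercept),
        "r": float(np.corrcoef(x, y)[0, 1]),  # Pearson correlation
    }


if __name__ == "__main__":
    # Hypothetical data, used only to show the script running.
    dose = [1, 2, 3, 4, 5]
    response = [2.1, 3.9, 6.2, 7.8, 10.1]
    for name, value in summarize(dose, response).items():
        print(f"{name}: {value:.3f}")
```

Anyone who receives this file can run it as is, on Windows, Linux, or OS X, without buying anything.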

In addition to the advantages of free software, there are some ease-of-use advantages. S-PLUS and R have long been popular with SAS users because it is so easy to make high-quality plots in either one. The scripting language for SAS is older than C, and it shows, while S and Python are much more modern languages that are simpler to learn and to develop with. The commercial packages are themselves modernizing: Python has been embedded as a scripting language for SPSS since 2005, and SAS began introducing elements of Java with Version 9.2 in 2008.
