In this tutorial return on equity (ROE) is calculated for each industry over the years 2006 through 2008. You will need a login for WRDS in order to retrieve the required data from Compustat.

SAS code

/* tell SAS where to store the datasets on your computer 
(you may want to change this directory; make sure the directory exists) */
libname myLib "D:\_examples\sasdata";

/* details about the PC-connect (this is always the same), it will trigger a login prompt */
%let wrds = wrds.wharton.upenn.edu 4016;options comamid = TCP remote=WRDS;
signon username=_prompt_;

/* everything after 'rsubmit' will be executed on WRDS!! */
rsubmit;

/* data is organized in libraries, locally, as well as on WRDS
tell SAS to look in '/wrds/comp/sasdata/naa' when you use 'compx' as the library name */
libname compx '/wrds/comp/sasdata/naa';

/* SQL (Structured Queary Language) is a language to process data that is supported by SAS 

 "create table example as "
-> create a new table with the name 'example' in the default library 'work'

"select a.* from compx.funda a"
-> select all records from compx.funda (the a behind compx.funda means that a will 
refer to this dataset. 
a.* therefore selects all variables

"(keep = gvkey ni seq sich ROE fyear)  "
only the variables gvkey ni seq sich ROE fyear are included in the newly created table

"where 2006 <= fyear <= 2008"
-> the fiscal year of the observations need to be within 2006-2008

"and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C'"
-> these are additional requirements that need to be included to exclude doubles*/


PROC SQL;
  create table example (keep = gvkey ni seq sich ROE fyear)  as
  select a.*
  from compx.funda a
  where 2006 <= fyear <= 2008
	and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C' ;
quit;

/* since the dataset is created on the WRDS server, it needs to be downloaded in 
order to access it locally; download the dataset work.example from the server to 
your computer */
proc download data= example out=myLib.a_example;

/* 'run' tells SAS to start executing the submitted statements */
run;

/* stop processing statements on WRDS and continue working locally */
endrsubmit;

/* the data procedure results in a new dataset
with the 'keep' statement the variables are specified that need to be kept in the 
newly created dataset alternatively, there also exists a 'drop' statement (not used 
here), where variables that need not be included in the newly created dataset are 
specified */
data myLib.b_roe (keep = gvkey ni seq sich ROE fyear) ;

/* the set keyword is used to include a set as a starting point
(it can actually be used to add multiple datasets)*/
set myLib.a_example;

/* drop firm-years with missing observations */
if ni ne . and seq ne . and sich ne .;

/* drop firm-years with negative and zero equity */
if seq > 0;

/* create a new variable ROE, which is computed as net income divided by (end of 
year) equity */
ROE = ni / seq;
run;

/* add an industry variable;

Industry                           Primary SIC codes
1  Agriculture                     1-999
2  Chemicals                       2800-2824, 2840-2899
3  Computers                       7370-7379, 3570-3579 and 3670-3679
4  Durable Manufacturers           3000-3999, excluding 3570-3579 and 3670-3679
5  Extractive Industries           2900-2999, 1300-1399
6  Financial Institutions          6000-6499
7  Food                            2000-2111
8  Insurance and Real Estate       6500-6999
9  Mining and Construction         1000-1999, excluding 1300-1399
10 Other                           9000 and above
11 Pharmaceuticals                 2830-2836
12 Retail                          5000-5999
13 Services                        7000-8999, excluding 7370-7379
14 Textiles and Printing           2200-2790
15 Transportation                  4000-4899
16 Utilities                       4900-4999

This classification is based on:
- Barth, M., W. Beaver and W. Landsman, 1998. Relative valuation roles of equity 
book value and net income as a function of financial health. Journal of Accounting 
and Economics 25(1): 1-34.
- Easton, P. and J. Pae, 2004. Accounting conservatism and the relation between 
returns and accounting data. Review of Accounting Studies 9(4): 495-521. */

data myLib.c_roe_ind;
set myLib.b_roe;

/* drop firm-years with a missing sic code*/
if sich;					

/* create a variable 'industry', the value will depend on the value of the SIC 
code (sich variable)

if sic code is smaller than (or equal to) 999, create industry variable so that 
it equals 'Agriculture' */

if sich <= 999	
	then industry = "Agriculture";

/* similar, but now sic can be within 2 ranges
sic either between 1000 and 1300 or 1400-2000 is Mining*/
if (  (sich >=1000 && sich <=1299)  or  (sich >=1400 && sich <=1999) ) 	
	then industry = "Mining_Const";

/* etcetera */
if (sich >=2000 && sich <=2111) 
	then industry = "Food";
if (sich >=2200 && sich <=2790) 
	then industry = "Textiles_Print";
if (  (sich >=2800 && sich <=2824)  or  (sich >=2840 && sich <=2899) ) 	
	then industry = "Chemicals";
if (sich >=2830 && sich <=2836) 
	then industry = "Pharmaceuticals";
if (  (sich >=2900 && sich <=2999)  or  (sich >=1300 && sich <=1399) ) 
	then industry = "Extractive_Ind";
if (  (sich >=3000 && sich <=3569)  or  (sich >=3580 && sich <=3669) or  
	(sich >=3680 && sich <=3999) ) 
	then industry = "Durable_Man";
if (  (sich >=7370 && sich <=7379)  or  (sich >=3570 && sich <=3579) or  
	(sich >=3670 && sich <=3679) ) 
	then industry = "Computers";
if (sich >=4000 && sich <=4899) 	
	then industry = "Transportation";
if (sich >=4900 && sich <=4999) 
	then industry = "Utilities";
if (sich >=5000 && sich <=5999) 
	then industry = "Retail";
if (sich >=6000 && sich <=6411) 
	then industry = "Financial_Inst";
if (sich >=6500 && sich <=6999) 	
	then industry = "Insurance";
if (  (sich >=7000 && sich <=7369)  or  (sich >=7380 && sich <=8999) ) 
	then industry = "Services";
if sich >= 9000			
	then industry = "Other";
run;


/* the next procedure (means) requires that we sort the data on industry;
nodup means that duplicate observations are deleted, for example firms that have 
changed their fiscal year-end may have been included multiple times */
proc sort data = myLib.c_roe_ind nodup;by industry fyear; run;

/* proc means produces descriptive statistics;
the 'by' statement will create these statistics for each group 
(in this case: for each industry);
the NOPRINT keyword suppresses printing of output;
the output out statement specifies that a new dataset with the descriptives needs 
to be created;

overview of variables you can output with proc means: 
http://www.sfu.ca/sasdoc/sashtml/proc/z0146734.htm */

PROC MEANS  data = myLib.c_roe_ind NOPRINT ;
var ROE;
by industry fyear;
OUTPUT OUT=myLib.d_roe_ind MEAN=MEAN MIN=MIN MEDIAN=MEDIAN  MAX=MAX ;
run;

Submitting the statements

Copy/paste the following pieces of code to the SAS-window called 'editor' (or download here; right-click and 'save as'). This is where you edit your sascode. Make sure to edit the path to the directory where you wish to store SAS datasets (line 3). Explanation of the different statements are included throughout the code. Additional information about submitting the statements, viewing/exporting a dataset and information on the names of libraries, data sets and variables on WRDS is included below.

The contents of the menu-bar (top) depends on the active window. In order to submit and execute code, the 'editor' window needs to be active (click in the window to activate). Once active, an icon of a little guy appears (4th incon from the right). Clicking this icon executes the code. Alternatively, you may press 'F3'. To execute some lines of code, select those lines and then submit.

Execution of the statements may actually take a few minutes. While the script is executing it will show 'Processing submitted statements' in the window name. Execution of the submitted statements can be aborted by clicking Break' (icon with exclamation mark inside a circle, second from right), and select '3. Cancel Submitted Statements' .

Advanced note

If clicking 'Break' does not work, you may consider to 'kill' the job on the WRDS server.
This is possible from the WRDS website (Home -> Mywrds), which shows a list of current processes. Also, you can use a SSH shell (advanced), which is explained under Home -> Support -> Accessing And Manipulating The Data -> Unix Acces - > Introduction to the WRDS Unix System->"8. Managing processes", "e. Terminating processes"

All statements and information (including errors) about the execution are included in the log-file, which has its own window. Important: SAS does not abort on errors. If there is an error in your script, it will try to continue. If an error prevents overwriting of a dataset, SAS will continue to work with the previous dataset. Thus, if a dataset does not seem to update, there may be an error in your script.

Viewing datasets

To view the dataset, click on the explorer-window (on the left), doubleclick on the library myLib. SAS will show all files within that library, which at this point will only be 'A_sample'. Doubleclicking opens a window with the dataset.

The names of the WRDS libraries and datasets

The example code includes a library assignment:

libname compx '/wrds/comp/sasdata/naa';

And, the SQL code refers to a dataset:

compx.funda

To find the names of the different libraries and datasets on WRDS, use the WRDS website (Home -> Support -> Data -> Dataset List). The library is shown as well as the dataset names.

To view the different databases that your institution has access to, click Home -> MyWRDS -> products on the WRDS website.

The names of the variables to download

In the example code, the following variables are retrieved: gvkey ni seq sich ROE fyear.

Since we are interested in computing ROE for each industry over a number of years, we will need net income, the book value of equity, the SIC industry code and the fiscal year. We also need a firm identifier to get rid of possible duplicate observations.

Each dataset comes with a list of variables. In order to learn the names of the variables in Funda (Fundamentals) : Home -> Support -> Data -> Dataset List, then look for COMP:NA [/wrds/comp/sasdata/naa] and click on 'funda' (Merged Fundamental Annual File)

Variable Code  
Net Income (Loss) ni  
Stockholders' Equity - Total seq  
Industry code (SIC) sich  
Data Year - Fiscal fyear  
Global Company Key gvkey  

Exporting a dataset

To export a dataset, rightclick on a dataset in the explorer-plane while viewing the contents of a library (Note : there are tabs to switch between 'results' and 'explorer' at the bottom of the left pane; use 'explorer' to open the appropriate library). Select 'export' and complete the wizard.

In this example, the results have been exported and further processed (the image below is a screenshot of a Flash movie).

ROE by industry

ROE by industry

Results

As expected, profitability for financial institutions has been under pressure in 2008. Also, firms in the textiles and printing industry have lower levels of profitability. Accounting principles (expensing of R&D) seem to explain the losses for pharmaceutical firms, since the median firm in this industry is already experiencing substantial losses (as a percentage of equity) in 2006.

All rights reserved. © 2010-2014 wrds.us [Copyright] [Privacy Statement] [Disclaimer] [About]