In this tutorial return on equity (ROE) is calculated for each industry over the years 2006 through 2008. You will need a login for WRDS in order to retrieve the required data from Compustat.
/* tell SAS where to store the datasets on your computer (you may want to change this directory; make sure the directory exists) */ libname myLib "D:\_examples\sasdata"; /* details about the PC-connect (this is always the same), it will trigger a login prompt */ %let wrds = wrds.wharton.upenn.edu 4016;options comamid = TCP remote=WRDS; signon username=_prompt_; /* everything after 'rsubmit' will be executed on WRDS!! */ rsubmit; /* data is organized in libraries, locally, as well as on WRDS tell SAS to look in '/wrds/comp/sasdata/naa' when you use 'compx' as the library name */ libname compx '/wrds/comp/sasdata/naa'; /* SQL (Structured Queary Language) is a language to process data that is supported by SAS "create table example as " -> create a new table with the name 'example' in the default library 'work' "select a.* from compx.funda a" -> select all records from compx.funda (the a behind compx.funda means that a will refer to this dataset. a.* therefore selects all variables "(keep = gvkey ni seq sich ROE fyear) " only the variables gvkey ni seq sich ROE fyear are included in the newly created table "where 2006 <= fyear <= 2008" -> the fiscal year of the observations need to be within 2006-2008 "and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C'" -> these are additional requirements that need to be included to exclude doubles*/ PROC SQL; create table example (keep = gvkey ni seq sich ROE fyear) as select a.* from compx.funda a where 2006 <= fyear <= 2008 and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C' ; quit; /* since the dataset is created on the WRDS server, it needs to be downloaded in order to access it locally; download the dataset work.example from the server to your computer */ proc download data= example out=myLib.a_example; /* 'run' tells SAS to start executing the submitted statements */ run; /* stop processing statements on WRDS and continue working locally */ endrsubmit; /* the data procedure results in a new dataset with the 'keep' statement the variables are specified that need to be kept in the newly created dataset alternatively, there also exists a 'drop' statement (not used here), where variables that need not be included in the newly created dataset are specified */ data myLib.b_roe (keep = gvkey ni seq sich ROE fyear) ; /* the set keyword is used to include a set as a starting point (it can actually be used to add multiple datasets)*/ set myLib.a_example; /* drop firm-years with missing observations */ if ni ne . and seq ne . and sich ne .; /* drop firm-years with negative and zero equity */ if seq > 0; /* create a new variable ROE, which is computed as net income divided by (end of year) equity */ ROE = ni / seq; run; /* add an industry variable; Industry Primary SIC codes 1 Agriculture 1-999 2 Chemicals 2800-2824, 2840-2899 3 Computers 7370-7379, 3570-3579 and 3670-3679 4 Durable Manufacturers 3000-3999, excluding 3570-3579 and 3670-3679 5 Extractive Industries 2900-2999, 1300-1399 6 Financial Institutions 6000-6499 7 Food 2000-2111 8 Insurance and Real Estate 6500-6999 9 Mining and Construction 1000-1999, excluding 1300-1399 10 Other 9000 and above 11 Pharmaceuticals 2830-2836 12 Retail 5000-5999 13 Services 7000-8999, excluding 7370-7379 14 Textiles and Printing 2200-2790 15 Transportation 4000-4899 16 Utilities 4900-4999 This classification is based on: - Barth, M., W. Beaver and W. Landsman, 1998. Relative valuation roles of equity book value and net income as a function of financial health. Journal of Accounting and Economics 25(1): 1-34. - Easton, P. and J. Pae, 2004. Accounting conservatism and the relation between returns and accounting data. Review of Accounting Studies 9(4): 495-521. */ data myLib.c_roe_ind; set myLib.b_roe; /* drop firm-years with a missing sic code*/ if sich; /* create a variable 'industry', the value will depend on the value of the SIC code (sich variable) if sic code is smaller than (or equal to) 999, create industry variable so that it equals 'Agriculture' */ if sich <= 999 then industry = "Agriculture"; /* similar, but now sic can be within 2 ranges sic either between 1000 and 1300 or 1400-2000 is Mining*/ if ( (sich >=1000 && sich <=1299) or (sich >=1400 && sich <=1999) ) then industry = "Mining_Const"; /* etcetera */ if (sich >=2000 && sich <=2111) then industry = "Food"; if (sich >=2200 && sich <=2790) then industry = "Textiles_Print"; if ( (sich >=2800 && sich <=2824) or (sich >=2840 && sich <=2899) ) then industry = "Chemicals"; if (sich >=2830 && sich <=2836) then industry = "Pharmaceuticals"; if ( (sich >=2900 && sich <=2999) or (sich >=1300 && sich <=1399) ) then industry = "Extractive_Ind"; if ( (sich >=3000 && sich <=3569) or (sich >=3580 && sich <=3669) or (sich >=3680 && sich <=3999) ) then industry = "Durable_Man"; if ( (sich >=7370 && sich <=7379) or (sich >=3570 && sich <=3579) or (sich >=3670 && sich <=3679) ) then industry = "Computers"; if (sich >=4000 && sich <=4899) then industry = "Transportation"; if (sich >=4900 && sich <=4999) then industry = "Utilities"; if (sich >=5000 && sich <=5999) then industry = "Retail"; if (sich >=6000 && sich <=6411) then industry = "Financial_Inst"; if (sich >=6500 && sich <=6999) then industry = "Insurance"; if ( (sich >=7000 && sich <=7369) or (sich >=7380 && sich <=8999) ) then industry = "Services"; if sich >= 9000 then industry = "Other"; run; /* the next procedure (means) requires that we sort the data on industry; nodup means that duplicate observations are deleted, for example firms that have changed their fiscal year-end may have been included multiple times */ proc sort data = myLib.c_roe_ind nodup;by industry fyear; run; /* proc means produces descriptive statistics; the 'by' statement will create these statistics for each group (in this case: for each industry); the NOPRINT keyword suppresses printing of output; the output out statement specifies that a new dataset with the descriptives needs to be created; overview of variables you can output with proc means: http://www.sfu.ca/sasdoc/sashtml/proc/z0146734.htm */ PROC MEANS data = myLib.c_roe_ind NOPRINT ; var ROE; by industry fyear; OUTPUT OUT=myLib.d_roe_ind MEAN=MEAN MIN=MIN MEDIAN=MEDIAN MAX=MAX ; run;
Copy/paste the following pieces of code to the SAS-window called 'editor' (or download here; right-click and 'save as'). This is where you edit your sascode. Make sure to edit the path to the directory where you wish to store SAS datasets (line 3). Explanation of the different statements are included throughout the code. Additional information about submitting the statements, viewing/exporting a dataset and information on the names of libraries, data sets and variables on WRDS is included below.
The contents of the menu-bar (top) depends on the active window. In order to submit and execute code, the 'editor' window needs to be active (click in the window to activate). Once active, an icon of a little guy appears (4th incon from the right). Clicking this icon executes the code. Alternatively, you may press 'F3'. To execute some lines of code, select those lines and then submit.
Execution of the statements may actually take a few minutes. While the script is executing it will show 'Processing submitted statements' in the window name. Execution of the submitted statements can be aborted by clicking Break' (icon with exclamation mark inside a circle, second from right), and select '3. Cancel Submitted Statements' .
If clicking 'Break' does not work, you may consider to 'kill' the job on the WRDS server.
This is possible from the WRDS website (Home -> Mywrds), which shows a list of current processes. Also, you can use a SSH shell (advanced), which is explained under Home -> Support -> Accessing And Manipulating The Data -> Unix Acces - > Introduction to the WRDS Unix System->"8. Managing processes", "e. Terminating processes"
All statements and information (including errors) about the execution are included in the log-file, which has its own window. Important: SAS does not abort on errors. If there is an error in your script, it will try to continue. If an error prevents overwriting of a dataset, SAS will continue to work with the previous dataset. Thus, if a dataset does not seem to update, there may be an error in your script.
To view the dataset, click on the explorer-window (on the left), doubleclick on the library myLib. SAS will show all files within that library, which at this point will only be 'A_sample'. Doubleclicking opens a window with the dataset.
The example code includes a library assignment:
libname compx '/wrds/comp/sasdata/naa';
And, the SQL code refers to a dataset:
To find the names of the different libraries and datasets on WRDS, use the WRDS website (Home -> Support -> Data -> Dataset List). The library is shown as well as the dataset names.
To view the different databases that your institution has access to, click Home -> MyWRDS -> products on the WRDS website.
In the example code, the following variables are retrieved: gvkey ni seq sich ROE fyear.
Since we are interested in computing ROE for each industry over a number of years, we will need net income, the book value of equity, the SIC industry code and the fiscal year. We also need a firm identifier to get rid of possible duplicate observations.
Each dataset comes with a list of variables. In order to learn the names of the variables in Funda (Fundamentals) : Home -> Support -> Data -> Dataset List, then look for COMP:NA [/wrds/comp/sasdata/naa] and click on 'funda' (Merged Fundamental Annual File)
|Net Income (Loss)||ni|
|Stockholders' Equity - Total||seq|
|Industry code (SIC)||sich|
|Data Year - Fiscal||fyear|
|Global Company Key||gvkey|
To export a dataset, rightclick on a dataset in the explorer-plane while viewing the contents of a library (Note : there are tabs to switch between 'results' and 'explorer' at the bottom of the left pane; use 'explorer' to open the appropriate library). Select 'export' and complete the wizard.
In this example, the results have been exported and further processed (the image below is a screenshot of a Flash movie).
As expected, profitability for financial institutions has been under pressure in 2008. Also, firms in the textiles and printing industry have lower levels of profitability. Accounting principles (expensing of R&D) seem to explain the losses for pharmaceutical firms, since the median firm in this industry is already experiencing substantial losses (as a percentage of equity) in 2006.
|Other Tutorials WRDS|
|WRDS using SAS: Getting started|
|Restatement event study|
|Estimating the earnings response coefficient (ERC)|
|Google market to book and return on assets versus industry|
|Latest forum posts|
|Have trouble with replicating the RESTATEMENT example by sxxapple|
|how to read restatement data from GAO website? by Clark|
|winsor by none? by Clark|
|the tutorial to generate inflation adjusted time series. by pwyw000|
|Restatement Example Regression Results by Zenghui|
|Generate our own erdport1 database from DSF file by Zenghui|
|ROE example by Zenghui|
|SAS SPEDIS function by Zenghui|