Category: Data management

Author resource: Joost Impink

Explanation

This macro counts the number of segments (industry segments, geographic segments) in Compustat Segment file. The macro requires a source datasource with gvkey, fyear and datadate variables (for example Compustat Fundamental Annual). The macro allows options to only count unique segments (for industry segments) and an option to ignore segments with an empty segment industry code.

Note: this macro has not been extensively tested, so please proceed with caution.

/*  
	Macro: 			nSegments.sas
	Created by:		Joost Impink
	Date: 			March 2014

	Macro to count the number of segments (either industry segments or geo segments)
 
    Variables required:
        dsin            dataset in (needs to have gkvey, fyear and datadate)
        dsout           dataset out
		type			'ind' (default) or 'geo'
		unique			'true' (default) or 'false', count unique segments only 	
						(segments with the same industry count as 1 segment)
						only relevant for industry segments		
		sicsreq			'true' (default) or 'false'; if true: require industry sic code (sics)
						no be nonempty 	
        var				name of variable to create        
		segmerged		location (libname + dataset name) of wrds_seg_merged 
						e.g. segments.wrds_segmerged	

	Dependencies:
		- the macro requires local access to wrds_segmerged (Compustat) in a local libray
		- %runquit macro
     	 %macro runquit;                
            ; run; quit;                           
            %if &syserr. ne 0 %then %do;                  
            %abort cancel ;                   
            %end;                      
       	 %mend runquit;   
*/

%macro nSegments(dsin=, dsout=, type="ind", unique="true", sicsreq="true", var=, segmerged=);

	/* 	Vars needed from dsin */
	data seg1 (keep = gvkey fyear datadate);
	set &dsin;
	%runquit;

	/*	Select segments segments */
	proc sql;
		create table seg2 as
		select 
			a.gvkey, a.fyear, b.SICS1 as sics
		from
			seg1 a,
			segments.wrds_segmerged b
		where
			a.gvkey = b.gvkey
			/* use segment info of year of 10-K (not later years) */
			and a.datadate = b.datadate
			and a.datadate = b.srcdate
			/* operating or business segments */	
			%if &type eq "ind" %then %do;
				and b.stype IN ("BUSSEG", "OPSEG")  
	  		%end;
			/* geo segments */
	  		%else %do;
				and b.stype = "GEOSEG"		
	  		%end;			
			/* drop segments with missing sics1 ? */
			%if &sicsreq eq "true" %then %do;
				and b.SICS1 ne ""  
	  		%end;
			and b.SALES ne .
			and b.SALES > 0
			;	
	%runquit;

	/*  Unique industries? (only relevant for type "ind")*/
	%if &type eq "ind" and &unique eq "true" %then %do;

	/*	Drop segments with same industry */
	proc sort data=seg2 nodupkey; by gvkey fyear sics;%runquit;
	%end;

	/*	Count #segments  */
	proc sql;
		create table seg3 as
		select distinct gvkey, fyear, count(*) as numSegs from seg2
		group by gvkey, fyear
	;
	%runquit;

	/*	Create output dataset */
	proc sql;
		create table &dsout as
		select a.*, b.numSegs as &var
		from
			&dsin a
		LEFT JOIN
			seg3 b
		ON
			a.gvkey = b.gvkey
		and a.fyear = b.fyear;		
	%runquit;

	/*	Cleanup */
	proc datasets library=work;
	  delete seg1 - seg3 ;
	%runquit;
%mend;

Usage

Sample usage:


data work.sample1 (keep = gvkey fyear datadate conm);
set comp.funda;
/* general filter to drop doubles from Compustat Funda */
if indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C' ;
/* limit to firms with more than $20 mln sales and fiscal years 2010-2012 */
if sale > 20;
if 2010 <= fyear <= 2012;
run;

/* 	wrds_segmerged can be downloaded from wrds at /wrds/comp/sasdata/seghist
 	variable descriptions: 
		http://wrds-web.wharton.upenn.edu/wrds/tools/variable.cfm?library_id=7&file_id=65809
*/
%LET segm = segments.wrds_segmerged; /* location of Compustat segment file */ 

/* invoke macro */

/* industry segments (default type, unique and sicsreq), require non-empty and unique segment industry code (sics) */
%nSegments(dsin=sample1, dsout=sample2, var=numIndUnique, segmerged=&segm);

/* as above, but segments can have same industry code*/
%nSegments(dsin=sample2, dsout=sample3, unique="false", sicsreq="true", var=numInd, segmerged=&segm);

/* as above, but segments can have missing industry code */
%nSegments(dsin=sample3, dsout=sample4, unique="false", sicsreq="false", var=numIndInclMiss, segmerged=&segm);

/* geo segments */
%nSegments(dsin=sample4, dsout=sample5, type="geo", unique="false", sicsreq="false", var=numGeo, segmerged=&segm);

Other Data management
SAS Winsorize macro

forum

Latest forum posts
Fama French 49 YES 49 !!! by CA Miller
date format by Emily
Discretionary Accruals by jwhi121
question about discretionary accrual models by kerrida
Combining Global and North American data by CA Miller
File for Eventus or SAS Event Study by samme
quarterly accruals by lxt88
Financial Statement Comparability - De Franco, Kothari and Verdi (2011, JAR) Replication by rowing
question about discretionary accrual models by Zenghui
problem in perl by KZ

All rights reserved. © 2010-2014 wrds.us [Copyright] [Privacy Statement] [Disclaimer] [About]