Multiple Observations
Posted: 24 May 2012 05:13 PM   [ Ignore ]
Newbie
Rank
Total Posts:  7
Joined  2011-10-18

Is there any easy way to collapse two rows if you identify an issue with a database in which for some observations it has been split over two rows?  In those instances I have several variables and it would be nice if there was a quick way to tell SAS to simply sum the variables.  Thanks.

Profile
 
 
Posted: 24 May 2012 05:29 PM   [ Ignore ]   [ # 1 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

hi Wciconte,

Assuming you have a firm identifier (say gvkey) and a period (say fyear) and a bunch of variables (at sale ceq etc), then this would work:

/*  Include array function macros (needed for %DO_OVER) -- highly recommended to download and include locally */ 
filename m1 url 'http://www.wrds.us/macros/array_functions.sas';
%include 
m1;

/*  VALUES=... list of variables to be summed */
proc sql;
create table work.summed as 
select gvkeyfyear, %DO_OVER(VALUES=at sale ceq niPHRASE=SUM(?) as ?, BETWEEN=COMMA)
from work.hasDoubles
group by gvkey
fyear;
quit

If there are many variables to sum, you might consider using proc datasets to get the variables and load them into an array with %ARRAY. (see manual of %ARRAY and %DO_OVER at http://www.sascommunity.org/wiki/Tight_Looping_with_Macro_Arrays ). This query also assumes you can sum all the variables; if this is not the case (for example, there are firm identifiers, or character variables), you would first sum the numerical values, and then ‘left join’ the character variables.

Something like this:

/*  Include array function macros (needed for %DO_OVER) -- highly recommended to download and include locally */ 
filename m1 url 'http://www.wrds.us/macros/array_functions.sas';
%include 
m1;

/*  Sum numerical variables */
proc sql;
create table work.summed as 
select gvkeyfyear, %DO_OVER(VALUES=at sale ceq niPHRASE=SUM(?) as ?, BETWEEN=COMMA)
from work.hasDoubles
group by gvkey
fyear;
quit;

/*  Append variables that not need to be summed (LEFT JOIN instead of regular join prevents duplicate rows) */
proc sql;
create table work.summed2 as
select a.*, %DO_OVER(VALUES=cik cusipPHRASE=b.?, BETWEEN=COMMA)
from work.summed a LEFT JOIN work.hasDoubles b
ON a
.gvkey b.gvkey and a.fyear b.fyear;
quit

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile