The way to get lags and leads
Posted: 14 August 2012 09:51 AM   [ Ignore ]
Sr. Member
RankRankRankRank
Total Posts:  169
Joined  2011-09-20

Just want to share how to get leads and lags in SAS with everybody on this forum.
smile


1. function lag()
very limited function, can’t combine with by group.

2. use SQL select the lags and leads values to join the same table (credits go to Joost).

3. PROC expand, example code as the following
(Credits go to http://support.sas.com/resources/papers/proceedings10/093-2010.pdf)

proc expand data=small_funda method=none out = new_funda;
by gvkey;
convert sale = sale_1 / transformout = (lag 1);
convert sale = _2_sale / transformout = (lead 2);
quit;

* we can change the lag number or lead number easily to (lag 5) or (lead 3) as we need.

 Signature 

Zenghui
A humble student of business

Profile
 
 
Posted: 14 August 2012 10:08 AM   [ Ignore ]   [ # 1 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

hi Zenghui,

Thanks - wasn’t aware of the 3rd option (proc expand). Looks like proc expand will bite dust (just like lag()) if a year is not included smile (in that case, for example a lag of 1 will be the value of 2 years ago)

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile
 
 
Posted: 14 August 2012 10:20 AM   [ Ignore ]   [ # 2 ]
Sr. Member
RankRankRankRank
Total Posts:  169
Joined  2011-09-20

Joost,

Thanks for pointing out that.
It is very important to keep the data generated precise !

smile

 Signature 

Zenghui
A humble student of business

Profile
 
 
Posted: 17 August 2012 09:48 PM   [ Ignore ]   [ # 3 ]
Newbie
Rank
Total Posts:  8
Joined  2012-08-15

Here’s how to get lags in the presence of BY groups:

data want;
  set msf;  ** say the CRSP monthly stock file **;
  by permno;
  prior_prc=lag(prc);
  if first.permno then prior_prc=.;
run;

** Now what if you want lags of more than one period? There are many ways, but this might be termed the “classic” SAS method (before DOW loop was popularized). 

In it you create an “order” variable which is set to 1 for the first record of each PERMNO, then 2, 3, etc.  Once you know the record-in-hand has order of (say) 2 then you know you must reset all lags beyond one period to missing (.).  In this case, you have to RETAIN the order variable - otherwise SAS resets it to missing at the top of each iteration of the DATA step **;

data want (drop=I);
    set msf;
    by permno;
    retain order;
    if first.permno then order=1;
    else order=order+1;

    prior1=lag1(prc);
    prior2=lag2(prc);
    prior3=lag3(prc);
    prior4=lag4(prc);

  ** Now if we’re at or near the beginning of a PERMNO group, reset lags to missing **;
  array priors {4} prior1-prior4;
  if order<=4 then do I=order to 4;
    priors{I}=.;
  end;
run;


BTW, regarding the LAG function, be carefull about using it in an IF condition.  Do NOT assume that “if ... then x=lag(x)” will examine the IF clause and then get the X from the prior record when the IF condition is meant.  It won’t.  It will get the X from the prior time that the IF condition was filled, which might be several obserations previously.  That’s why this program ALWAYS creates the lag first, then resets to missing when appropriate.

Profile
 
 
Posted: 16 June 2014 05:56 AM   [ Ignore ]   [ # 4 ]
Newbie
Rank
Total Posts:  6
Joined  2014-06-16

Hello,

Sorry I am new to SAS but am having issues working out the code for what I need.

I have a large dataset (Initialtesting), and one of the variables I have is yearly sales revenue (label=sale). I need to come up with a lag measure for this variable, (sales from the previous year). I also need to come up with (sales from the year before that).

The difficulties arise because:

* I have multiple organisations (label=comn) over multiple periods (end of financial year information, label=fyear) , so I need SAS to take this into account.

*I have missing data, and I want SAS to skip where there is missing information.


Is this possible, and can anyone give me a code?

Profile
 
 
Posted: 16 June 2014 08:08 AM   [ Ignore ]   [ # 5 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

hi,

You can use the ‘data step’ to do this with the ‘lag’ function. My preferred way of doing this is with proc sql, especially if you need to go back multiple years.

The ‘trick’ is to do a join with the table itself, like so:

proc sql;
 
create table newTable as 
/* a.*: select all variable you already have (appending variables) */
 
select a.* , 
/* give unique names; fyear_prev is for debugging  (easy to make mistakes with the a.fyear-1 at the end*/
  
b.sales as sales_prevb.fyear as fyear_prev
from
  yourtable a
left join 
  yourtable b
on
/* same firm (obiously) */
 
a.gvkey b.gvkey 
/* the data from b will be 1 year before the data in a*/
and a.fyear-b.fyear;
quit

Where gvkey is a firm identifier, and fyear is the year. I used ‘left join’ so to keep observations for which the previous year’s data is missing. Repeat the same proc with ‘and a.fyear - 2 = b.fyear’ to append the sales for 2 years ago.

Make sure that each gvkey-fyear is unique. Also, give the variables unique names (you are appending variables to the original table, if you give the same name as an existing variable, it won’t work as planned).

Hope this helps,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile
 
 
Posted: 16 June 2014 10:23 AM   [ Ignore ]   [ # 6 ]
Newbie
Rank
Total Posts:  8
Joined  2012-08-15

I find using PROC SQL to be far slower than the data step.  After all PROC SQL must compare every pair of records to select the matched cases.

But if the data are sorted, then the DATA step (with a LAG function), need only compare consecutive observations - that’s N comparisons instead of N-squared.  The bigger the dataset, the bigger this disparity will be.

Let’s say your data is annual, and is sorted by GVKEY/FYEAR.  Put aside for the moment the possibilty that companies can change fiscal year’s (so there can be two records with the same FYEAR).  The following code gets lagged values (1-year and 2-year lag), but leaves missing values when there is a missing year.

data want;
   
set have;
   
by gvkey fyear;

   
sales1=ifn(gvkey=lag(gvkey) and year=lag(year)+1,lag(sales),.);  /* SALES from prior record, IFF it has the same GVKEY and is exactly 1 year old */

   
sales2=ifn(gvkey=lag2(gvkey) and year=lag2(year)+2,lag2(sales),.);
run

And how do you handle a company changing fiscal years - resulting in two records for a given GVKEY/FYEAR?  Just make a policy on whether you want the first, or last record for that FYEAR.  If you want the last such record, then right after by “BY GVKEY FYEAR” statement, add this one:

if last.fyear 


Take a look at the Leads and Lags in SAS paper presented at last fall’s NorthEast SAS Users’ Group conference (http://www.lexjansen.com/\/cgi-bin/xsl_transform.php?x=sbd&c=nesug)  for more extenisive discussion.

Profile
 
 
Posted: 16 June 2014 08:07 PM   [ Ignore ]   [ # 7 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

hi Sassafras,

Thanks - pretty compact code using the data step; very nice! smile

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile
 
 
Posted: 16 June 2014 09:14 PM   [ Ignore ]   [ # 8 ]
Newbie
Rank
Total Posts:  6
Joined  2014-06-16
sassafras - 16 June 2014 10:23 AM

I find using PROC SQL to be far slower than the data step.  After all PROC SQL must compare every pair of records to select the matched cases.

But if the data are sorted, then the DATA step (with a LAG function), need only compare consecutive observations - that’s N comparisons instead of N-squared.  The bigger the dataset, the bigger this disparity will be.

Let’s say your data is annual, and is sorted by GVKEY/FYEAR.  Put aside for the moment the possibilty that companies can change fiscal year’s (so there can be two records with the same FYEAR).  The following code gets lagged values (1-year and 2-year lag), but leaves missing values when there is a missing year.

data want;
   
set have;
   
by gvkey fyear;

   
sales1=ifn(gvkey=lag(gvkey) and year=lag(year)+1,lag(sales),.);  /* SALES from prior record, IFF it has the same GVKEY and is exactly 1 year old */

   
sales2=ifn(gvkey=lag2(gvkey) and year=lag2(year)+2,lag2(sales),.);

run

And how do you handle a company changing fiscal years - resulting in two records for a given GVKEY/FYEAR?  Just make a policy on whether you want the first, or last record for that FYEAR.  If you want the last such record, then right after by “BY GVKEY FYEAR” statement, add this one:

if last.fyear 


Take a look at the Leads and Lags in SAS paper presented at last fall’s NorthEast SAS Users’ Group conference (http://www.lexjansen.com/\/cgi-bin/xsl_transform.php?x=sbd&c=nesug)  for more extenisive discussion.

 


Sorry, but what is gvkey? Is that just the name of the company name variable?

Thank you all for your help so far

Profile
 
 
Posted: 17 June 2014 05:58 AM   [ Ignore ]   [ # 9 ]
Newbie
Rank
Total Posts:  8
Joined  2012-08-15

GVKEY is the company identifier used by Compustat.  I believe a previous owner/distributor of this data was named “Global Vantage” - the “GV” in gvkey.

Profile
 
 
Posted: 17 June 2014 07:58 PM   [ Ignore ]   [ # 10 ]
Newbie
Rank
Total Posts:  6
Joined  2014-06-16

Thank you for all your help, it worked like a charm.

Just a follow up question. I was warned that there were similar issues with using diff functions as lag functions.

If I am using the same dataset, and want to calculate differences between years, e.g. difference in sales year to year, considering the difference firms (e.g. using gvkey) and time periods, how would I go about this?


Thank you very much for all your help so far!!

Profile
 
 
Posted: 18 June 2014 07:15 AM   [ Ignore ]   [ # 11 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

hi,

Glad it worked smile

Indeed with getting the lagged (or lead) variables, you need to take care it is for the same firm. But, once you have these variables, there are no issues/complications for computing the difference, for example: sales_ch = sales - sales_prev (where sales is sales for the current year, and sales prev is lagged sales).

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile
 
 
Posted: 18 June 2014 07:48 AM   [ Ignore ]   [ # 12 ]
Newbie
Rank
Total Posts:  6
Joined  2014-06-16
Joost Impink - 18 June 2014 07:15 AM

hi,

Glad it worked smile

Indeed with getting the lagged (or lead) variables, you need to take care it is for the same firm. But, once you have these variables, there are no issues/complications for computing the difference, for example: sales_ch = sales - sales_prev (where sales is sales for the current year, and sales prev is lagged sales).

best regards,

Joost

Hey thank you. However, I was wondering how to calculate differences between variables over years. E.G. one of my variables is total assets. How do I calculate differences between years, considering I have multiple firms. Can it be done without lagging and the. Subtracting because I don’t need the lagged variables themselves.

Sorry to keep bugging you. I am still getting the hang of this

Profile
 
 
Posted: 18 June 2014 08:08 AM   [ Ignore ]   [ # 13 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

sure, no problem. This is probably helpful for others as well.

The compute the change in assets, you need:
- retrieve gvkey, fyear, and at (assets)
- to get lagged assets (either use proc sql, or the data step as pointed out by sassafras)
- subtract lagged assets from current assets

You could combine steps 2 and 3 in a single step (which makes it a bit harder to read). You can also drop the lagged variables once you are done with them.

proc sql;
 
create table newTable as 
 
select a.* , 
/* table a is current year, table b is lagged year, so a.at - b.at is change in assets */
 
(a.at b.at) as at_ch
from
  yourtable a
left join 
  yourtable b
on
 a
.gvkey b.gvkey and a.fyear-b.fyear;
quit

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile