A question related to finding matches for treatment firms
Posted: 05 April 2013 09:19 PM
Jr. Member
Total Posts:  34
Joined  2013-03-31

Hi Joost,

I was wondering whether there is a SAS macro, or some other efficient approach, for finding matches for treatment firms.

I need the control firms to be in the same size decile, market-to-book decile, and momentum decile as the treatment firms, and also in the same industry and year.

Subject to these constraints, I want the control firms whose analyst coverage is closest to that of each treatment firm (three matches per treatment firm are required).

The simple-minded way of finding matches that I have in mind is the following:

1. Assign size decile, market-to-book decile, and momentum decile numbers to the whole sample.

2. Construct two tables: a treatment table and a potential control table.

3. Merge the two tables, requiring observations in the treatment table and the potential control table to have the same size decile, market-to-book decile, and momentum decile numbers, the same sic2, and the same fiscal year.

4. Keep the observations with the smallest difference in analyst coverage in the merged table. (I only know how to keep the single smallest one, not the three smallest. Could you help me with that?)
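Steps 1 to 3 could be sketched in SAS roughly as follows. This is only a sketch under assumptions: the input dataset `firms` and the variables `gvkey`, `fyear`, `sic2`, `treated` (1 = treatment, 0 = potential control), `size`, `mtb`, `mom` and `coverage` are hypothetical names, not taken from any particular file. The `by fyear` in proc rank makes the decile sorts annual.

```sas
/* Step 1: annual decile ranks (0-9) for size, market-to-book and momentum.
   All dataset and variable names here are assumed for illustration.        */
proc sort data=firms out=firms_sorted;
  by fyear;
run;

proc rank data=firms_sorted out=ranked groups=10;
  by fyear;                          /* rank within each fiscal year        */
  var size mtb mom;
  ranks size_dec mtb_dec mom_dec;
run;

/* Steps 2-3: pair every treatment firm-year with every potential control
   in the same year, sic2 and deciles. Rows with a missing decile are
   excluded first, because SAS treats missing = missing as true in a join. */
proc sql;
  create table candidates as
  select t.gvkey as treat_id,
         c.gvkey as ctrl_id,
         t.fyear,
         t.sic2,
         abs(t.coverage - c.coverage) as cov_diff
  from ranked(where=(treated = 1 and nmiss(size_dec, mtb_dec, mom_dec) = 0)) as t
       inner join
       ranked(where=(treated = 0 and nmiss(size_dec, mtb_dec, mom_dec) = 0)) as c
    on  t.fyear    = c.fyear
    and t.sic2     = c.sic2
    and t.size_dec = c.size_dec
    and t.mtb_dec  = c.mtb_dec
    and t.mom_dec  = c.mom_dec;
quit;
```

The resulting `candidates` table has one row per treatment-control pair, with `cov_diff` as the distance to minimise in step 4.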


Is my approach appropriate? And is there a smarter way to find matches by sorting on different firm characteristics?

Thank you very much,

Anna

Posted: 05 April 2013 11:21 PM   [ # 1 ]
Jr. Member

One additional requirement needs to be added:

The sorting is done annually.

Posted: 07 April 2013 08:39 AM   [ # 2 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi Anna,

Keeping the best three on some characteristic can be done as follows.
First, sort the dataset by firmyear and, within firmyear, by some measure of how good the match is (for example, if a larger value means a better match, sort descending on that measure).

Then, use a data step with ‘by firmyear’, where firmyear is a key that is unique for each firm-year (or firm-quarter, whatever the level of analysis is). In that data step, compute a counter that numbers the observations within each firmyear.

My first thought was something like this:

data two;
set one;
by firmyear;
retain count;
count = _N_;
run;

Note, however, that _N_ counts the iterations of the data step over the whole dataset and does not reset for each new firmyear, so the counter has to be reset explicitly (I didn't test this):

data two;
set one;
by firmyear;
retain count;
/* restart the counter at the first observation of each firmyear */
if first.firmyear then count = 0;
/* count is now 1, 2, 3, ... within each firmyear */
count = count + 1;
run;

I understand your matching procedure, but I expect you will run into trouble because you have too many requirements: you will end up with too few matches.

An alternative you may want to consider (or at least know about) is propensity score matching. This is basically a ‘joint’ matching, where not each of the matches needs to be perfect, but ‘on average’ you will be fine.

Steps:
- do a logit regression with the treatment outcome (binary) as the dependent, and all the variables you want to match on as the independents
- compute the fitted values of each observation

Then, require the fitted value of a treated firm to be close to that of a control firm (within 0.01 or so). This means the two firms are about equally likely to have the dependent equal to 1, given their independents. It turns out that when you compare basic statistics (mean, median) for both subsamples, they will be very similar. So individual matches may not be similar, but the samples as a whole will be (or at least a lot more so than the ‘raw’ unmatched subsamples).
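The logit and the closeness requirement might look roughly like this in SAS. It is a sketch, not tested on real data: the dataset `firms`, the 0/1 variable `treated`, the matching variables `size`, `mtb`, `mom`, `coverage`, and the identifiers `gvkey` and `fyear` are all assumed names, and 0.01 is just the caliper mentioned above.

```sas
/* Fit the logit and store the fitted probability for every observation.
   Dataset and variable names are assumptions for illustration.           */
proc logistic data=firms;
  model treated(event='1') = size mtb mom coverage;
  output out=scored p=pscore;        /* pscore = fitted P(treated = 1)    */
run;

/* Pair each treated observation with every control whose fitted value is
   within 0.01. Missing scores are dropped first: in SAS a missing value
   compares as smaller than any number, so it would pass the <= test.     */
proc sql;
  create table psm_pairs as
  select t.gvkey as treat_id,
         t.fyear as treat_fyear,
         c.gvkey as ctrl_id,
         c.fyear as ctrl_fyear,
         abs(t.pscore - c.pscore) as dist
  from scored(where=(treated = 1 and not missing(pscore))) as t,
       scored(where=(treated = 0 and not missing(pscore))) as c
  where abs(t.pscore - c.pscore) <= 0.01;
quit;
```

This pairs all treated with all controls before filtering, so on a large sample it can be slow.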

This requires some coding though. The tricky part is that you may not want a single observation to be matched multiple times as the best match (you will also have this problem with any other matching procedure).
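One possible way to avoid reusing a control is a greedy pass over the candidate pairs, from closest to farthest, accepting a pair only if neither side has been used yet. The sketch below assumes a dataset `pairs` with hypothetical variables `treat_id` (unique per treated firm-year), `ctrl_id` (unique per control firm-year) and `dist`. It gives one control per treated firm and is a simple heuristic, not an optimal matching.

```sas
/* Process the closest pairs first. All names are assumed for illustration. */
proc sort data=pairs out=pairs_sorted;
  by dist;
run;

data matched;
  if _n_ = 1 then do;
    /* lookup tables of treated and control ids that are already matched */
    declare hash used_t();
    used_t.defineKey('treat_id');
    used_t.defineData('treat_id');
    used_t.defineDone();
    declare hash used_c();
    used_c.defineKey('ctrl_id');
    used_c.defineData('ctrl_id');
    used_c.defineDone();
  end;
  set pairs_sorted;
  /* check() returns 0 when the key is already in the hash */
  if used_t.check() ne 0 and used_c.check() ne 0 then do;
    rc = used_t.add();
    rc = used_c.add();
    output;                          /* keep this pair */
  end;
  drop rc;
run;
```

Because the pass is greedy, the result depends on the order of the pairs, including how ties in `dist` happen to be sorted.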

Good luck!

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 07 April 2013 02:10 PM   [ # 3 ]
Jr. Member

Hi Joost,

Thank you for your reply! One problem with propensity score matching is that we use the whole sample period to estimate the propensity score model and then look for the closest matches (say, with nearest-neighbor propensity score matching). We cannot guarantee that the matches come from the same industry and the same year as the treatment firms, even if we include year and industry fixed effects in the logit or probit model.

I could do propensity score matching, but I think I should also do a matching using the traditional sorting method. With that classical method, I can make sure that the controls and the treatment firms are from the same industry and the same year.


Is the purpose of the following step to assign an ID number to each observation (for convenience in later matching)?
data two;
set one;
by firmyear;
retain count;
count = _N_;
run;
If this doesn’t work (didn’t test it) try this:

data two;
set one;
by firmyear;
retain count;
if first.firmyear then count = 0;
count = count + 1;
run;

Best regards,

Anna

Posted: 07 April 2013 03:52 PM   [ # 4 ]
Administrator

hi Anna,

Yes, the ‘count’ in the above code can be used to limit the dataset so that only the three best matches are included.
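As a sketch of that last step, assume a table of candidate pairs named `candidates` with hypothetical variables `treat_id`, `fyear` and `cov_diff` (the absolute difference in analyst coverage). The three closest controls per treatment firm-year could then be kept like this:

```sas
/* Sort so the best (smallest) coverage gap comes first within each
   treatment firm-year. Missing gaps are dropped, because SAS sorts
   missing values before any number. Names are assumed.               */
proc sort data=candidates(where=(not missing(cov_diff))) out=sorted;
  by treat_id fyear cov_diff;
run;

data top3;
  set sorted;
  by treat_id fyear;
  if first.fyear then count = 0;     /* new treatment firm-year        */
  count + 1;                         /* sum statement: retained counter */
  if count <= 3;                     /* keep only the three best        */
run;
```

If several controls tie on `cov_diff`, which of them ends up in the top three depends on the sort order, so a tie-breaking variable may need to be added to the `by` list.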

About PSM: you can run your logit model with or without industry/year controls. If you want matched firms to be in the same industry-year, or to satisfy some other ‘hard’ condition, you simply impose that in addition to the requirement that the fitted values be similar.
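A rough sketch of combining the two requirements, assuming a dataset `scored` that already holds the fitted value `pscore` from the logit along with `gvkey`, `fyear`, `sic2` and the 0/1 flag `treated` (all hypothetical names):

```sas
/* Hard conditions (same year, same sic2) plus a 0.01 caliper on the
   fitted value. Observations with a missing score are excluded first. */
proc sql;
  create table psm_hard as
  select t.gvkey as treat_id,
         c.gvkey as ctrl_id,
         t.fyear,
         t.sic2,
         abs(t.pscore - c.pscore) as dist
  from scored(where=(treated = 1 and not missing(pscore))) as t
       inner join
       scored(where=(treated = 0 and not missing(pscore))) as c
    on  t.fyear = c.fyear
    and t.sic2  = c.sic2
    and abs(t.pscore - c.pscore) <= 0.01
  order by treat_id, fyear, dist;
quit;
```

The output is ordered so that the closest control comes first within each treatment firm-year.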

Of course you can do ‘traditional’ matching (or maybe both), but I think you will lose many observations with the ‘traditional’ matching. Typically ‘older’ literature does this, and you hardly see papers that match on more than 2 dimensions (typically, industry, year and size, or maybe profitability).

best regards,

Joost

Posted: 07 April 2013 04:32 PM   [ Ignore ]   [ # 5 ]
Jr. Member
RankRank
Total Posts:  34
Joined  2013-03-31

Hi Joost,

Thank you so much for your reply! It is much clearer to me now.

I think that your reply “combining the estimated score and other hard conditions” makes a lot of sense.

Another related question: do you think I should run the logit model with or without industry/year controls if I have already decided to add “Year” and “Industry” as hard matching conditions? (It does not seem necessary to include the year/industry controls in the logit model if the hard conditions will be imposed later. But I do not know whether it does any harm to include them in the logit model when I already have the hard constraints.)

Anna

Posted: 07 April 2013 06:02 PM   [ # 6 ]
Administrator

hi Anna,

You can always do both and see if it matters. I would think including year/industry in the logit model would make the model better. Then, you can still add them as hard controls to ensure that there really can’t be any difference in year/industry between the two samples.
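For the version with year/industry in the model, a sketch could use a class statement in proc logistic. The dataset `firms` and the variables `treated`, `size`, `mtb`, `mom`, `coverage`, `fyear` and `sic2` are assumed names:

```sas
/* Logit with year and industry entered as categorical controls.
   All names are assumptions for illustration.                     */
proc logistic data=firms;
  class fyear sic2 / param=ref;      /* dummy coding with a reference level */
  model treated(event='1') = size mtb mom coverage fyear sic2;
  output out=scored_fe p=pscore;     /* fitted P(treated = 1) */
run;
```

Running this next to the model without `fyear` and `sic2`, and comparing the matched samples, is one way to see whether it matters. With many industries and years some cells may contain only treated or only control firms, in which case the procedure may warn about separation.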

best regards,

Joost

Posted: 07 April 2013 09:17 PM   [ # 7 ]
Jr. Member

Thank you very much for your comments, Joost!

All the best,

Anna
