Matching Compustat/Crsp with IBES -  using historical Cusip with IBES maybe good for research
Posted: 20 September 2011 09:36 PM   [ Ignore ]
Sr. Member
RankRankRankRank
Total Posts:  169
Joined  2011-09-20

Joost,

Here is what i learned from this forum:
to match Compustat-CRSP: we can use CCM database to match gvkey, lpermno (permon);
to match IBES and CRSP: we can use dsenames database to matach permno to ncusip (crsp.dsenames), then ncusip can be matched with the ticker of IBES(ibes.idsum).
That is great. I love these methods.

My questions is after we match these database, is there any SAS function we can use to verify if our matching is really good? in another words, i wish to compare the company names from different data base. One company may named as ABC Corp in compusta, it maybe name as Copr ABC in IBES. can we use any function to compare these names, and return a true false answer if the matching is done in a good way?

Thanks a lot,

Zenghui

Updated: the SPEDIS function
in your example 2, there is a function called SPEDIS, i guess i can use it to compare how similar are two firm names. However, if a company changes its name, SPEDIS will give wrong alert.
Is this confirmation step is necessary for matching??
Please let me know how do you think.
Thank you!!!

following up question.
Joost,
Is this correct:
gvkey used by compustat of firms are not changing with time.
permno used by crsp not changing with time.
The links between a gvkey and a permno (such as CCM database), is generally unique.
cuisp used by crsp of firms may change with time.
ticker used by ibes may change with time.
Am i correct?
Thank you!!!

Zenghui

update, I read some manual from WRDS, seems above information is correct. thanks.

Joost,
Another dumb question.
I checked IBES data base, it seems the data base NSTATSUM_EPSUS has CUSIP as well as the ticker. (i learned that this CUSIP actually ncusip, the historical cusip).
Is the link between the nCUSIP and ticker unique?

Thanks a lot,

Zenghui
Update: it seems this link is unique, this is what used in example 3.

Update:
I think the CCM link is good enough for my studying purpose for now. Thanks for the programs and ideas.
I learned that: the NCUSIP in Thomson, I/B/E/S, ISSM, TAQ and Option-Metrics is labeled as ‘CUSIP’ (this could be confusing).

 Signature 

Zenghui
A humble student of business

Profile
 
 
Posted: 21 September 2011 09:07 AM   [ Ignore ]   [ # 1 ]
Administrator
Avatar
RankRankRankRank
Total Posts:  901
Joined  2011-09-19

hi Zenghui,

Thanks for your questions and for looking up that historic cusip (NCUSIP) in Thomson, I/B/E/S, ISSM, TAQ and Option-Metrics is labeled as ‘CUSIP’ . I find matching of the databases very tricky, so all questions in this area are valid. It is not ‘just’ that the different tables have different firm identifiers (gvkey, permno, etc), but also that it is often not very clear if a table uses the current value (‘header’) of an indentifier, or the historic header.
In general, I believe people find the ccm linktable of very high quality. I have understood this table is updated manually and is also rather expensive. Researchers that do not have access to ccm linktable can link Compustat with crsp by ticker symbol and a match on firmname. So, in short, I don’t think it is necessary to do another check on ccm by requiring a match on firm name. (This will only throw out valid data points)

Even though gvkey and permno are unique, they still change due to mergers etc. Run the following code on wrds. It will contain many firms where the gvkey-permno link changes over time (the results includes records where something else changes, like primary link marker).

proc sql;
create table changingPermno as
select * from (
  select *, count(*) as numobs
  from crsp.Ccmxpf_linktable a
  where
  a.linktype in (“LC” “LN” “LU” “LX” “LD” “LS”)
  and a.linkprim IN (“C”, “P”) 
  group by gvkey
  )
where numobs > 1;
quit;

proc sql;
create table changingGvkey as
select * from (
  select *, count(*) as numobs
  from crsp.Ccmxpf_linktable a
  where
  a.linktype in (“LC” “LN” “LU” “LX” “LD” “LS”)
  and a.linkprim IN (“C”, “P”) 
  group by lpermno
  )
where numobs > 1;
quit;

About your last question (is nCUSIP - Ibesticker unique?). The table that holds the link (ibes.idsum) has only a starting date and no end date.
Now that I am thinking about it, maybe it would have been more straightforward to use historic cusip to match with IBES in the example.
Repeating the match with historic cusip and comparing the results with ibes ticker would be a way to make sure no obs are dropped, like this:

PROC SQL;
  create table ibesdata as
  select a.*, b.STATPERS, b.MEANEST
  from getIbes a, ibes.STATSUMU_EPSUS   b
  where
  a.ibes_ticker = b.ticker
  and b.MEASURE=“EPS”
  and b.FISCALP=“QTR”
  and b.FPI = “6”
  and a.rdq - 45 < b.STATPERS < a.rdq -2
  and a.datadate -5 <= b.FPEDATS <= a.datadate +5
;
with a match on cusip instead of ibesticker (“where a.cusip = b.cusip” (historic cusip))
Possibly, this results in a different number of observations. Possibly because of recycling of ibes_ticker (resulting in ‘false’ matches).
In the query on ibes.idsum there is the requirement that “and a.rdq > b.SDATES “. Possibly some obs are lost here, that would not be lost with cusip.

For the purpose of the tutorial I think it is ok to use idsum and ibes_ticker to get IBES info.
For a study/paper, I agree with you that cusip is probably better.

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Profile
 
 
Posted: 07 September 2012 01:09 PM   [ Ignore ]   [ # 2 ]
Sr. Member
RankRankRankRank
Total Posts:  169
Joined  2011-09-20

A quick summary on this topic:

Compustat—> gvkey
CRSP—> lpermno
IBES—> NCusip (for a serious study, ticker is not recommended)

Compustat and CRSP are linked by the CCM database.
IBES and CRSP could be linked by the dsenames database. (ncusip)

P.S. The cusip in the compustat is header cusip not the Ncusip. very easy to get confused about these two.

COMPGED function is another function to use in addition to SPEDIS in sas when matching with names/text.

Examples from SAS website:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm

The following example uses the default costs to calculate the generalized edit distance.
options nodate pageno=1 linesize=70 pagesize=60;

data test;
  infile datalines missover;
  input String1 $char8. +1 String2 $char8. +1 Operation $40.;
  GED=compged(string1, string2);
  datalines;
baboon   baboon   match
baXboon baboon   insert
baoon   baboon   delete
baXoon   baboon   replace
baboonX baboon   append
baboo   baboon   truncate
babboon baboon   double
babon   baboon   single
baobon   baboon   swap
bab oon baboon   blank
bab,oon baboon   punctuation
bXaoon   baboon   insert+delete
bXaYoon baboon   insert+replace
bXoon   baboon   delete+replace
Xbaboon baboon   finsert
aboon   baboon   trick question: swap+delete
Xaboon   baboon   freplace
axoon   baboon   fdelete+replace
axoo   baboon   fdelete+replace+truncate
axon   baboon   fdelete+replace+single
baby   baboon   replace+truncate*2
balloon baboon   replace+insert
;

proc print data=test label;
  label GED=‘Generalized Edit Distance’;
  var String1 String2 GED Operation;
run;

The following output shows the results.

Generalized Edit Distance Based on Operation
                  The SAS System                   1

                  Generalized
                    Edit
Obs   String1   String2   Distance   Operation

  1   baboon   baboon       0     match              
  2   baXboon   baboon     100     insert              
  3   baoon   baboon     100     delete              
  4   baXoon   baboon     100     replace            
  5   baboonX   baboon       50     append              
  6   baboo   baboon       10     truncate            
  7   babboon   baboon       20     double              
  8   babon   baboon       20     single              
  9   baobon   baboon       20     swap              
  10   bab oon   baboon       10     blank              
  11   bab,oon   baboon       30     punctuation          
  12   bXaoon   baboon     200     insert+delete        
  13   bXaYoon   baboon     200     insert+replace        
  14   bXoon   baboon     200     delete+replace        
  15   Xbaboon   baboon     200     finsert            
  16   aboon   baboon     200     trick question: swap+delete
  17   Xaboon   baboon     200     freplace            
  18   axoon   baboon     300     fdelete+replace        
  19   axoo     baboon     310     fdelete+replace+truncate  
  20   axon     baboon     320     fdelete+replace+single  
  21   baby     baboon     120     replace+truncate*2      
  22   balloon   baboon     200     replace+insert

 Signature 

Zenghui
A humble student of business

Profile
 
 
Posted: 10 October 2012 03:58 AM   [ Ignore ]   [ # 3 ]
Newbie
Rank
Total Posts:  1
Joined  2012-10-10

Hi everyone,

thanks for the great clarifications. Especially the highlighting of difference in compustat and ibes cusipss is very helpful.

I understand that a linkage ( IBES <-> CRSP <-> Compustat) works well.

Nevertheless, as CRSP data only covers US exchanges I am facing problems to match the international IBES and Compustat files.
Matching via the contained CUSIPs seems to miss a significant number of valid relations.

I have learned that CUSIPs for non-US firms are constructed from the SEDOL, unfortunately this seems to be subject to changes over time in both datasets.

Would be great if someone got a hint for me regarding an alternative to the CRSP/Compustat link which is applicable to match the international datasets of IBES and Compustat.

Many thanks in advance – cheers Alex

Profile
 
 
   
 
 
     Mactching SDC and compustat ››