Best way to replicate ‘entropy measure of diversification’ - segments
Posted: 16 October 2013 09:00 AM
Newbie

Dear forum members,

Part of my research is based on the entropy measure of diversification of Jacquemin and Berry (1979), as used by Wiersema and Bantel (1992).

This measure is: the sum over all segments i of Pi * LN(1/Pi), with Pi being the share of the firm's sales in the ith segment (as a fraction of total sales).
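As a quick check on the formula, here is a minimal Python sketch of the calculation for one firm. The function name and the segment sales figures are made up for illustration, and dropping segments with non-positive sales is an assumption of this sketch, not something the cited papers prescribe.

```python
import math

def entropy_diversification(segment_sales):
    """Sum of P_i * ln(1 / P_i) over segments with positive sales.

    Assumption of this sketch: segments with zero or negative sales
    are ignored when computing both the total and the shares.
    """
    positive = [s for s in segment_sales if s > 0]
    total = sum(positive)
    if total <= 0:
        return 0.0
    shares = [s / total for s in positive]
    return sum(p * math.log(1.0 / p) for p in shares)

# Hypothetical firm with three segments (shares 0.5, 0.3, 0.2)
example = entropy_diversification([50.0, 30.0, 20.0])

# A single-segment firm has no diversification
single = entropy_diversification([100.0])

# Four equal segments give ln(4), the maximum for four segments
equal_four = entropy_diversification([1.0, 1.0, 1.0, 1.0])
```

For the three-segment example the measure comes out at roughly 1.03; a single-segment firm scores 0.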

I’d like to compute this measure for Fortune 500 companies for the years 2002-2011, but I am having difficulty getting accurate data.
Could someone explain how to get this data out of the Compustat historical segments database?

So far I have uploaded a file of CUSIP codes (for some reason individual lookup doesn't work) and selected the ‘operating segments’ variable along with the ‘total revenue’ and ‘total sales’ variables.
This gives me a list of operating segments with each firm’s sales and revenue (which are mostly the same). However, when I add up the segment sales or the segment revenues (separately), the totals do not match the revenues published by the firms themselves (SEC filings/annual reports).

What am I doing wrong, and how could I improve this?


Thank you in advance,

Jdijkers


PS: I am using the web forms, as I am not familiar with SAS.


Jacquemin, A. P., & Berry, C. H. (1979). Entropy measure of diversification and corporate growth. The Journal of Industrial Economics, 27(4), 359-369.

Wiersema, M. F., & Bantel, K. A. (1992). Top management team demography and corporate strategic change. Academy of Management Journal, 35(1), 91-121.

Posted: 17 October 2013 06:53 AM   [ # 1 ]
Administrator

hi Jdijkers,

I have a SAS macro to compute the entropy measure. The forum engine tends to garble posted code (for example, it adds a semicolon after every macro variable), so I am adding the macro as an attachment as well.

The macro uses a dataset ‘segments.bussegments’ which is a subsample of the historical dataset online. Please let me know if you find irregularities/mistakes.

Constructing ‘segments.bussegments’:

The requirement that srcdate eq datadate is to prevent duplicates. For example, segment info on year x will be included in the annual reports of year x, year x+1 and year x+2. I drop GEO segments.

libname segments "C:\wrds_libs\comp_segments\sasdata";

/* keep only the original report of each segment-year (srcdate = datadate) */
data segments.segments_short (keep = GVKEY srcdate datadate STYPE SID IAS CAPXS NAICS NAICSH NAICSS1 NAICSS2 NIS OPS SALES SICS1 SICS2 SNMS SOPTP1 INTSEG);
set segments.Wrds_segmerged;
if srcdate eq datadate;
run;

* 455253 segments;

/* keep business and operating segments only; this drops GEO segments */
data segments.busSegments;
set segments.segments_short;
if stype IN ("BUSSEG", "OPSEG");
run;
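For readers working from a web-form download instead of SAS, the same two filters might be sketched in Python as below. The rows are invented, the lowercase column names simply mirror the variables used above, and "GEOSEG" as the label for geographic segments is an assumption. The sketch also shows how restated copies and geographic rows would inflate a naive sum of segment sales.

```python
# Hypothetical rows mimicking a segments download; all values are made up.
rows = [
    {"gvkey": "001000", "datadate": "2005-12-31", "srcdate": "2005-12-31",
     "stype": "BUSSEG", "sales": 60.0},
    # same segment-year reported again in a later annual report
    {"gvkey": "001000", "datadate": "2005-12-31", "srcdate": "2006-12-31",
     "stype": "BUSSEG", "sales": 60.0},
    {"gvkey": "001000", "datadate": "2005-12-31", "srcdate": "2005-12-31",
     "stype": "OPSEG", "sales": 40.0},
    # geographic split of the same sales (segment type label assumed)
    {"gvkey": "001000", "datadate": "2005-12-31", "srcdate": "2005-12-31",
     "stype": "GEOSEG", "sales": 100.0},
]

KEEP_TYPES = {"BUSSEG", "OPSEG"}

def business_segments(rows):
    """Keep original reports (srcdate == datadate) of business/operating segments."""
    return [
        r for r in rows
        if r["srcdate"] == r["datadate"] and r["stype"] in KEEP_TYPES
    ]

kept = business_segments(rows)
naive_total = sum(r["sales"] for r in rows)      # counts duplicates and GEO rows
filtered_total = sum(r["sales"] for r in kept)   # one row per business segment
```

In this toy data the naive sum is 260 while the filtered sum is 100, which is the kind of gap that unfiltered segment data can produce.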


The macro

/*
entropy and herfindahl diversification measures

Pi = share of the firm's total assets/sales in industry i
Herf = sum (Pi x Pi)
Total Entropy = sum (Pi x ln (1/Pi))  -- 4 digit SIC [6 digit NAICS]
Entropy unrelated = like total entropy, using 2 digit SIC [4 digit NAICS]

NOTE: there is another version of this file with _SIC that only uses SIC codes

*/

/* macro vars

 dsin:  dataset in
 dsout:   dataset out
 entropy: 'total' or 'unrelated'
     total entropy uses 4-digit SIC  [6 digit NAICS]
     unrelated entropy uses 2-digit SIC  [4 digit NAICS]
 segm:  NAICS or SIC
*/
%macro entropy(dsin=, dsout=, entropy=total, segm=NAICS);

/* industry level: full code for 'total', truncated code for 'unrelated' */
%if &entropy eq total %then %let sicLevel = sic;
%if &entropy eq unrelated %then %let sicLevel = sic_2;

data work.e_1 (keep = firmyear gvkey datadate assets sales sic sic_2);
set segments.bussegments;
firmyear = gvkey || datadate;
assets = IAS;

/* use NAICS or SIC? */
%if &segm eq NAICS %then %do;
  sic = NAICSS1;
  sic_2 = substr(NAICSS1, 1, 4);
%end;
%else %do;
  sic = SICS1;
  sic_2 = substr(SICS1, 1, 2);
%end;

/* drop segments without an industry code */
if not missing(sic);
run;

/* group by industry code within firm-year */

proc sql;
  create table work.e_2 as
  select firmyear, gvkey, datadate, &sicLevel, sum(assets) as assets, sum(sales) as sales
  from work.e_1
  group by firmyear, gvkey, datadate, &sicLevel;
quit;

/* totals firm level */

proc sql;
  create table work.e_3 as
  select firmyear, sum(assets) as assets, sum(sales) as sales
  from work.e_2
  group by firmyear;
quit;

/* merge back */

proc sql;
  create table work.e_4 as
  select a.*, b.assets as assets_firm, b.sales as sales_firm
  from work.e_2 a
  LEFT JOIN work.e_3 b
  ON a.firmyear = b.firmyear;
quit;

data work.e_5;
set work.e_4;
/* shares of the firm totals */
p = assets / assets_firm;
q = sales / sales_firm;
if p eq . then p = 0;
if q eq . then q = 0;
/* Herfindahl terms */
pp = p * p;
qq = q * q;
/* entropy terms: P x ln(1/P), only defined for positive shares */
if p > 0 then p_lnp = p * log(1/p);
if q > 0 then q_lnq = q * log(1/q);
run;

/* sum the terms per firm-year */

proc sql;
  create table work.e_6 as
  select firmyear, gvkey, datadate,
    sum(pp) as d_herf_assets, sum(qq) as d_herf_sales,
    sum(p_lnp) as d_entr_&entropy._assets, sum(q_lnq) as d_entr_&entropy._sales
  from work.e_5
  group by firmyear, gvkey, datadate;
quit;

/* append the measures to the input dataset */

proc sql;
  create table &dsout as
  select a.*, b.d_herf_assets, b.d_herf_sales, b.d_entr_&entropy._assets, b.d_entr_&entropy._sales
  from &dsin a
  LEFT JOIN work.e_6 b
  ON a.gvkey = b.gvkey and a.datadate = b.datadate;
quit;

/* assume missing obs to be single segment firms
 herf = 1
 entropy = 0
*/
data &dsout;
set &dsout;
if d_herf_assets eq . then d_herf_assets = 1;
if d_herf_sales eq . then d_herf_sales = 1;
if d_entr_&entropy._assets eq . then d_entr_&entropy._assets = 0;
if d_entr_&entropy._sales eq . then d_entr_&entropy._sales = 0;
run;

%mend;

Hope this helps, best regards,

Joost

File Attachments
entropy_measure.txt  (File Size: 3KB - Downloads: 470)
 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 17 October 2013 11:04 AM   [ # 2 ]
Newbie

Thank you very much.

As I am not at all familiar with SAS, I hope I can use it. I'll try to find a tutorial and hope my university has a proper license for it.

Best, Jdijkers

Posted: 17 October 2013 03:56 PM   [ # 3 ]
Administrator

hi,

I hope it works out. You could also use the web form to download the data and do these steps in Excel or Stata, though that is probably not as much fun as it would be with SAS :-)
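If neither SAS nor Stata is available, the steps of the macro (sum sales per industry code within a firm-year, divide by the firm total, then sum the Herfindahl and entropy terms) could be sketched in plain Python along these lines. The function names, the tuple layout and the sample numbers are all hypothetical; this is an illustration of the logic, not a port of the attached macro.

```python
import math
from collections import defaultdict

def firm_measures(segments):
    """segments: iterable of (gvkey, year, sic, sales) tuples.

    Returns {(gvkey, year): {"herf": ..., "entropy": ...}} based on sales.
    """
    # step 1: sum sales per firm-year and industry code
    industry_sales = defaultdict(float)
    for gvkey, year, sic, sales in segments:
        industry_sales[(gvkey, year, sic)] += sales

    # step 2: collect the industry totals of each firm-year
    per_firm = defaultdict(list)
    for (gvkey, year, _sic), sales in industry_sales.items():
        per_firm[(gvkey, year)].append(sales)

    # step 3: shares, then Herfindahl and entropy sums
    out = {}
    for key, values in per_firm.items():
        total = sum(values)
        shares = [v / total for v in values] if total > 0 else []
        out[key] = {
            "herf": sum(p * p for p in shares),
            "entropy": sum(p * math.log(1.0 / p) for p in shares if p > 0),
        }
    return out

def lookup(measures, gvkey, year):
    """Firm-years without segment rows are treated as single-segment firms."""
    return measures.get((gvkey, year), {"herf": 1.0, "entropy": 0.0})

# Hypothetical firm-year: two segments share one industry code
data = [
    ("001000", 2005, "2834", 30.0),
    ("001000", 2005, "2834", 30.0),
    ("001000", 2005, "3841", 40.0),
]
m = firm_measures(data)
```

Here the two segments with the same code are merged first, so the shares are 0.6 and 0.4, giving a Herfindahl of 0.52 and an entropy of about 0.67.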

best regards,

Joost

Posted: 19 April 2015 07:10 AM   [ # 4 ]
Newbie

Dear Joost,
  I am using your SAS macro. Could you kindly explain the difference between total entropy and unrelated entropy? The only difference seems to be that total entropy uses four-digit SIC codes whereas unrelated entropy uses two-digit SIC codes. Is that right?
  How would I calculate related entropy, then?
Best,
Xinjiao

Posted: 19 April 2015 07:59 AM   [ # 5 ]
Administrator

hi Xinjiao,

The entropy measure is a Herfindahl-like measure to capture diversification. If two segments have 4-digit SIC codes that are different but similar (like 1234 vs 1235), treating them as separate industries introduces measurement error (the firm would not really be operating in different industries). The ‘unrelated’ entropy addresses that: industry codes are only considered different when their 2-digit SIC codes differ.
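To make the 1234 vs 1235 example concrete, here is a small Python sketch that computes the entropy once on full 4-digit codes and once after truncating to 2 digits. The firm, its sales figures and the function name are invented for illustration.

```python
import math
from collections import defaultdict

def entropy_at_level(sales_by_sic, digits):
    """Entropy of sales shares after truncating SIC codes to `digits` characters."""
    grouped = defaultdict(float)
    for sic, sales in sales_by_sic.items():
        grouped[sic[:digits]] += sales
    total = sum(grouped.values())
    if total <= 0:
        return 0.0
    # (s / total) * ln(total / s) is P * ln(1 / P) for share P = s / total
    return sum((s / total) * math.log(total / s)
               for s in grouped.values() if s > 0)

# Hypothetical firm: two near-identical codes plus one clearly different industry
firm = {"1234": 25.0, "1235": 25.0, "5600": 50.0}

total_entropy = entropy_at_level(firm, 4)      # treats 1234 and 1235 as separate
unrelated_entropy = entropy_at_level(firm, 2)  # merges them into group "12"
```

With these numbers the 4-digit measure is about 1.04 and the 2-digit measure is ln(2), about 0.69. The gap of roughly 0.35 comes entirely from splitting group 12 into 1234 and 1235.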

I’m not sure I understand what a ‘related’ entropy measure would need to capture (does it exist?).

Best,

Joost

Posted: 19 April 2015 08:05 PM   [ # 6 ]
Newbie

Dear Joost,
    You mean that the 2-digit “unrelated” entropy is a better proxy for diversification than the 4-digit total entropy, right? Since diversification consists of related and unrelated diversification, I thought your “unrelated” entropy was meant to capture unrelated diversification. So I wanted to ask how to measure related diversification. I would very much appreciate your help, Joost.

Best,
Xinjiao

Posted: 20 April 2015 06:27 AM   [ # 7 ]
Administrator

hi Xinjiao,

Yes, that makes sense (grouping similar segments to get a better sense of diversification across industries). The opposite (throwing away ‘different’ industries) to only use related segments will not work well (at least, will not measure diversification).

Best regards,

Joost
