FTP EDGAR Exhibits
Posted: 30 March 2016 12:55 PM
Newbie
Total Posts:  17
Joined  2014-11-16

Hi Joost, I’m looking to download Exhibit 21 (which is a static/invariable SEC reference) from 10-K filings stored on EDGAR. Rather than retrieving the entire 10-K text file, I would like to fetch just the exhibit, which EDGAR stores separately as “ex21.htm”. For example, see the 2014 10-K filing for CIK 1141807. One problem that I think lies in wait is that the SEC index files only refer to the text files; is that right? If so, would you have any suggestions other than retrieving the entire 10-K text file? Thanks for any insight you may have!

Posted: 30 March 2016 06:47 PM   [ # 1 ]
Administrator
Total Posts:  901
Joined  2011-09-19

The full 10-Ks don’t take up that much space, and Exhibit 21 is clearly marked within the full filing, so it is very easy to extract. I have no experience with pulling out the separate/additional files, but if they are named consistently, pulling them in as single files would work as well.

Hope this helps,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 31 March 2016 12:52 PM   [ # 2 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Hi Joost, how would you extract the exhibit? With a regex? Finding the beginning and end might be dicey, and then I would still need to strip the whitespace?

Posted: 02 April 2016 04:17 PM   [ # 3 ]
Administrator
Total Posts:  901
Joined  2011-09-19

Hi,

I forget the exact details, but I remember that identifying the beginning and end is trivial. Take a look: each document in the full filing sits between <DOCUMENT> and </DOCUMENT>, with an identifier for Exhibit 21. You won’t need a regular expression; just match against a string.
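As a rough sketch of that string-matching idea in Python (not something tested against real filings here): it assumes the full submission text wraps each document in <DOCUMENT> ... </DOCUMENT> and labels it with a line such as <TYPE>EX-21 or <TYPE>EX-21.1. The function name extract_exhibit and the exhibit_type parameter are made up for illustration.

```python
def extract_exhibit(filing_text, exhibit_type="EX-21"):
    """Return the first <DOCUMENT> block whose <TYPE> starts with
    exhibit_type, or None if no such block is found.

    Assumption: each document in the full filing looks like
    <DOCUMENT>\\n<TYPE>EX-21...\\n...\\n</DOCUMENT>.
    """
    start_tag, end_tag, type_tag = "<DOCUMENT>", "</DOCUMENT>", "<TYPE>"
    pos = 0
    while True:
        start = filing_text.find(start_tag, pos)
        if start == -1:
            return None  # no more documents
        end = filing_text.find(end_tag, start)
        if end == -1:
            return None  # unterminated block; give up
        block = filing_text[start + len(start_tag):end]
        type_pos = block.find(type_tag)
        if type_pos != -1:
            # The type value is the rest of the line after <TYPE>
            rest = block[type_pos + len(type_tag):]
            doc_type = rest.split("\n", 1)[0].strip().upper()
            if doc_type.startswith(exhibit_type):
                return block.strip()
        pos = end + len(end_tag)
```

Calling extract_exhibit(text) on the contents of a downloaded full 10-K text file should then return the Exhibit 21 block, header lines included, or None when the filing has no such exhibit. Matching on the prefix is a deliberate choice so that variants like EX-21.1 are picked up as well.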

Best,

Joost
