SEC Filings on EDGAR SAS File
Posted: 16 July 2015 09:26 AM   [ # 91 ]
Administrator

hi Maisha,

There are a few things going on. Look at these lines in the perl script:

#CIK, filename, blank is not used (included because it will capture the newline)
($CIK, $get_file, $blank) = split (",", $line);

You need to have the file with the 10-K info (10Ks_sample.txt) consistent with this format: a number, the url, and a 0, separated by commas. (By the way, the number is a unique number and not the CIK, even though the variable for this number in the perl script is named $CIK. The 0 is there for convenience: the last variable on a line has a trailing newline, which would need to be removed if the last variable were the url.)
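For example, the first few lines of 10Ks_sample.txt could look like this (these lines are made up, but they show the layout; the middle field is the part of the url that the script appends to http://www.sec.gov/Archives/):

1,edgar/data/1000180/0001000180-98-000011.txt,0
2,edgar/data/1000228/0000912057-98-000714.txt,0
3,edgar/data/1000229/0000950129-98-001035.txt,0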

When things don’t work straight away, first try a few records and print debugging info on the screen. For example, add a print like this:

$get_file "http://www.sec.gov/Archives/" $get_file;
print 
"Downloading url $get_file \n"

You can then inspect whether perl is getting the right page.

Best regards,

Joost

 

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 17 July 2015 12:51 PM   [ # 92 ]
Newbie

Thank you Joost, now the files are downloading!
I hope it’s not too much to ask, but I noticed that during the Stat/Transfer process the transfer seemed to get interrupted and start over. I was wondering whether the final number of records I got in Stata is the same as in the original SAS zip file. Do you happen to know if that initial file had 14,385,530 total observations, corresponding to 535,754 unique firms? If that is too troublesome for you, please ignore the question.
Thank you once again for your super helpful code and answers!

Posted: 18 July 2015 08:45 AM   [ # 93 ]
Administrator

hi Maisha,

No problem, I will check it (may take a few days).

Best,

Joost

Posted: 20 July 2015 02:01 PM   [ # 94 ]
Administrator

hi Maisha,

Yes, I also get 14,385,530 records for 535,754 unique CIKs (current dataset with records through 2014).
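
If you ever want to double-check such counts yourself, a quick sketch like this would do it, assuming you export the dataset to a CSV with the CIK in the first column and no header row (the file name is made up):

# sketch: count records and unique CIKs in a CSV export
my %ciks;
my $records = 0;
open(my $fh, '<', 'filings.csv') or die "cannot open: $!";
while (my $line = <$fh>) {
    my ($cik) = split /,/, $line;   # CIK assumed to be the first column
    $ciks{$cik} = 1;
    $records++;
}
close($fh);
print "$records records, ", scalar(keys %ciks), " unique CIKs\n";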

Best,

Joost

Posted: 22 July 2015 01:52 PM   [ # 95 ]
Newbie

Thanks so much Joost! I’m really grateful for your help, it’s so useful!
Can I ask one more question, to you or to anyone else who has posted, about the next steps after downloading the files? Now that I have the 10-Ks, I need to extract the names of the CEO and CFO for each firm. As I read in many other comments, this information is mixed with plenty of HTML fluff in (some of) the files, and there doesn’t seem to be a specific section where it appears for all firms. If anybody has dealt with the same issue and found a solution, I’d be extremely grateful. Thanks a lot!

Posted: 22 July 2015 04:42 PM   [ # 96 ]
Administrator

hi Maisha,

Yes, the HTML is annoying, and it is generally hard to ‘locate’ information using pattern matching.
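
If you do want to try pattern matching anyway, a crude starting point is to strip the tags first and then scan for text near a title. This is only a rough sketch; the regex is a guess and will produce false positives:

# sketch: crude tag stripping, then scan for names near 'Chief Executive Officer'
my $text = do { local $/; <STDIN> };   # slurp a downloaded filing from stdin
$text =~ s/<[^>]+>/ /g;                # remove HTML tags (crude)
$text =~ s/&nbsp;|&amp;/ /g;           # replace a couple of common entities
while ($text =~ /([A-Z][A-Za-z.\- ]{2,40}?)\s*,?\s*Chief Executive Officer/g) {
    print "candidate CEO: $1\n";
}

You would run it as perl scan.pl < some_filing.txt and eyeball the candidates.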

Did you check if the data you need (CEO/CFO names) are in ExecuComp? I would expect that to be in there.

Best,

Joost

Posted: 24 August 2015 11:47 PM   [ # 97 ]
Newbie

Hi guys,

I am using the perl script to download all 10-K files from 1997 to 2013. However, the download is very slow; it usually takes several hours to complete one year. I am wondering whether this is normal, or whether there is a way to speed it up?

Moreover, I find that some firm-year observations have duplicate 10-Ks. I am a little confused by this; I have checked some and they seem to be identical. Is there a way to download only one 10-K for each firm-year observation?

Thank you very much!!!

Best Regards,

Stupidstudent

Posted: 25 August 2015 02:48 PM   [ # 98 ]
Administrator

hi,

Hard to tell if it is slow or not; it depends on the capacity of your internet connection, and how much of that the perl script actually uses. You could split the file with download ids into 2 files, and run perl in 2 separate consoles. If that helps, you could keep adding consoles until the total download speed flattens (make sure they don’t download the same files). But if the ‘slow’ download takes only a few days, it may not be worth the effort to download in parallel.
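
A minimal sketch of the splitting step, assuming the download ids live in 10Ks_sample.txt (the part-file names are made up):

# sketch: split the download list into two halves for parallel runs
open(my $in,   '<', '10Ks_sample.txt') or die "cannot open: $!";
open(my $out1, '>', '10Ks_part1.txt')  or die "cannot open: $!";
open(my $out2, '>', '10Ks_part2.txt')  or die "cannot open: $!";
my $i = 0;
while (my $line = <$in>) {
    # alternate lines between the two part files
    if ($i++ % 2) { print $out2 $line; }
    else          { print $out1 $line; }
}

You would then point a separate copy of the download script at each part file, one per console.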

The duplicate filings have different urls, right? If that is the case, there is no way of knowing up front that they are identical. Once you have parsed the filings (and retrieved the fiscal year-end date), you could keep only the first 10-K filed for a given fiscal year.
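
That last step could look something like this sketch, assuming you have written a file parsed.txt with one line per filing (cik,fyear,url), already sorted by filing date; the file name and layout are assumptions:

# sketch: keep only the first 10-K per CIK and fiscal year
my %seen;
open(my $fh, '<', 'parsed.txt') or die "cannot open: $!";
while (my $line = <$fh>) {
    my ($cik, $fyear, $url) = split /,/, $line;
    next if $seen{"$cik-$fyear"}++;   # skip later filings for the same firm-year
    print $line;                      # keep the first filing for this firm-year
}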

Best Regards,

Joost

Posted: 25 August 2015 02:53 PM   [ # 99 ]
Newbie

Hi Joost,

Thanks very much for your reply! I will try the parallel download process and check whether it speeds things up.

For the duplicate filings, yes, they have different urls, so it seems I have to download all of them.

Thanks much again!

Best Regards,

Stupidstudent
