help with downloading 10-ks from sec edgar
Posted: 21 November 2014 09:43 AM   [ # 16 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi wrkrbee,

Sounds good, but can you rename test.pl to test_perlcode.txt and upload it again? The forum software does not allow .pl extensions. (I can’t view the file.)

thanks,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 21 November 2014 03:25 PM   [ # 17 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Hi Joost,

My bad, here is the text file version of the pgm (forum will not let me attach the file, 6K, no clue why).

Sorta shifted gears here.

Andy Leone (U of Miami) posted a few PERL programs (https://sbaleone.bus.miami.edu/PERLCOURSE/Perl_Resources.html).

Took his download_filings program, and modified it to capture the file size.

Also modified his OUTFILE to contain only CIK, Filing date, Form type and file size (which is all I need to merge with CRSP and COMPUSTAT).

Pgm runs, no errors or warnings.

Only issue is that all entries seem to have a file size of 0.

See lines 105 and 106 where I use two different methods for computing file size.

Think the method in line 106 needs line 9, but un-commenting line 9 creates errors/warnings.

So, this is where the ground forces call for the big guns.  LOL

Any insight is greatly appreciated.

Thanks!!!!

Posted: 21 November 2014 05:19 PM   [ # 18 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi wrkrbee,

Did you attach your code? (I see no attachment). I looked at Andy Leone’s code but lines 105/106 of Download_Filings.pl are comments.

best regards,

Joost

Posted: 21 November 2014 06:18 PM   [ # 19 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Hi Joost,

Forum will not let me attach the pgm, gives me the following error:

Error Message:

The file could not be written to disk.

Pgm is only 6K, but still no go.

Posted: 21 November 2014 09:47 PM   [ # 20 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Hi Joost,

Only way to get you the code is to zip it, so maybe this will ... maybe not.

Included two data files to test with.

Bottom line here is that the pgm runs but file sizes show zero.

Maybe PERL will not allow file size computations in the midst of a download?

Apologize for the hassle!

Thanks for your time and patience!

File Attachments
0001144204-09-017307.txt  (File Size: 8KB - Downloads: 171)
0001327459-09-000004.txt  (File Size: 10KB - Downloads: 414)
Download_Filings_perlcode.zip  (File Size: 3KB - Downloads: 144)
 
Posted: 22 November 2014 10:38 AM   [ # 21 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi wrkrbee,

On lines 105 and 106 you use the variable $direct

my $filesize = -"$direct";
my $sb = (stat($direct))[7];

I was expecting that $direct is the filename you want the size for. But, $direct is a directory name (it is set once on line 33).

If I understand your code correctly, lines 105 and 106 are in a loop that goes through each line of the EDGAR history file. I also see code that retrieves the filings through FTP. The place to measure filing length is after the FTP download: you need to have the filing on disk to determine its length, and there is no way to determine it from the filing name/URL alone.
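
A minimal sketch of that point, assuming the filing has already been saved to a local path. The names $local_file and the 11-byte stand-in "download" are illustrative only, not taken from the attached program. Both the -s file test and stat need the path of the file itself, not the directory it sits in:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-ins: $direct plays the role of the download directory,
# $local_file is the filing that was just saved into it.
my $direct     = tempdir(CLEANUP => 1);
my $local_file = File::Spec->catfile($direct, 'filing.txt');

# Simulate a finished download by writing 11 bytes to disk.
open(my $out, '>:raw', $local_file) or die "Cannot write $local_file: $!";
print {$out} "hello world";
close($out) or die "Cannot close $local_file: $!";

# Measure the file, not the directory.
my $size_s    = -s $local_file;           # file test operator: size in bytes
my $size_stat = (stat($local_file))[7];   # element 7 of stat is also the size

print "size via -s: $size_s, via stat: $size_stat\n";
```

In the real loop, the same two lines would go right after the download call, with $local_file set to wherever that filing was written.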

Hope this helps,

Joost

Posted: 22 November 2014 12:01 PM   [ # 22 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Thanks Joost!

Once I get the file size,  I can save/write only the vars that I want to a file, right?

Something like:  print OUTPUT "$CIK,$form_type,$file_date,$filesize\n";

Does that sound right?

Thanks again!

Posted: 22 November 2014 12:03 PM   [ # 23 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi wrkrbee,

Yes, that should work.
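
A rough sketch of writing one such record, assuming the four values are already known. The sample values are the AMR Corp figures mentioned in this thread, and the output file name filing_sizes.csv is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical values for one filing; in the real program these would come
# from the index line being processed and the size measured after download.
my ($CIK, $form_type, $file_date, $filesize) =
    ('0000006201', '10-K', '20090219', 6324458);

# Build one comma-separated record.
my $record = join(',', $CIK, $form_type, $file_date, $filesize) . "\n";

# Append to the output file (assumed name) using a lexical filehandle.
my $outfile = 'filing_sizes.csv';
open(my $OUTPUT, '>>', $outfile) or die "Cannot open $outfile: $!";
print {$OUTPUT} $record;
close($OUTPUT) or die "Cannot close $outfile: $!";
```

Keeping the CIK as a quoted string preserves its leading zeros, which may matter when merging on CIK later.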

best regards,

Joost

Posted: 22 November 2014 12:35 PM   [ # 24 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Thanks Joost!  Really appreciate your time, attention and patience!  grin

Posted: 22 November 2014 12:43 PM   [ # 25 ]
Administrator
Total Posts:  901
Joined  2011-09-19

sure, no problem! smile

Joost

Posted: 24 November 2014 11:58 AM   [ # 26 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Joost ‘da man!

Perl question for file size ...

If I visit the SEC website, use EDGAR to retrieve the 10-K for CIK 0000006201 (AMR Corp) with the file date 20090219,  I see a file size of 6,255,650 bytes (complete submission text file).

When I use the “-s” file test operator in Perl, I get a file size of 6,324,458 bytes.

Any thoughts on why PERL is generating a larger file size?

It’s not major, but I can foresee an editor/reviewer just looking for a reason to reject a paper.  LOL

Thanks for any insight!!!

Rick

Posted: 24 November 2014 01:14 PM   [ # 27 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi Rick,

When I download the filing and store it on disk, the file system gives 6,324,456 bytes. That 2-byte difference may be Windows vs. Linux storage.

I would remove html markup before taking the size. HTML markup easily takes up 50-70% of the filing size. So it is hard to compare filing size between firms when some do and some don’t use HTML.
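
As a rough illustration only (a regex is not a real HTML parser, and the helper name text_length is made up here), stripping anything that looks like a tag before measuring could be sketched like this:

```perl
use strict;
use warnings;

# Crude tag stripper: treat the result as an approximation of the text
# length, not an exact figure. text_length is a hypothetical helper name.
sub text_length {
    my ($raw) = @_;
    (my $text = $raw) =~ s/<[^>]*>//g;   # drop anything that looks like a tag
    $text =~ s/&nbsp;/ /gi;              # turn non-breaking-space entities into spaces
    $text =~ s/\s+/ /g;                  # collapse runs of whitespace
    $text =~ s/^ | $//g;                 # trim a leading or trailing space
    return length $text;
}

# Small made-up sample standing in for a filing.
my $sample = "<html><body><p>Annual&nbsp;report</p>\n<b>2008</b></body></html>";
printf "raw: %d, text only: %d\n", length($sample), text_length($sample);
```

For a real filing, the whole file would be read into a string first. A proper HTML parsing module would handle comments, scripts and other entities that this sketch ignores.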

best regards,

Joost

Posted: 08 December 2014 10:35 AM   [ # 28 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Hi Joost,

Figured out how to download 10-Ks and 10-Qs using Andy Leone’s code.

Works, but uses FTP, which can literally take days to download one year of data.

Was thinking that I could replace the FTP code in my program (attached) with the relevant portion of your code (http://www.wrds.us/index.php/tutorial/view/26).

I believe both programs (yours and mine) essentially build holding files containing the filings we wish to download, then access the SEC website for download.

Any thoughts for replacing the FTP code?

Sorry to bother you!

Thanks!

Rick

File Attachments
Download_Filings_10K_10Q.zip  (File Size: 3KB - Downloads: 132)
 
Posted: 08 December 2014 12:48 PM   [ # 29 ]
Newbie
Total Posts:  17
Joined  2014-11-16

Hey Joost,

Update: looks like all I really need is the text file with all of the filing URLs to run your program.

Used Andy’s program to generate the URLs ... done!

Changed the filename in your program to match my new filename, saved it in the same directory as the program.

Run the script, works.

Tricky part is saving the text file from Andy’s program in a way that the data are stored internally in a CSV format, but the name of the file has the TXT extension.

Bottom line is it works.
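
For anyone reading along, the general idea (a list file of filings, fetched over HTTP instead of FTP) can be sketched roughly as below. This is not the tutorial's code. It assumes each line of the list is comma-separated with the EDGAR path (edgar/data/...) as the last field, that paths are served under https://www.sec.gov/Archives/, and that HTTPS support for HTTP::Tiny (IO::Socket::SSL) is installed. The sub names, the agent string and the file names are placeholders:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumption: index paths are served under this base URL.
my $BASE = 'https://www.sec.gov/Archives/';

# Turn one CSV line (assumed: path is the last field) into URL + local name.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                 # strip Unix or Windows line ending
    my $path = (split /,/, $line)[-1];
    (my $local = $path) =~ s{.*/}{};      # keep only the file name
    return ($BASE . $path, $local);
}

# Download every filing listed in $list_file into an existing $out_dir.
sub download_all {
    my ($list_file, $out_dir) = @_;
    my $http = HTTP::Tiny->new(agent => 'Example Research you@example.edu'); # placeholder
    open(my $in, '<', $list_file) or die "Cannot open $list_file: $!";
    while (my $line = <$in>) {
        next unless $line =~ /\S/;        # skip blank lines
        my ($url, $local) = parse_line($line);
        my $res = $http->mirror($url, "$out_dir/$local");
        warn "Failed $url: $res->{status} $res->{reason}\n" unless $res->{success};
        sleep 1;                          # pause between requests
    }
    close $in;
}

# download_all('filings.txt', 'downloads');   # uncomment to run for real
```

Since mirror writes straight to disk, the file size can be taken with -s on "$out_dir/$local" right after each successful download.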

Sorry again to bother you!

Thanks!

Rick

Posted: 08 December 2014 07:03 PM   [ # 30 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi Rick,

Glad it worked out.

If downloading takes a long time: you can always open multiple instances of the ‘command window’, and run several perl programs at the same time.
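
One way to keep several instances from downloading the same filings is to give each a worker number and let it take every Nth line of the list. A small sketch, where the sub name my_share, the script name and the command-line layout are all invented for illustration:

```perl
use strict;
use warnings;

# Pick out the lines this instance is responsible for.
# $worker runs from 0 to $total-1; line i goes to worker i % $total.
sub my_share {
    my ($worker, $total, @lines) = @_;
    return @lines[ grep { $_ % $total == $worker } 0 .. $#lines ];
}

# Hypothetical command line: perl download.pl <worker> <total> <list_file>
if (@ARGV == 3) {
    my ($worker, $total, $list_file) = @ARGV;
    open(my $in, '<', $list_file) or die "Cannot open $list_file: $!";
    my @lines = <$in>;
    close $in;
    for my $line (my_share($worker, $total, @lines)) {
        print "worker $worker would download: $line";   # real download goes here
    }
}
```

With three command windows, that would be "perl download.pl 0 3 filings.txt", "perl download.pl 1 3 filings.txt" and "perl download.pl 2 3 filings.txt". It is worth checking the server's current access rules before running many instances at once.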

best regards,

Joost
