I have already downloaded the XBRL index files. There is one for each quarter, and each contains the references needed to locate the XBRL zip file for each company's filings. Companies are identified by name and by a CIK number; the CIK (Central Index Key) is the identifier the SEC assigns to each filer.
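A minimal sketch of reading one of these index files, in Python. It assumes the xbrl.idx layout is a free-text header ending in a row of dashes, followed by pipe-delimited rows of CIK|Company Name|Form Type|Date Filed|Filename. The IndexEntry and parse_xbrl_idx names are hypothetical and not part of any library.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    filename: str


def parse_xbrl_idx(text: str) -> list[IndexEntry]:
    """Parse the pipe-delimited body of an xbrl.idx file.

    Assumed layout: a free-text header (including a column-name row)
    that ends with a row of dashes, then CIK|Company Name|Form Type|Date Filed|Filename.
    """
    entries: list[IndexEntry] = []
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Everything up to and including the dashed separator is header.
            if line.startswith("---"):
                in_body = True
            continue
        parts = line.split("|")
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        entries.append(IndexEntry(*(p.strip() for p in parts)))
    return entries
```

Each entry's filename field is what the download step works from.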
I have written a tool that reads the index files quarter by quarter, going back to 2005. For each entry in an idx file, the tool locates the remote file and downloads it over FTP. As I write this it is running, downloading every type of filing that has been submitted in XBRL.
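The quarter-by-quarter walk can be sketched as follows. This is not the actual tool, and the remote layouts are assumptions: quarterly indexes at edgar/full-index/<year>/QTR<n>/xbrl.idx, and each filing's XBRL zip stored as <accession>-xbrl.zip in a directory named after the accession number with the dashes removed. All function names are hypothetical. The download helper uses only the standard ftplib retrbinary call and expects an FTP connection that is already logged in.

```python
from datetime import date
from ftplib import FTP


def quarters_back_to(start_year: int, today: date) -> list[tuple[int, int]]:
    """List (year, quarter) pairs from today's quarter back to Q1 of start_year."""
    year, qtr = today.year, (today.month - 1) // 3 + 1
    result = []
    while year >= start_year:
        result.append((year, qtr))
        qtr -= 1
        if qtr == 0:
            year, qtr = year - 1, 4
    return result


def index_path(year: int, qtr: int) -> str:
    # Assumed layout of the quarterly XBRL index.
    return f"edgar/full-index/{year}/QTR{qtr}/xbrl.idx"


def xbrl_zip_path(filename: str) -> str:
    """Map an index Filename like edgar/data/<cik>/<accession>.txt to the
    assumed location of that filing's XBRL zip."""
    directory, _, leaf = filename.rpartition("/")
    accession = leaf.removesuffix(".txt")
    return f"{directory}/{accession.replace('-', '')}/{accession}-xbrl.zip"


def download(ftp: FTP, remote_path: str, local_path: str) -> None:
    """Fetch one remote file in binary mode over an open FTP connection."""
    with open(local_path, "wb") as fh:
        ftp.retrbinary(f"RETR {remote_path}", fh.write)
```

For example, quarters_back_to(2005, date.today()) gives the quarters to visit, and each Filename from an index can be passed through xbrl_zip_path before calling download.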
I estimate there are about 240,000 files, and at roughly 1,200 files per hour it will only take 200 hours!
Edit: 24 hours in, I have downloaded 36,000 files. This represents about a year's worth of data.
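The first day's progress can be checked against the original estimate. This assumes the 240,000-file total holds and that the observed rate stays constant; the helper names are illustrative only.

```python
def files_per_hour(files_done: int, hours_elapsed: float) -> float:
    return files_done / hours_elapsed


def hours_needed(total_files: int, rate: float) -> float:
    return total_files / rate


# Original estimate: 240,000 files at 1,200 files per hour.
planned = hours_needed(240_000, 1_200)
# Observed after the first day: 36,000 files in 24 hours.
observed_rate = files_per_hour(36_000, 24)
projected = hours_needed(240_000, observed_rate)
print(planned, observed_rate, projected)  # prints "200.0 1500.0 160.0"
```

At the observed 1,500 files per hour, the full run would take about 160 hours rather than 200.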
Edit: 72 hours later, I have downloaded about 70GB of data in total.
I will need to implement an incremental download that fetches new filings as they appear, or perhaps on a monthly basis.
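One way to make the download incremental, sketched under assumptions: keep a JSON manifest of the index Filename values already fetched, and on each run (for example monthly) re-read the current quarter's index and download only the entries missing from the manifest. The manifest format and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(path: Path) -> set[str]:
    """Return the set of index Filenames already downloaded (empty if no manifest yet)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_manifest(path: Path, done: set[str]) -> None:
    """Persist the downloaded Filenames as a sorted JSON list."""
    path.write_text(json.dumps(sorted(done)))


def new_filings(index_filenames: list[str], done: set[str]) -> list[str]:
    """Keep only filings not seen in a previous run, preserving index order."""
    return [f for f in index_filenames if f not in done]
```

After downloading the results of new_filings, add them to the set and call save_manifest so the next run skips them. Keying on the Filename column works because it contains the accession number, which is unique to each filing.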