Downloading from Spain’s Instituto Nacional de Estadística

After the 2013 meeting of the Association for Spanish and Portuguese Historical Studies, Andrea Davis (then a sharp graduate student) kindly shared with me a list of Spanish digital resources she had prepared. I just (April 22, 2016) finished poking through it looking for new items, and I saw that the Instituto Nacional de Estadística (INE) had posted the 1930 Anuario. {I will always try to provide a complete and corrected reference: Instituto Nacional de Estadística (Spain). Anuario Estadístico de España. Año XVI 1930. Madrid: Sucesores de Rivadeneyra, 1932.}

http://www.ine.es/inebaseweb/pdfDispacher.do?td=50641&ext=.pdf
Title page of the 1930 Anuario

The root of this DH project is an experiment with tools and historical questions that center on Barcelona in the early period of the Second Republic (1931-1933). A major portion of that is creating a functional dataset from the Padrón Municipal de 1930. That the 1930 national statistical volume is now online is incredible; I could have used it many times in the past… but there are issues.

To start, there is no way to download the entire volume at once.  Instead, you have to wade through a file tree and download each table (I apologize for the poor screenshot).

Screen shot of the 1930 Anuario’s file tree structure

As you can see, the tree branches at every level. There are seven branches at the top:

But these branch out into “sub-branches,” which often branch again. Ultimately each table is listed, and you have to click on it to load the separate pdf file. When you click the link to the actual pdf, a new WINDOW opens (unless you hold down the Control key as you click, which opens it in a new tab). This window then gives you the pdf, which you have to download, but it arrives without a file extension. So as I downloaded each file I had to add .pdf to the file name. The file names do not correspond to anything I can recognize; they are a series of numbers, and not in numerical order. The title page (portada in the image at the top of this page) is file 5061. What appears to be page 17 of the volume, the table labeled “II. Resultados provisionales del Censo de 1930, en las capitales de provincia” in Demografía, is file 4362, a number 699 lower.
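The renaming step could be scripted rather than done by hand. Below is a minimal Python sketch, assuming all the downloads sit in one folder and that the browser saved them with bare numeric names (like 5061) and no extension; the function name `add_pdf_extension` and any folder name you pass it are hypothetical.

```python
from pathlib import Path


def add_pdf_extension(folder: str) -> list[Path]:
    """Append ".pdf" to every extension-less file in `folder`.

    Assumes the folder holds the INE downloads; files that already
    have an extension (e.g. notes.txt or 5061.pdf) are left alone.
    """
    renamed = []
    # sorted() builds the full list first, so renaming cannot disturb the loop
    for path in sorted(Path(folder).iterdir()):
        # Path.suffix is "" when the name has no extension
        if path.is_file() and path.suffix == "":
            target = path.with_name(path.name + ".pdf")
            path.rename(target)
            renamed.append(target)
    return renamed
```

For example, `add_pdf_extension("ine_1930_downloads")` would turn `5061` into `5061.pdf`. Because files that already end in `.pdf` are skipped, running it again after a new batch of downloads only touches the new files.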

I got as far as the “Agricultura” sub-branch of “Producción, consumo, y cambio” (Agricultura has further subdivisions: “Producción agrícola” (35 tables), “Colonización agrícola” (2 tables), “Producción forestal” (18 tables), and “Ganadería” (9 tables)). That was when I gave up trying to do it in one sitting. 127 files later, after adding .pdf to each, I used Adobe Acrobat Standard to combine them. The files came out of order when I sorted them by name, so I sorted them by download time instead, combined them into a single file, and named it.
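Sorting by download time can also be done in a script. This is a minimal sketch, not how Acrobat does it: it assumes each file's modification time still reflects when it was downloaded (copying or editing a file can change that), and `pdfs_in_download_order` is a hypothetical name.

```python
from pathlib import Path


def pdfs_in_download_order(folder: str) -> list[Path]:
    """Return the .pdf files in `folder`, oldest modification time first.

    Modification time stands in for download time; ties are broken
    by file name so the order is always the same.
    """
    pdfs = [p for p in Path(folder).glob("*.pdf") if p.is_file()]
    return sorted(pdfs, key=lambda p: (p.stat().st_mtime, p.name))
```

The returned list could then be handed to a PDF library such as pypdf to merge the tables in that order.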

The file is named after the volume and currently contains 177 pages. {And now I am too tired to review the combined file to see what the resulting order is.} I saved it to my desktop so I will see it and remember to continue with the “Industria” sub-branch of “Producción, consumo, y cambio.” I also pasted the list of branches still to be done into the file's “Document properties” in Adobe Acrobat (opened with Control-D).

And to be honest, I had to do the combining a second time after I deleted the source files (fortunately they were still recoverable), because apparently I had not saved the original combined file.

So when I am done, what should I do with the resulting file, dear readers? And “done” in this case means several things:

  • adding all the files for 1930 from the INE website
  • making a proper table of contents in the Adobe file
  • numbering the pages so they correspond to the actual printed pages (unlike Oxford’s obnoxious text numbering in Oxford Scholarship Online)
  • running OCR
  • reducing the file size

A final note: I am sure there is a more technologically adept way to do this, by “harvesting” the files. I need to read Ian Milligan’s post on doing this with another site and take a stab at it myself.
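As a rough idea of what “harvesting” might look like, here is a Python sketch that builds links on the pattern of the title-page URL above and saves each file with a .pdf extension. It assumes every table is served by the same `pdfDispacher.do?td=…&ext=.pdf` address and that the list of `td` IDs has been gathered from the file tree by some other means; that list, and the names `table_url`, `fetch`, and `harvest`, are hypothetical.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable, Iterable
from urllib.parse import urlencode

# Pattern copied from the link to the 1930 title page; assumed (not verified)
# to work for every table in the tree.
BASE_URL = "http://www.ine.es/inebaseweb/pdfDispacher.do"


def table_url(td: int) -> str:
    """Build the download link for one table from its numeric td ID."""
    return BASE_URL + "?" + urlencode({"td": td, "ext": ".pdf"})


def fetch(url: str) -> bytes:
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def harvest(ids: Iterable[int], out_dir: str,
            fetcher: Callable[[str], bytes] = fetch,
            pause: float = 1.0) -> list[Path]:
    """Save each table as NNN_<td>.pdf, numbered in the order given.

    `ids` is a hypothetical list of td values gathered from the file tree.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for position, td in enumerate(ids, start=1):
        # Zero-padded prefix makes name order match the order of `ids`
        target = out / f"{position:03d}_{td}.pdf"
        target.write_bytes(fetcher(table_url(td)))
        saved.append(target)
        time.sleep(pause)  # wait between requests to go easy on the server
    return saved
```

Numbering each file (`001_…`, `002_…`) in the order the IDs are listed means that sorting by name reproduces the tree's order, which sidesteps the sorting problem described above. Passing a different `fetcher` also makes it easy to try the logic without touching the INE server.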

Bienvenidos y buenaventura (Welcome and good fortune)

This will be the record of my stumbling attempts at, and musings about, combining digital humanities (history, to be more precise) with research on anarchism in Barcelona in the period before the Civil War, along with the other events that possess my floating attention. I promise that it will be less structured than I usually am, and I will refrain from too many comments on the 2016 presidential race in the U.S.A.; that is low-hanging fruit.

It will include observations on anything that catches my fancy. I will try to emulate Muhammad Ali’s 1964 comment: “Float like a butterfly, sting like a bee” {though allergic, I am very fond of bees, and of butterflies}.
