Techniques for Accessing ICPSR Holdings on a UNIX System
Accessing ICPSR holdings on a UNIX system entails the following steps:
- Locate the study's Data Information Sheet, obtained either from the Duke
search mechanism or from the notification you received regarding a new
order. This sheet describes each file in the study: its content, its
physical file name, its record length (particularly important for data
files), and its record count (often equal to the number of observations).
- Log in to the UNIX system you will be using. If you are using the ACPUB
system, you must log in to godzilla.acpub.duke.edu to
access the archive directory. The archive directory is accessible from
each of the central UNIX compute servers and from those used by Sociology
and Economics.
- Confirm the availability of the data by changing to its directory
location:
cd /opt/archive/icpsr/s####
where: #### = the ICPSR study number
- List the files in the directory and compare them against your Data
Information Sheet:
ls -l
- Review the documentation and determine which portions of the study
are of interest. Depending on the complexity of a study, data management
issues can become very involved, so think them through carefully before
you begin.
- For each raw data file you need to read, include a file reference
statement in the program that reads it. This statement instructs the
UNIX operating system to decompress the data on the fly and pass the
decompressed data to the statistical package being used to extract a
workfile. Illustrated below are the techniques used for SAS. [Details
for performing the same operations in SPSS and Stata will be added at a
later date.] The procedure differs among UNIX operating systems, which
complicates matters further.
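Before writing the file reference, it can help to confirm that a compressed
file matches the record length and record count on the Data Information
Sheet. The following is a minimal shell sketch and not part of the original
procedure. The function name check_lrecl is hypothetical, gzip -dc is used as
a portable equivalent of zcat for .gz files, and the record length is assumed
to exclude the line terminator.

```shell
# check_lrecl FILE LRECL  (hypothetical helper, not an ICPSR or SAS tool)
# Decompresses FILE on the fly and counts the records whose length
# differs from LRECL. Assumes LRECL excludes the newline.
check_lrecl() {
    gzip -dc "$1" | awk -v n="$2" '
        length($0) != n { bad++ }   # record does not match expected length
        END { printf "%d records, %d with unexpected length\n", NR, bad + 0 }'
}
```

For the file used in the SAS example that follows, the call would be
check_lrecl /opt/archive/icpsr/s2939/da2939.form1.gz 1237. The record total it
prints can be compared against the record count on the sheet.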
For systems running Tru64 UNIX (Sociology), one simple format applies
in all instances:
filename form1 pipe 'zcat /opt/archive/icpsr/s2939/da2939.form1.gz'
lrecl=1237;
The pipe option instructs SAS t |