Organization of ICPSR Holdings
The organization of local data holdings addresses several needs:
- It provides a central storage location with distributed access. Social
scientists typically access these data from UNIX systems in Arts and
Sciences. However, anyone at Duke can access the holdings through their
ACPUB account by logging in to the gateway host godzilla.acpub.duke.edu.
- As local holdings accumulate through various requests, a growing body
of data is becoming available, the content of which is user-driven.
- The use of ICPSR data on UNIX systems does not require the user to
copy the often large archival files into personal directory space.
- The archives are directly accessed on UNIX systems, so the researcher
need only be concerned about space for the workfile extracts.
The machine-readable files distributed by ICPSR fall into three broad
categories:
- Raw Data - consisting of alphanumeric text files
stored in compressed format. Compression keeps the disk space the data
occupy to a minimum. (Most files shrink by 80% or more when compressed.)
On a UNIX system the data can be decompressed
and piped on the fly to several different statistical packages. Data
are rarely archived in the system file format of a statistical package
because such files are far larger than compressed raw data.
- Statistical Package Control Statements - many studies
now include sets of SAS or SPSS control statements used for reading
in the raw data, defining variable names, value labels and missing values.
When available, the user should copy the control statements to their
directory and modify them into the program that extracts the desired
subset from the archival data.
- Documentation - most commonly in codebook format.
For many years ASCII text codebooks were distributed, but of late the
standard has shifted to the Portable Document Format (PDF), which requires
an Adobe Acrobat reader. PDF files are larger, but allow for better
quality documents (including well-formatted survey instruments, schematics,
diagrams, and the like) and the facility to selectively print from them.
In general, a much wider variety of documentation is becoming available
with new study releases, but the cost of printing is shifting to the
user.
One or more data files are associated with each study. Control statement
and documentation files are optional and less likely to be found with
older studies.
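
The idea of decompressing raw data on the fly and keeping only a small
workfile extract can be sketched in Python. This is an illustration only,
not the procedure used with SAS or SPSS control statements. It assumes
the archive is gzip-compressed (the holdings may use a different
compression format), and the function name, file names, and column
positions are all hypothetical.

```python
import gzip
import os
import tempfile


def extract_columns(archive_path, extract_path, fields):
    """Stream a gzip-compressed fixed-width text file and write only the
    requested columns to a comma-separated extract.

    fields is a list of (start, end) column positions, 0-based and
    end-exclusive. The archive is never decompressed to disk; only the
    extract is written. Returns the number of records read.
    """
    rows = 0
    with gzip.open(archive_path, "rt", encoding="ascii") as src, \
            open(extract_path, "w", encoding="ascii") as dst:
        for line in src:
            record = line.rstrip("\n")
            values = [record[start:end].strip() for start, end in fields]
            dst.write(",".join(values) + "\n")
            rows += 1
    return rows


if __name__ == "__main__":
    # Build a tiny stand-in archive so the sketch runs anywhere.
    # The file name and record layout are made up for this example.
    workdir = tempfile.mkdtemp()
    archive = os.path.join(workdir, "study_data.txt.gz")
    extract = os.path.join(workdir, "extract.csv")
    with gzip.open(archive, "wt", encoding="ascii") as f:
        f.write("0001 34 1\n0002 51 2\n")

    # Keep only the first two variables (columns 0-3 and 5-6).
    count = extract_columns(archive, extract, [(0, 4), (5, 7)])
    print(count, "records written to", extract)
```

Because the archive is read as a stream, the only disk space the
researcher needs is for the extract file itself.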