Using Your Module Data: An Intermediate-Advanced Training Session Using SPSS
For Windows

Produced by The Measurement Group
For HRSA/HAB's SPNS Cooperative Agreements
Steering Committee Meeting
Supported by Grant #BRU 900113-01-0
From the Health Resources and Services Administration
SPSS for Windows Training Materials
Preface
This set of training materials was developed for the Health Resources
and Services Administration (HRSA) Special Projects of National Significance (SPNS)
Program Cooperative Agreement Projects. As part of the training and technical assistance
activities of the Evaluation and Dissemination Center for the Cooperative Agreement
Projects, The Measurement Group provides technical support for data analysis using SPSS
for Windows software. These materials are intended to supplement the trainings provided by
The Measurement Group for the cooperative agreement projects for use with their local
evaluation data.
Specifically, these materials were prepared to supplement the training
provided at HRSA/HAB's SPNS Cooperative Agreement Steering Committee meeting in
Washington, D.C., September 18, 1997. The September 1997 training is designed for the
intermediate-advanced user of SPSS for Windows, whereas trainings held at earlier meetings
dealt with more basic SPSS skills (e.g., sessions held at the HRSA/SPNS Program Steering
Committee Meeting in San Francisco, California, January 15-16, 1997).
The primary authors of these materials are Dr. Diana E. Brief and Dr.
Abigail T. Panter, with additional contributions from Dr. George J. Huba, Dr. Lisa A.
Melchior, Ruth Betru, Luke Tharasri, and Paula Jamison. These materials can also be
accessed on The Measurement Group web site (www.TheMeasurementGroup.com).
What To Do With Your Module Data:
An Intermediate-Advanced Training On SPSS For Windows
HRSA SPNS Cooperative Agreement Steering Committee Meeting
Plan for the Training Session:
- Discuss Distinctions Between Encounter-Level Versus Aggregated Data
- Describe How to Aggregate Data Sets
- Show How to Combine Data From Different Modules
- Provide Assorted Tips for Client Tracking
  - Case Summaries
  - Split File
  - Select Cases
Part 1
Encounter-Level Versus Aggregated Data
Module data are transferred to your project from the Evaluation and
Dissemination Center (EDC) in two formats: Encounter and Aggregated.
Encounter-Level Data
The number of lines in your encounter-level data file is the same as the
number of forms that were submitted to the EDC by your project. For example:
- For Module 1 Data: If your project is client-centered, you will
have one line of data corresponding to each Module 1 form that your project submitted
(e.g., two Module 1 forms for a particular client equals two lines of data for that
client).
- For Module 3 Data: If your project involves training, you will
have one line of data corresponding to each Module 3 form that your project submitted.
What Questions Can You Ask With Encounter-Level Data?
You can answer questions about the total work completed by your project
using encounter-level data files. A turnstile is a good metaphor for encounter-level data:
Each time the turnstile swings, a person walks through, and a specific service is
delivered or an assessment is made. Note that the same individual can walk through the
turnstile several times. Encounter-level data are best suited for determining the total
number of services (trainings) provided, but are less well suited for research questions
involving what services a single client receives or what a single training accomplished.
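The turnstile idea can be illustrated with a short Python sketch. Python is used here only to show the logic; it is not SPSS syntax. The SERVICE variable and all values are invented for illustration, and IDEDC stands in for the client identifier.

```python
from collections import Counter

# Hypothetical encounter-level records: one entry per submitted form.
encounters = [
    {"IDEDC": "A001", "SERVICE": "case management"},
    {"IDEDC": "A001", "SERVICE": "substance abuse"},
    {"IDEDC": "B002", "SERVICE": "case management"},
    {"IDEDC": "A001", "SERVICE": "case management"},
]

# Every swing of the turnstile counts, even when the same client returns.
total_services = len(encounters)

# Tally the encounters by type of service delivered.
services_by_type = Counter(row["SERVICE"] for row in encounters)

print(total_services)
print(dict(services_by_type))
```

Client A001 contributes three of the four encounters, which is exactly what an encounter-level count is meant to capture.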
Aggregated Data
Aggregate-level data files are created from encounter-level data and
contain one line of data for each unique key identifier (e.g., client identifier,
staff code, training number). Individuals (trainings) in this data file are sometimes
referred to as "unduplicated" or "unique." For example, a person may
have come to a clinic 20 times for services over the course of a year, but in the
aggregated data file that person is counted once and only has one line of data.
- Because it is not always appropriate to aggregate a database, your
project may not receive an aggregated data file for each module submitted by your project.
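As a rough illustration of "unduplicated" counting, the Python sketch below (not SPSS syntax) compares the number of encounter lines with the number of unique identifiers. The records are invented; only the variable name IDEDC comes from these materials.

```python
# Hypothetical encounter-level records for two clients.
encounters = [
    {"IDEDC": "A001"},
    {"IDEDC": "A001"},
    {"IDEDC": "B002"},
    {"IDEDC": "A001"},
]

# Lines in the encounter-level file: one per form submitted.
encounter_lines = len(encounters)

# Lines in the aggregated file: one per unique key identifier.
unique_ids = sorted({row["IDEDC"] for row in encounters})
aggregated_lines = len(unique_ids)

print(encounter_lines, aggregated_lines, unique_ids)
```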
What Questions Can You Ask With Aggregated Data?
Aggregation is performed to examine unique (unduplicated) cases. With
aggregated data you can answer questions about how many unique clients were served or how
well your project penetrated its catchment area. In addition, you can address questions
related to how individual clients may have changed over time because each data line may
contain summarized information about that client's data at each time point.
Some Points to Keep in Mind About Aggregated Data
- The number of lines in an aggregated data file may be smaller than its
corresponding encounter-level data file.
- Variables existing across a client's multiple lines of data
(encounter-level) must be summarized in some way so that the client's (or training's) data can
be represented in a single data line. For example, the smallest value, the largest value,
the first value that a client gave, the last value the client gave, the sum, or the mean
value are some ways that information is summarized across a client's encounter-level data.
- When key identifiers are missing or incomplete from a module form,
serious data problems result: The number of unique clients (trainings) and services in the
aggregated data file is under-represented. In an aggregated data file, cases with missing
or incomplete key identifiers are deleted because they cannot be merged properly with
other cases having the same identifier.
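The effect of missing key identifiers can be sketched in a few lines of Python (an illustration only, not SPSS syntax). The records and the VISITS variable are hypothetical, and the rule used here, that blank or absent identifiers are dropped, is a simplification of the behavior described above.

```python
# Hypothetical encounter-level records; two have unusable identifiers.
records = [
    {"IDEDC": "A001", "VISITS": 1},
    {"IDEDC": "", "VISITS": 1},      # identifier left blank on the form
    {"IDEDC": "B002", "VISITS": 1},
    {"IDEDC": None, "VISITS": 1},    # identifier missing entirely
]

def has_valid_id(row):
    """Return True when the record carries a non-blank key identifier."""
    ident = row["IDEDC"]
    return ident is not None and ident.strip() != ""

# Only records with a usable identifier can be linked to a client.
usable = [row for row in records if has_valid_id(row)]

dropped = len(records) - len(usable)
clients_counted = len({row["IDEDC"] for row in usable})
services_counted = sum(row["VISITS"] for row in usable)

print(dropped, clients_counted, services_counted)
```

Four services were delivered, but only two survive aggregation in this example, so both clients and services are under-represented.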
Part 2
How to Aggregate Data
An aggregated data file contains a single line of data for each unique
client. To aggregate a data file, each separate line for a particular client needs to be
collapsed into a single line of data. Each variable in this new, single line of data
represents a summary across all values of this variable for the client. Only numeric and
date variables can be summarized.
Eight Steps To Aggregated Data
- Open the encounter-level version of the data file that you want to aggregate.
- On the menu bar, click → Data.
- Click → Aggregate. (You will then see the following screen.)

- From the variable list on the left, highlight the variable(s) that
identify each individual client (training). Move this variable, identified as the
"break variable" by SPSS, to the box labeled Break Variable(s).
- For client-centered projects, the break variable will usually be a client
identifier (or the variable, IDEDC).
- For training projects, the break variable might often be the training
number.
- From the variable list on the left, highlight the variable(s) that you
want to include and summarize in your aggregated data file. Move these variables over to
the box labeled Aggregate Variable(s).
- Because you may want to summarize a variable in more than one way, you
can choose the same variable more than once. Simply select and move variables one at a
time for as many times as you want to summarize them.
- Except for the break variable, every variable that you want in your aggregated
data file must appear in the Aggregate Variable(s) box.
- Each variable appearing in the Aggregate Variable(s) box
is shown with a new name, a variable label, and the function that will be used to
summarize that variable's values.
- By default, SPSS appends "_1" to the old name. You can (and
we recommend that you do) change the variable name and label by clicking
→ Name & Label.
Type a new name and label that reflect the nature of the variable and how it was
summarized. For example, the screen above shows that the variable DATE was selected. This
variable was given a new variable name, DATEF, to reflect the fact that its value is the
first value of date for that client (e.g., the date that the client first received
services).
By default, SPSS selects "MEAN" as the way to
summarize the variable across many lines of data for a client (training). However, as seen
in Table 1, there are many other ways to summarize variables, and "MEAN" is often not
the function you want. To select an alternate function,
click → Function and choose the function you would like.

Table 1
Functions Frequently Used To Summarize Variables And What They Mean
| Function | What It Means |
| Mean of values | Computes the mean of all of the values for the variable |
| Standard deviation | Computes the standard deviation of all of the values for the variable |
| First value | Takes the first value of the variable |
| Last value | Takes the last value of the variable |
| Minimum value | Takes the smallest value of the variable |
| Maximum value | Takes the largest value of the variable |
| Sum of values | Computes the sum of the values for the variable |
Note. The first and last values of a variable
are literally the value from the individual's first record and the value from the
individual's last record. We recommend that you first sort your
encounter-level data by client identifier and then by date so that the first and last
values really are, chronologically, the first and last values for the individual. When
dates are missing, those records appear first in the sorted data file.
- After having selected and defined all variables that you wish to
summarize, there are three final steps before the aggregated data file will be created:
- With the following screen:

- Click → Save number of cases in break group as variable. SPSS will then compute and
save as a variable the number of records compressed into the single line of data for the
client. You can give this variable any name you would like (up to 8 characters long). On
the screen above, notice that this variable has been named NUMMOD1 (which stands for
"number of Module 1 records").
- Next, click → File, and you will see a screen that looks like:

- In the box File name, type the path and file name where you
would like your aggregated data file to be stored. Then click → Save.
- SPSS will bring you back to the master aggregate data screen. At this
point, just click → OK.
SPSS will create the aggregated database that you specified. To access this data file, you
will need to open it as an existing data file.
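The steps above can be sketched in Python to show what the aggregation does to the data. This is an illustration of the logic only, not SPSS syntax. DATEF and NUMMOD1 follow the names used above; SCORE, SCOREF, SCOREMN, and all values are hypothetical. The sketch assumes that a "first value" is the first non-missing value in sorted order, and it sorts missing dates first, as the Note to Table 1 describes.

```python
from datetime import date
from itertools import groupby
from statistics import mean

# Hypothetical encounter-level records; one has a missing date.
encounters = [
    {"IDEDC": "B002", "DATE": date(1997, 1, 20), "SCORE": 3},
    {"IDEDC": "A001", "DATE": date(1997, 3, 5), "SCORE": 7},
    {"IDEDC": "A001", "DATE": date(1996, 11, 2), "SCORE": 4},
    {"IDEDC": "A001", "DATE": None, "SCORE": 10},
]

def sort_key(row):
    """Sort by client, then by date, with missing dates placed first."""
    d = row["DATE"]
    return (row["IDEDC"], d is not None, d or date.min)

def first_valid(values):
    """Return the first non-missing value, or None if all are missing."""
    return next((v for v in values if v is not None), None)

def aggregate(rows):
    """Collapse the rows into one line per IDEDC (the break variable)."""
    ordered = sorted(rows, key=sort_key)
    result = []
    for ident, group in groupby(ordered, key=lambda r: r["IDEDC"]):
        group = list(group)
        result.append({
            "IDEDC": ident,
            "DATEF": first_valid(r["DATE"] for r in group),    # first value
            "SCOREF": first_valid(r["SCORE"] for r in group),  # first value
            "SCOREMN": mean(r["SCORE"] for r in group),        # mean of values
            "NUMMOD1": len(group),  # number of cases in break group
        })
    return result

aggregated = aggregate(encounters)
for line in aggregated:
    print(line)
```

In this example the undated record for A001 sorts to the top, so SCOREF comes from that record rather than from the chronologically earliest visit. This is the reason for checking missing dates before relying on first and last values.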
Part 3
How to Combine Data from Different Modules
When data are combined from different data sets, this process is called merging.
There are two ways to merge data from different data files:
- Vertical merge: Add cases. You have two data files with the same variables in each but
different clients (trainings). You would like to create a merged file containing all of
the cases.
- Horizontal merge: Add variables. You have two data files with mostly the same clients in each
but different variables. You would like to create a merged file containing all the
variables.
Merging Vertically
Because the EDC maintains a core data file for each one of the
Cooperative Agreement modules and adds data from projects constantly, the EDC
performs "vertical merges" very often with encounter-level data files.
A key point for successfully adding cases is to make sure that the
variables for the new cases are defined identically to the variables in the core database.
Each variable in the core data file must be present in the new data file. In addition,
variables with the same names need to be the same length and type.
- For example, the variable IDEDC in each of the EDC's core databases
is defined as a string variable that is 14 characters long. Thus, when new data are
vertically merged to the core database, the variable IDEDC must be defined as a string
variable that is 14 characters long.
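A check like the one described in this example could be sketched as follows in Python (illustrative only, not SPSS syntax). The function name and the sample identifiers are hypothetical; the 14-character string definition of IDEDC comes from the text above.

```python
ID_WIDTH = 14  # IDEDC is described above as a 14-character string variable

def idedc_problems(rows):
    """Return the positions of rows whose IDEDC would not fit the core definition."""
    problems = []
    for position, row in enumerate(rows):
        value = row.get("IDEDC")
        # Flag values that are not strings or that exceed the defined width.
        if not isinstance(value, str) or len(value) > ID_WIDTH:
            problems.append(position)
    return problems

# Hypothetical new cases to be added to the core database.
new_cases = [
    {"IDEDC": "SITE01-0000123"},     # 14 characters: fits
    {"IDEDC": 123},                  # numeric rather than string: flagged
    {"IDEDC": "SITE01-000012345"},   # 16 characters: flagged
]

flagged = idedc_problems(new_cases)
print(flagged)
```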
Steps To Merge Two Data Files Vertically:
- Open the encounter-level data file.
- Click → Data.
- Click → Merge Files.
- Click → Add Cases. (You will see a screen labeled "Add Cases: Read File".)
- Let SPSS know the name of the data file that holds the cases that you are
going to add. You can do this by typing the computer path and data file name in the File
name space. Next, click Open.
- You will now see a new screen labeled "Add Cases from" followed by
the computer path and name of the data file that you identified as the one to
add data from. On this screen, you should see two boxes: on the left, the box labeled Unpaired
Variables; on the right, the box labeled Variables in New Working Data File.
- The Unpaired Variables box tells you the variables from the
two databases (old and new) that did not match. (The box should be empty).
- If variables did not match, an asterisk (*) by a variable's name
indicates that the variable is contained in the core database, but not in the database from
which you want to add cases. A plus sign (+) by a variable's name indicates that the
variable is contained in the database from which you want to add cases, but not in the
core database.
- The Variables in New Working Data File box contains all
variables that will be merged between the core data file and the data file containing the
cases you are adding. If everything is in order, click → OK.
- A new data file called "Untitled" will be created, containing the cases from both
data files. Note that you should save this merged data file
immediately using a new name and, preferably, a new computer path.
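The logic of a vertical merge, including the Unpaired Variables check, can be sketched in Python as follows. This is an illustration, not SPSS syntax; the function names and records are hypothetical, and refusing to merge when variables are unpaired is a deliberately strict choice made for this sketch.

```python
def unpaired_variables(core_rows, new_rows):
    """Return (only_in_core, only_in_new), mirroring the * and + markers."""
    core_vars = set().union(*(row.keys() for row in core_rows))
    new_vars = set().union(*(row.keys() for row in new_rows))
    return sorted(core_vars - new_vars), sorted(new_vars - core_vars)

def add_cases(core_rows, new_rows):
    """Stack the new cases under the core cases when all variables pair up."""
    only_core, only_new = unpaired_variables(core_rows, new_rows)
    if only_core or only_new:
        raise ValueError(f"unpaired variables: *{only_core} +{only_new}")
    return core_rows + new_rows

# Hypothetical core file and new cases with identical variables.
core = [{"IDEDC": "A001", "SITE": 1}, {"IDEDC": "B002", "SITE": 1}]
new_cases = [{"IDEDC": "C003", "SITE": 2}]

merged = add_cases(core, new_cases)
print(len(merged))

# A file with a differently named variable is reported as unpaired.
mismatched = [{"IDEDC": "D004", "LOCATION": 2}]
print(unpaired_variables(core, mismatched))
```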
Merging Horizontally
If you want to look at data from two different module data files for the
same person, you should perform a "horizontal merge."
- For example, you may want to know whether trainees with different
background characteristics rate your project's trainings differently. Or, you may
want to know to what extent there is an association between a client's background and
the amount of services that the client received.
Data files are merged horizontally by linking records together using a
key identifier. For client-centered projects, this identifier often will be the
client's (or patient's) identifier. For training projects, this identifier often
will be the training number or the trainee's identifier.
Steps To Merge Two Data Files Horizontally:
- Sort each of the data files that you want to merge horizontally by the key
identifier(s), and save each file.
- Open the data file containing the variables that you would like
to have first in the final merged data file.
- Click → Data.
- Click → Merge Files.
- Click → Add Variables. (You will see a screen labeled "Add Variables: Read File".)
- Let SPSS know the name of the data file that holds the variables that you
are going to add. You can do this by typing the computer path and data file name in the File
name space. Next, click → Open.
- You will now see a new screen labeled "Add Variables from" followed by the
computer path and data file name of the file containing the variables that you are merging.

Notice that there are three boxes: Excluded Variables, New
Working Data File, and Key Variables.
- Excluded Variables. This box shows variables that appear in
both data files with the same name. In the above example, two variables, SITE and IDEDC,
appear under the same name in both of the data files being merged.
Both variables are actually key identifiers and should be placed
in the Key Variables box.
- Key Variables. To move the variables that will serve as your key
identifier(s) into the Key Variables box, click → Match cases on key variables in sorted files.
Then, in order of hierarchy, highlight the variables that you want from the Excluded
Variables box and place them in the Key Variables box.
- New Working Data File. This box lists the names of all variables, other than
the key identifiers, that will be contained in the horizontally
merged data set.
- If everything is in order, click → OK.
- A new data file called "Untitled" will be created, containing the matched cases
from the two data files. All variables from the first data file you opened
will appear first; the variables from the other (added) data file will follow. Note that
you should save this merged data file immediately using a new name and, preferably,
a new computer path.
- If a client (or training) is missing data for either one of the modules
that you have combined, the values of the variables in this module will all be blank (or
system missing).
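The horizontal merge described above can be sketched in Python to show how records are linked on a key identifier. This is an illustration of the logic, not SPSS syntax. GENDER, NUMSERV, the function name, and the records are hypothetical, and each file is assumed to hold one line per identifier.

```python
# Hypothetical aggregated files, one line per client in each.
module1 = [
    {"IDEDC": "A001", "GENDER": "F"},
    {"IDEDC": "B002", "GENDER": "M"},
]
module2b = [
    {"IDEDC": "A001", "NUMSERV": 5},
    {"IDEDC": "C003", "NUMSERV": 2},
]

def add_variables(first, second, key):
    """Link two files on a key; values absent from one file become None."""
    first_by_key = {row[key]: row for row in first}
    second_by_key = {row[key]: row for row in second}
    first_vars = [name for name in first[0] if name != key]
    second_vars = [name for name in second[0] if name != key]
    merged = []
    for ident in sorted(set(first_by_key) | set(second_by_key)):
        row = {key: ident}
        # Variables from the first (opened) file come first in each line.
        for name in first_vars:
            row[name] = first_by_key.get(ident, {}).get(name)
        for name in second_vars:
            row[name] = second_by_key.get(ident, {}).get(name)
        merged.append(row)
    return merged

merged = add_variables(module1, module2b, "IDEDC")
for line in merged:
    print(line)
```

Client B002 has no Module 2B line and client C003 has no Module 1 line, so the corresponding values are None here, playing the role of blank or system-missing values.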
Three Final Recommendations For Merging Variables From Different Data
Files (Horizontal Merge):
- Rename variables with the same names. A single data file cannot
contain two different variables with the same variable name. Except for the variable(s)
that will serve as the links between the data files (e.g., client identifier), you need to
rename variables with the same name contained in the two (or more) data files being
merged.
- Use an aggregated data file. There should be a one-to-one
correspondence between each client's (training's) data across the different
data files being merged. Serious problems can occur if there is more than one record for a
key identifier. In most cases you will want to merge data files that have already been
aggregated and sorted on the variable(s) serving as your key identifiers.
- Make sure that key identifiers are unique. Even when a data file
is aggregated, inappropriate data linking can still occur if the key identifiers used to
link the files are not unique. In other words, each client (or training) should have one
and only one identifier.
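Two of these recommendations lend themselves to a quick pre-merge check, sketched below in Python (illustrative only, not SPSS syntax). The function names and records are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the key values that occur on more than one line."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

def clashing_names(first, second, keys):
    """Return non-key variable names that appear in both files."""
    first_vars = set(first[0]) - set(keys)
    second_vars = set(second[0]) - set(keys)
    return sorted(first_vars & second_vars)

# Hypothetical files: the first still has two lines for client A001,
# and both files contain a variable called DATE.
file_a = [
    {"IDEDC": "A001", "DATE": "1997-01-05"},
    {"IDEDC": "A001", "DATE": "1997-02-10"},
    {"IDEDC": "B002", "DATE": "1997-01-20"},
]
file_b = [
    {"IDEDC": "A001", "DATE": "1997-03-01", "NUMSERV": 5},
    {"IDEDC": "B002", "DATE": "1997-03-02", "NUMSERV": 2},
]

duplicates = duplicate_keys(file_a, "IDEDC")
clashes = clashing_names(file_a, file_b, ["IDEDC"])
print(duplicates, clashes)
```

In this example, file_a would need to be aggregated first, and one of the DATE variables renamed, before the two files are merged.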
Part 4
Assorted Tips For Client Tracking
SPSS for Windows can also be used to manage services provided to clients
or to manage staff workload. Three key tools can be used for this purpose. Feel free
to mix and match these tools to fine-tune the data exploration that you want to
accomplish. Each tool is accessed by first clicking the Data option on the
main menu bar.
Table 2 shows a summary of each of the three tools, the task each
performs, and how to access each.
Table 2
Assorted Tools for Tracking Services to Clients or for Staff Workload
| SPSS Tool | Task It Performs | How To Access It |
| Case Summaries | Gives a value for each variable that you would like to see | Click on Data followed by Summarize and then Case Summaries |
| Split File | Gives separate results for each value of the variable(s) that you would like to form groups on | Click on Data and then Split File |
| Select Cases | Temporarily (or permanently) creates a data set that is based on some selection criteria | Click on Data and then Select Cases |
Some Data Questions Involving
Case Summaries: How can I find out how many substance abuse and case
management services were received by my clients?
- Use Case Summaries
- Highlight all of the variables for which you want to see the values for
each of the records in your database.
- If you are trying to track services to clients, you will probably want to
identify each record with a client identifier (IDEDC).
- For staff, include the variable pertaining to staff codes (STAFF), and
for trainings, the training number (TR_NUM).
- Send these variables over to Variables.
- Clear (uncheck) the Limit cases to first box (which limits output to only
the first 100 cases in your database) and the Show only valid cases box (which limits
output to only those records that have data for the variables requested).
- Click → OK.
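The kind of listing that Case Summaries produces can be sketched in Python as follows (an illustration, not SPSS syntax). SUBABUSE, CASEMGMT, the function name, and the records are hypothetical. How the two options interact here, with the limit counting only listed cases, is an assumption made for this sketch.

```python
# Hypothetical records; one has no value for SUBABUSE.
records = [
    {"IDEDC": "A001", "STAFF": "S1", "SUBABUSE": 2, "CASEMGMT": 5},
    {"IDEDC": "B002", "STAFF": "S2", "SUBABUSE": None, "CASEMGMT": 1},
    {"IDEDC": "C003", "STAFF": "S1", "SUBABUSE": 0, "CASEMGMT": 3},
]

def case_summaries(rows, variables, limit=None, valid_only=False):
    """List the requested variables for each record."""
    listed = []
    for row in rows:
        # Stop once the optional limit on listed cases has been reached.
        if limit is not None and len(listed) >= limit:
            break
        values = tuple(row.get(name) for name in variables)
        # Optionally skip records with no data for a requested variable.
        if valid_only and any(value is None for value in values):
            continue
        listed.append(values)
    return listed

wanted = ["IDEDC", "SUBABUSE", "CASEMGMT"]
all_cases = case_summaries(records, wanted)
valid_cases = case_summaries(records, wanted, valid_only=True)
first_case = case_summaries(records, wanted, limit=1)

for line in all_cases:
    print(line)
```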
Split File: How can I get separate results for my male and female
clients?
- Click → Organize output by groups.
- Identify and highlight the variable (GENDER) for whose values you would
like to see separate results.
- Move this variable to the Groups Based on box.
- Click → OK.
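What Split File does to later results can be sketched in Python (illustrative only, not SPSS syntax): the cases are grouped by the values of one variable, and each result is then computed separately per group. NUMSERV, the function name, and the records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregated client records.
clients = [
    {"IDEDC": "A001", "GENDER": "F", "NUMSERV": 4},
    {"IDEDC": "B002", "GENDER": "M", "NUMSERV": 2},
    {"IDEDC": "C003", "GENDER": "F", "NUMSERV": 6},
]

def split_file(rows, group_var):
    """Group the rows by each value of group_var, in sorted order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_var]].append(row)
    return {value: groups[value] for value in sorted(groups)}

by_gender = split_file(clients, "GENDER")

# Any later summary is produced once per group.
mean_services = {
    gender: mean(row["NUMSERV"] for row in rows)
    for gender, rows in by_gender.items()
}
print(mean_services)
```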
Select Cases: How can I get a list of the services provided to each
of my clients who were enrolled between October 1, 1996 and February 28,
1997?
- Data for this task should come from a horizontally merged database that
contains data from your aggregated Module 1 (Demographic-Contact Form) and your aggregated
Module 2B (Psychosocial-Intervention Services Form) data files.
- Click → Select Cases.
- Click → If condition is satisfied.
- Click → If.
- In the box in the upper right-hand corner, type:
datemi >= DATE.MDY(10, 1, 1996) and
datema <= DATE.MDY(2, 28, 1997)
- Click → Continue.
- Click → OK.
- Use → Case Summaries; select variables that will identify the individual client, date of first
enrollment, and the services that you would like to explore.
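The selection condition above can be sketched in Python to show which cases it keeps (an illustration, not SPSS syntax). The sketch assumes that DATEMI and DATEMA hold each client's earliest and latest dates in the aggregated file; that reading, the function name, and the records are assumptions. Cases with a missing date are treated as not satisfying the condition.

```python
from datetime import date

WINDOW_START = date(1996, 10, 1)
WINDOW_END = date(1997, 2, 28)

# Hypothetical lines from a horizontally merged, aggregated file.
merged = [
    {"IDEDC": "A001", "DATEMI": date(1996, 10, 15), "DATEMA": date(1997, 1, 30)},
    {"IDEDC": "B002", "DATEMI": date(1996, 8, 1), "DATEMA": date(1996, 12, 1)},
    {"IDEDC": "C003", "DATEMI": date(1996, 11, 5), "DATEMA": date(1997, 4, 2)},
    {"IDEDC": "D004", "DATEMI": None, "DATEMA": None},
]

def condition_satisfied(row):
    """True when both dates are present and fall inside the window."""
    start, end = row["DATEMI"], row["DATEMA"]
    if start is None or end is None:
        return False
    return start >= WINDOW_START and end <= WINDOW_END

selected = [row["IDEDC"] for row in merged if condition_satisfied(row)]
print(selected)
```

Only A001 is kept in this example: B002 starts before the window, C003 ends after it, and D004 has no dates.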
TOOLS FOR CLIENT TRACKING
© Copyright 1997-2005 by The Measurement Group LLC. All rights
reserved.