Using Your Module Data: An Intermediate-Advanced Training Session Using SPSS For Windows

Produced by The Measurement Group
For HRSA/HAB's SPNS Cooperative Agreements
Steering Committee Meeting

Supported by Grant #BRU 900113-01-0
From the Health Resources and Services Administration


SPSS for Windows Training Materials

Table of Contents

Preface
Outline of Topics for Training Session
Training Materials in This Packet
  • Encounter-Level Versus Aggregated Data
  • How to Aggregate Data
  • How to Combine Data From Different Modules Together
  • Assorted Tips For Client Tracking


Preface

This set of training materials was developed for the Health Resources and Services Administration (HRSA) Special Projects of National Significance (SPNS) Program Cooperative Agreement Projects. As part of the training and technical assistance activities of the Evaluation and Dissemination Center for the Cooperative Agreement Projects, The Measurement Group provides technical support for data analysis using SPSS for Windows software. These materials are intended to supplement the trainings provided by The Measurement Group for the cooperative agreement projects for use with their local evaluation data.

Specifically, these materials were prepared to supplement the training provided at HRSA/HAB's SPNS Cooperative Agreement Steering Committee meeting in Washington, D.C., September 18, 1997. The September 1997 training is designed for the intermediate-advanced user of SPSS for Windows, whereas trainings held at earlier meetings dealt with more basic SPSS skills (e.g., sessions held at the HRSA/SPNS Program Steering Committee Meeting in San Francisco, California, January 15-16, 1997).

The primary authors of these materials are Dr. Diana E. Brief and Dr. Abigail T. Panter, with additional contributions from Dr. George J. Huba, Dr. Lisa A. Melchior, Ruth Betru, Luke Tharasri, and Paula Jamison. These materials can also be accessed on The Measurement Group web site (www.TheMeasurementGroup.com).


What To Do With Your Module Data:
An Intermediate-Advanced Training On SPSS For Windows

HRSA SPNS Cooperative Agreement Steering Committee Meeting

Plan for the Training Session:

  1. Discuss Distinctions Between Encounter-Level Versus Aggregated Data
  2. Describe How to Aggregate Data Sets
  3. Show How to Combine Data From Different Modules
  4. Provide Assorted Tips for Client Tracking
  • Case Summaries
  • Split File
  • Select Cases

Part 1

Encounter-Level Versus Aggregated Data

Module data are transferred to your project from the Evaluation and Dissemination Center (EDC) in two formats: Encounter and Aggregated.

Encounter-Level Data

The number of lines in your encounter-level data file is the same as the number of forms that were submitted to the EDC by your project. For example:

  • For Module 1 Data: If your project is client-centered, you will have one line of data corresponding to each Module 1 form that your project submitted (e.g., two Module 1 forms for a particular client equals two lines of data for that client).
  • For Module 3 Data: If your project involves training, you will have one line of data corresponding to each Module 3 form that your project submitted.

What Questions Can You Ask With Encounter-Level Data?

You can answer questions about the total work completed by your project using encounter-level data files. A turnstile is a good metaphor for encounter-level data: each time the turnstile swings, a person walks through, and a specific service is delivered or an assessment is made. Note that the same individual can walk through the turnstile several times. Encounter-level data are therefore best suited for determining the total number of services (trainings) provided, but are less well suited for research questions about which services a single client received or what a single training accomplished.

Aggregated Data

Aggregated data files are created from encounter-level data and contain one line of data for each unique key identifier (e.g., client identifier, staff code, training number). Individuals (trainings) in this data file are sometimes referred to as "unduplicated" or "unique." For example, a person may have come to a clinic 20 times for services over the course of a year, but in the aggregated data file that person is counted once and has only one line of data.

  • Because it is not always appropriate to aggregate a database, your project may not receive an aggregated data file for each module submitted by your project.

What Questions Can You Ask With Aggregated Data?

Aggregation is performed to examine unique (unduplicated) cases. With aggregated data you can answer questions about how many unique clients were served or how well your project penetrated its catchment area. In addition, you can address questions related to how individual clients may have changed over time because each data line may contain summarized information about that client's data at each time point.

Some Points to Keep in Mind About Aggregated Data

  • The number of lines in an aggregated data file is usually smaller than, and never larger than, the number of lines in its corresponding encounter-level data file.
  • Variables existing across a client’s multiple lines of data (encounter-level) must be summarized in some way so that a client's data (or training) can be represented in a single data line. For example, the smallest value, the largest value, the first value that a client gave, the last value the client gave, the sum, or the mean value are some ways that information is summarized across a client's encounter-level data.
  • When key identifiers are missing from or incomplete on a module form, serious data problems result: the numbers of unique clients (trainings) and services in the aggregated data file are understated. Cases with missing or incomplete key identifiers are deleted from the aggregated data file because they cannot be properly combined with the other records that share the same identifier.
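To make the distinction concrete, here is a minimal Python sketch (an illustration of the logic, not SPSS syntax) that answers an encounter-level question and an aggregated question from the same records. The records and the field names `idedc` and `date` are hypothetical. Following the convention described above, a record with a blank identifier still counts as an encounter but is left out of the unduplicated count.

```python
from collections import Counter

# Hypothetical encounter-level records: one dict per submitted form.
encounters = [
    {"idedc": "A001", "date": "1996-10-03"},
    {"idedc": "A001", "date": "1996-11-14"},
    {"idedc": "B002", "date": "1996-10-20"},
    {"idedc": "", "date": "1996-12-01"},  # key identifier missing on the form
    {"idedc": "C003", "date": "1997-01-09"},
    {"idedc": "A001", "date": "1997-02-02"},
]


def total_encounters(records):
    """Encounter-level question: how many forms (services) were submitted?"""
    return len(records)


def unduplicated_clients(records, key="idedc"):
    """Aggregated question: how many unique clients? Records with a blank
    identifier cannot be assigned to a client and are dropped."""
    return len({rec[key] for rec in records if rec[key].strip()})


def encounters_per_client(records, key="idedc"):
    """Number of forms behind each unique client's single aggregated line."""
    return Counter(rec[key] for rec in records if rec[key].strip())


print(total_encounters(encounters))       # 6
print(unduplicated_clients(encounters))   # 3
print(encounters_per_client(encounters))  # Counter({'A001': 3, 'B002': 1, 'C003': 1})
```

The blank-identifier form shows how missing identifiers lead to under-representation: six forms were submitted, but only five can be attributed to a client.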

Part 2

How to Aggregate Data

An aggregated data file contains a single line of data for each unique client. To aggregate a data file, each separate line for a particular client needs to be collapsed into a single line of data. Each variable in this new, single line of data represents a summary across all values of this variable for the client. Only numeric and date variables can be summarized.

Eight Steps To Aggregate Data

  1. Open the encounter-level version of the data file.
  2. On the menu bar, click Data.
  3. Click Aggregate… (You will then see the Aggregate Data screen.)

  4. From the variable list on the left, highlight the variable(s) that identify each individual client (training). Move this variable, which SPSS calls the "break variable," to the box labeled Break Variable(s).
  • For client-centered projects, the break variable will usually be a client identifier (or the variable IDEDC).
  • For training projects, the break variable will often be the training number.
  5. From the variable list on the left, highlight the variable(s) that you want to include and summarize in your aggregated data file. Move these variables to the box labeled Aggregate Variable(s).
  • Because you may want to summarize a variable in more than one way, you can choose the same variable more than once. Simply select and move the variable once for each way that you want to summarize it.
  • Except for the break variable, every variable that you want in your aggregated data file must appear in the Aggregate Variable(s) box.
  6. Each variable in the Aggregate Variable(s) box is listed with a new name, a variable label, and the function that will be used to summarize its values.
  • By default, SPSS forms the new name by adding "_1" to the old name. You can (and we recommend that you do) change the variable name and label by clicking Name & Label… Type a new name and label that reflect the nature of the variable and how it was summarized. For example, the screen above shows that the variable DATE was selected. This variable was given a new variable name, DATEF, to reflect the fact that its value is the first value of date for that client (e.g., the date that the client first received services).
  7. By default, SPSS selects "MEAN" as the way to summarize the variable across the many lines of data for a client (training). However, as Table 1 shows, there are many other ways to summarize variables, and "MEAN" is often not the function you want. To select a different function, click Function and choose the function you would like.

    Table 1

    Functions Frequently Used To Summarize Variables And What They Mean

    Function              What It Means
    Mean of values        Computes the mean of all of the values for the variable
    Standard deviation    Computes the standard deviation of all of the values for the variable
    First value           Takes the first value of the variable
    Last value            Takes the last value of the variable
    Minimum value         Takes the smallest value of the variable
    Maximum value         Takes the largest value of the variable
    Sum of values         Computes the sum of the values for the variable

    Note. The first and last values of a variable are literally the value from the individual’s first record and the value from the individual’s last record. We recommend that you first sort your encounter-level data by client identifier and then by date so that the first and last values really are, chronologically, the first and last values for the individual. When dates are missing, those records are sorted ahead of the dated records.

  8. After you have selected and defined all of the variables that you wish to summarize, three final steps remain on the main Aggregate Data screen before the aggregated data file is created:

  a. Click Save number of cases in break group as variable. SPSS will then compute and save, as a variable, the number of records compressed into the single line of data for each client. You can give this variable any name you would like (up to 8 characters long). On the screen above, notice that this variable has been named NUMMOD1 (which stands for "number of Module 1 records").
  b. Next, click File, and you will see a screen for naming the new data file.
  c. In the File name box, type the path and file name where you would like your aggregated data file to be stored. Then click Save.
  • SPSS will bring you back to the main Aggregate Data screen. At this point, just click OK. SPSS will create the aggregated data file that you specified. To use this data file, you will need to open it as an existing data file.
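The Aggregate dialog choices amount to: sort by the break variable (and date), collapse each break group into one line, apply one summary function per aggregate variable, and optionally save the number of records per group. The Python sketch below illustrates that logic; it is not SPSS syntax or SPSS's internal implementation. The input field names (`idedc`, `date`, `visits`) are hypothetical, the output names DATEF and NUMMOD1 echo the examples in the steps, and DATEL and VISITSUM are made up. Dates are ISO strings so that they sort chronologically, and a missing value is represented as None.

```python
from itertools import groupby

# Summary functions modeled on Table 1 (standard deviation omitted for brevity).
FUNCTIONS = {
    "MEAN": lambda vals: sum(vals) / len(vals),
    "FIRST": lambda vals: vals[0],
    "LAST": lambda vals: vals[-1],
    "MIN": min,
    "MAX": max,
    "SUM": sum,
}


def aggregate(records, break_var, specs, count_name=None, sort_var=None):
    """Collapse encounter-level records into one line per break-variable value.

    specs maps each new variable name to (source variable, function name).
    If count_name is given, the number of records in each break group is saved.
    """
    def sort_key(rec):
        if sort_var is None:
            return (rec[break_var],)
        value = rec[sort_var]
        # A missing sort value (None) is placed first within its break group.
        return (rec[break_var], 0) if value is None else (rec[break_var], 1, value)

    lines = []
    for key, group in groupby(sorted(records, key=sort_key),
                              key=lambda rec: rec[break_var]):
        group = list(group)
        line = {break_var: key}
        for new_name, (source, func) in specs.items():
            # Missing values (None) are skipped before summarizing.
            values = [rec[source] for rec in group if rec[source] is not None]
            line[new_name] = FUNCTIONS[func](values) if values else None
        if count_name is not None:
            line[count_name] = len(group)
        lines.append(line)
    return lines


# Hypothetical Module 1 encounter records.
records = [
    {"idedc": "B002", "date": "1996-10-20", "visits": 1},
    {"idedc": "A001", "date": "1996-11-14", "visits": 2},
    {"idedc": "A001", "date": "1996-10-03", "visits": 1},
    {"idedc": "A001", "date": "1997-02-02", "visits": 4},
]
aggregated = aggregate(
    records,
    break_var="idedc",
    specs={"DATEF": ("date", "FIRST"), "DATEL": ("date", "LAST"),
           "VISITSUM": ("visits", "SUM")},
    count_name="NUMMOD1",
    sort_var="date",
)
print(aggregated[0])
# {'idedc': 'A001', 'DATEF': '1996-10-03', 'DATEL': '1997-02-02', 'VISITSUM': 7, 'NUMMOD1': 3}
```

Sorting before grouping is what makes FIRST and LAST meaningful here, which is the same reason the note under Table 1 recommends sorting by client identifier and date.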

Part 3

How to Combine Data from Different Modules

When data are combined from different data sets, this process is called merging. There are two ways to merge data from different data files:

  1. Vertical merge (Add Cases). You have two data files with the same variables in each but different clients (trainings). You would like to create a merged file containing all of the cases.
  2. Horizontal merge (Add Variables). You have two data files with mostly the same clients in each but different variables. You would like to create a merged file containing all of the variables.

Merging Vertically

Because the EDC maintains a core data file for each of the Cooperative Agreement’s modules and constantly adds data from projects to these files, the EDC performs vertical merges on encounter-level data files very often.

A key point for successfully adding cases is to make sure that the variables for the new cases are defined identically to the variables in the core database. Each variable in the core data file must be present in the new data file. In addition, variables with the same names need to be the same length and type.

  • For example, the variable IDEDC in each of the EDC’s core databases is defined as a string variable that is 14 characters long. Thus, when new data are vertically merged to the core database, the variable IDEDC must be defined as a string variable that is 14 characters long.

Steps To Merge Two Data Files Vertically:

  1. Open the core encounter-level data file (the file to which you want to add cases).
  2. Click Data.
  3. Click Merge Files.
  4. Click Add Cases. (You will see a screen labeled "Add Cases: Read File".)
  5. Let SPSS know the name of the data file that holds the cases that you are going to add. You can do this by typing the computer path and data file name in the File name space. Next, click Open.
  6. You will now see a new screen labeled "Add Cases from…", followed by the computer path and name of the data file that you identified as the one to add cases from. On this screen, you should see two boxes: on the left, a box labeled Unpaired Variables; on the right, a box labeled Variables in New Working Data File.
  • The Unpaired Variables box lists the variables from the two data files (old and new) that did not match. (Ideally, this box should be empty.)
  • If variables did not match, an asterisk (*) by a variable’s name indicates that the variable is contained in the core database but not in the database from which you want to add cases. A plus sign (+) by a variable’s name indicates that the variable is contained in the database from which you want to add cases but not in the core database.
  7. The Variables in New Working Data File box contains all variables that will be included when the core data file and the data file containing the cases you are adding are merged. If everything is in order, click OK.
  • A new data file called "Untitled" will be created, containing the cases from both data files. You should save this merged data file immediately under a new name and, preferably, in a new location (computer path).
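The Unpaired Variables check, together with the requirement that same-named variables share type and length, can be sketched in Python as follows. This is a hypothetical illustration, not SPSS's own code: each file is represented by a definitions dict mapping a variable name to a `(type, width)` tuple plus a list of cases, and the function names are made up. The `*` and `+` markers follow the convention described in the steps; flagging same-named variables with a different type or length is this sketch's way of enforcing the requirement stated earlier.

```python
def unpaired_variables(core_defs, new_defs):
    """Compare the variable definitions of the core file and the added file.

    '*' marks a variable found only in the core file, '+' one found only in
    the file being added. This sketch also flags same-named variables whose
    type or length differ.
    """
    unpaired = []
    for name, definition in core_defs.items():
        if name not in new_defs:
            unpaired.append("*" + name)
        elif new_defs[name] != definition:
            unpaired.append(name + " (type/length differs)")
    for name in new_defs:
        if name not in core_defs:
            unpaired.append("+" + name)
    return unpaired


def add_cases(core_defs, core_cases, new_defs, new_cases):
    """Vertical merge: append the new cases after the core cases, but only
    when every variable pairs up exactly."""
    problems = unpaired_variables(core_defs, new_defs)
    if problems:
        raise ValueError("Unpaired variables: " + ", ".join(problems))
    return core_cases + new_cases


# Hypothetical definitions: IDEDC as a 14-character string, as in the EDC core files.
core_defs = {"IDEDC": ("string", 14), "SITE": ("numeric", 8)}
core_cases = [{"IDEDC": "A001", "SITE": 1}]
new_ok_defs = {"IDEDC": ("string", 14), "SITE": ("numeric", 8)}
new_bad_defs = {"IDEDC": ("string", 10), "STAFF": ("numeric", 8)}

merged = add_cases(core_defs, core_cases, new_ok_defs, [{"IDEDC": "B002", "SITE": 1}])
print(len(merged))  # 2
print(unpaired_variables(core_defs, new_bad_defs))
# ['IDEDC (type/length differs)', '*SITE', '+STAFF']
```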

Merging Horizontally

If you want to look at data from two different module data files for the same person, you should perform a "horizontal merge."

  • For example, you may want to know whether trainees with different background characteristics rate your project's trainings differently. Or, you may want to know to what extent a client’s background is associated with the amount of services that the client received.

Data files are merged horizontally by linking records together using a key identifier. For client-centered projects, this identifier often will be the client’s (or patient’s) identifier. For training projects, this identifier often will be the training number or trainee’s identifier.

Steps To Merge Two Data Files Horizontally:

  1. Sort each of the data files that you want to merge horizontally by its key identifier(s), and save it.
  2. Open the data file containing the variables that you would like to appear first in the final merged data file.
  3. Click Data.
  4. Click Merge Files.
  5. Click Add Variables. (You will see a screen labeled "Add Variables: Read File".)
  6. Let SPSS know the name of the data file that holds the variables that you are going to add. You can do this by typing the computer path and data file name in the File name space. Next, click Open.
  7. You will now see a new screen labeled "Add Variables from…", followed by the computer path and data file name of the file containing the variables that you are merging.

Notice that there are three boxes: Excluded Variables, New Working Data File, and Key Variables.

  • Excluded Variables. This box shows variables that appear with the same name in both data files. In the above example, two variables, SITE and IDEDC, appear with the same name in both of the data files being merged. Both variables are actually key identifiers and should be moved to the Key Variables box.
  • Key Variables. To move the variables that will serve as your key identifier(s) into the Key Variables box, click Match cases on key variables in sorted files. Then, in order of hierarchy (the same order used to sort the files), highlight the variables that you want in the Excluded Variables box and move them to the Key Variables box.
  • New Working Data File. This box lists the names of all variables, other than the key identifiers, that will be contained in the horizontally merged data file.
  8. If everything is in order, click OK.
  • A new data file called "Untitled" will be created, linking the cases in the two data files. All variables from the first data file you opened will appear first; the variables from the other (added) data file will follow. You should save this merged data file immediately under a new name and, preferably, in a new location (computer path).
  • If a client (or training) has no data in one of the modules that you have combined, the values of that module's variables will all be blank (or system-missing) for that client.
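Matching cases on key variables in sorted files can be pictured as walking down two lists that are sorted on the same keys. The Python sketch below illustrates this under simplifying assumptions: each file is a list of dicts already sorted on the key variables, each key appears at most once per file (an aggregated file), and None stands in for a blank or system-missing value. The function name and the NUMMOD2B variable are hypothetical.

```python
def match_files(first, second, keys, first_vars, second_vars):
    """Horizontal merge of two files, each sorted on `keys` with one record
    per key. Variables from `first` precede those from `second`; a case found
    in only one file gets None (missing) for the other file's variables."""
    def key_of(rec):
        return tuple(rec[k] for k in keys)

    merged, i, j = [], 0, 0
    while i < len(first) or j < len(second):
        a = first[i] if i < len(first) else None
        b = second[j] if j < len(second) else None
        if b is None or (a is not None and key_of(a) < key_of(b)):
            key, a_rec, b_rec = key_of(a), a, None  # case only in first file
            i += 1
        elif a is None or key_of(b) < key_of(a):
            key, a_rec, b_rec = key_of(b), None, b  # case only in second file
            j += 1
        else:
            key, a_rec, b_rec = key_of(a), a, b  # same key in both files
            i += 1
            j += 1
        case = dict(zip(keys, key))
        for var in first_vars:
            case[var] = a_rec[var] if a_rec is not None else None
        for var in second_vars:
            case[var] = b_rec[var] if b_rec is not None else None
        merged.append(case)
    return merged


# Hypothetical aggregated files, both sorted by SITE and then IDEDC.
demographics = [{"SITE": 1, "IDEDC": "A001", "GENDER": 1},
                {"SITE": 1, "IDEDC": "B002", "GENDER": 2}]
services = [{"SITE": 1, "IDEDC": "A001", "NUMMOD2B": 5},
            {"SITE": 1, "IDEDC": "C003", "NUMMOD2B": 2}]
for case in match_files(demographics, services, ["SITE", "IDEDC"],
                        ["GENDER"], ["NUMMOD2B"]):
    print(case)
```

Client B002 has no services record and client C003 has no demographic record, so each ends up with None for the other file's variable, mirroring the blank values described in the last bullet.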

Three Final Recommendations For Merging Variables From Different Data Files (Horizontal Merge):

  1. Rename variables with the same names. A single data file cannot contain two different variables with the same variable name. Except for the variable(s) that will serve as the links between the data files (e.g., client identifier), you need to rename variables with the same name contained in the two (or more) data files being merged.
  2. Use an aggregated data file. There should be a one-to-one correspondence between each client’s (training’s) data across the different data files being merged. Serious problems can occur if there is more than one record for a key identifier. In most cases you will want to merge data files that have already been aggregated and sorted on the variable(s) serving as your key identifiers.
  3. Make sure that key identifiers are unique. Even when a data file is aggregated, inappropriate data linking can still occur if the key identifiers used to link the files are not unique. In other words, each client (or training) should have one and only one identifier.
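The first two recommendations can be checked mechanically before merging. Below is a minimal Python sketch, assuming each file is a list of dicts; the function names and example variables are hypothetical. The third recommendation (one identifier per client) is not covered by this sketch.

```python
from collections import Counter


def shared_nonkey_names(vars_a, vars_b, keys):
    """Recommendation 1: names (other than the key variables) found in both
    files; these must be renamed before merging."""
    return sorted((set(vars_a) & set(vars_b)) - set(keys))


def duplicate_keys(records, keys):
    """Recommendation 2: key values that occur on more than one record.
    An empty list means the file has one record per client (training)."""
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical variable lists for an aggregated Module 1 and Module 2B file.
print(shared_nonkey_names(["SITE", "IDEDC", "DATEF", "GENDER"],
                          ["SITE", "IDEDC", "DATEF", "NUMMOD2B"],
                          keys=["SITE", "IDEDC"]))  # ['DATEF']

# A file that was not aggregated: client A001 appears twice.
encounter_file = [{"SITE": 1, "IDEDC": "A001"}, {"SITE": 1, "IDEDC": "A001"},
                  {"SITE": 1, "IDEDC": "B002"}]
print(duplicate_keys(encounter_file, ["SITE", "IDEDC"]))  # [(1, 'A001')]
```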

Part 4

Assorted Tips For Client Tracking

SPSS for Windows can also be used to track services provided to clients or to manage staff workload. Three key tools can be used for this purpose; feel free to mix and match them to fine-tune the data exploration that you want to accomplish. Each tool is accessed by first clicking Data on the main menu bar.

Table 2 shows a summary of each of the three tools, the task each performs, and how to access each.

Table 2

Assorted Tools for Tracking Services to Clients or for Staff Workload

SPSS Tool: Case Summaries
  Task It Performs: Gives a value for each variable that you would like to see
  How To Access It: Click on Data followed by Summarize and then Case Summaries…

SPSS Tool: Split File
  Task It Performs: Gives separate results for each value of the variable(s) that you would like to form groups on
  How To Access It: Click on Data and then Split File…

SPSS Tool: Select Cases
  Task It Performs: Temporarily (or permanently) creates a data set that is based on some selection criteria
  How To Access It: Click on Data and then Select Cases…

Some Data Questions Involving…

Case Summaries: How can I find out how many substance abuse and case management services were received by my clients?

  • Use Case Summaries…
  • Highlight all of the variables for which you want to see the values for each of the records in your database.
  • If you are trying to track services to clients, you will probably want to identify each record with a client identifier (IDEDC).
  • For staff, include the variable pertaining to staff codes (STAFF), and for trainings, the training number (TR_NUM).
  • Move these variables to the Variables box.
  • Uncheck the Limit cases to first box (which limits output to only the first 100 cases in your database) and the Show only valid cases box (which limits output to only those records that have data for the variables requested).
  • Click OK.
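What the two check boxes do can be sketched in Python; this is an illustration of the listing logic, not SPSS code. The function name and the service variables SUBABUSE and CASEMGT are hypothetical, and missing values are represented as None.

```python
def case_summaries(records, variables, limit=100, only_valid=True):
    """List the selected variables for each record.

    limit: like 'Limit cases to first' (None lists every record).
    only_valid: like 'Show only valid cases' (skips records with a missing value).
    """
    rows = records if limit is None else records[:limit]
    listing = []
    for rec in rows:
        values = [rec.get(v) for v in variables]
        if only_valid and any(value is None for value in values):
            continue
        listing.append(values)
    return listing


# Hypothetical records; SUBABUSE and CASEMGT are made-up service-count variables.
clients = [
    {"IDEDC": "A001", "SUBABUSE": 2, "CASEMGT": 1},
    {"IDEDC": "B002", "SUBABUSE": None, "CASEMGT": 3},
]
# Both boxes unchecked, as recommended above: every record is listed.
for row in case_summaries(clients, ["IDEDC", "SUBABUSE", "CASEMGT"],
                          limit=None, only_valid=False):
    print(row)
```

With the defaults (both boxes checked), client B002 would silently disappear from the listing because one of its values is missing, which is why unchecking both boxes is recommended for tracking.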

Split File: How can I get separate results for my male and female clients?

  • Use Split File…
  • Click Organize output by groups.
  • Highlight the variable (GENDER) whose values define the groups for which you would like to see separate results.
  • Move this variable to the Groups Based on box.
  • Click OK.
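Split File amounts to grouping records by the value of the grouping variable and producing results separately for each group. Below is a minimal Python sketch, assuming a hypothetical GENDER coding (1 and 2) and an AGE variable; the function name is made up.

```python
from collections import defaultdict


def split_file(records, group_var):
    """Group records by the value of group_var, in ascending order of that
    value, so that each group can be analyzed separately.
    Assumes group_var is never missing (None would not sort with numbers)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_var]].append(rec)
    return dict(sorted(groups.items()))


# Hypothetical coding: GENDER 1 and 2; AGE in years.
clients = [
    {"IDEDC": "A001", "GENDER": 2, "AGE": 34},
    {"IDEDC": "B002", "GENDER": 1, "AGE": 41},
    {"IDEDC": "C003", "GENDER": 2, "AGE": 28},
]
for gender, group in split_file(clients, "GENDER").items():
    mean_age = sum(rec["AGE"] for rec in group) / len(group)
    print(f"GENDER = {gender}: {len(group)} clients, mean age {mean_age}")
```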

Select Cases: How can I get a list of the services provided to each of my clients who were enrolled between October 1, 1996 and February 28, 1997?

  • Data for this task should come from a horizontally merged data file that contains data from your aggregated Module 1 (Demographic-Contact Form) and your aggregated Module 2B (Psychosocial-Intervention Services Form) data files.
  • Click Select Cases…
  • Click If condition is satisfied.
  • Click If…
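  • In the expression box in the upper-right corner, type:

datemi >= DATE.MDY(10, 1, 1996) and
datema <= DATE.MDY(2, 28, 1997)

  • Assuming DATEMI and DATEMA hold each client's earliest and latest dates, this condition keeps clients whose earliest date is on or after October 1, 1996 and whose latest date is on or before February 28, 1997. Four-digit years avoid any ambiguity about the century.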

  • Click Continue.
  • Click OK.
  • Use Case Summaries, selecting the variables that identify the individual client, the date of first enrollment, and the services that you would like to explore.
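The Select Cases condition can be mirrored in Python as a date-range filter. This sketch assumes that DATEMI and DATEMA hold each client's earliest and latest dates (for example, created as minimum and maximum values during aggregation); that interpretation, the function name, and the sample records are assumptions. As in SPSS, where a case whose condition evaluates to missing is not selected, a client with a missing date is left out here.

```python
from datetime import date


def select_cases(records, start, end):
    """Keep clients whose earliest date (DATEMI) is on or after `start` and
    whose latest date (DATEMA) is on or before `end`. A client with either
    date missing is not selected."""
    selected = []
    for rec in records:
        first, last = rec["DATEMI"], rec["DATEMA"]
        if first is None or last is None:
            continue
        if first >= start and last <= end:
            selected.append(rec)
    return selected


# Hypothetical merged Module 1 / Module 2B records.
merged = [
    {"IDEDC": "A001", "DATEMI": date(1996, 10, 3), "DATEMA": date(1997, 2, 2)},
    {"IDEDC": "B002", "DATEMI": date(1996, 9, 15), "DATEMA": date(1996, 12, 1)},
    {"IDEDC": "C003", "DATEMI": date(1996, 11, 1), "DATEMA": date(1997, 3, 10)},
    {"IDEDC": "D004", "DATEMI": None, "DATEMA": date(1997, 1, 5)},
]
window = select_cases(merged, date(1996, 10, 1), date(1997, 2, 28))
print([rec["IDEDC"] for rec in window])  # ['A001']
```

Note that C003 is excluded even though it enrolled inside the window, because its latest date falls after February 28, 1997; the condition selects clients whose records fall entirely within the period.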

[Screen images, Tools for Client Tracking: Case Summaries; Split File; Select Cases (screen 1 of 2); Select Cases (screen 2 of 2)]



© Copyright 1997-2005 by The Measurement Group LLC. All rights reserved.