8. Data Management

8.1 Introduction

This chapter covers the tasks, activities, and deliverables for instrument and survey systems data management throughout all phases of data collection, and adheres to the 2004 Survey Research Operations Standards.

SRO Standard Project Procedures

Data management activities are focused on ensuring instrument and survey management system data quality, and result in the following mandatory and conditional activities and deliverables:

Mandatory Deliverables

Sample management system project setups for testing, training, certification, and production;
Project survey management system data dictionary;
Sample assignment files;
Loaded sample with preload data;
Loaded project staffing files;
Merged survey data;
Quality assurance checks; and
Final datasets.

In addition, the Data Manager must get sign-off from the Project Leader before the start of Production for:

Testing, training, and production-testing projects;
The survey management system data dictionary;
The instrument codebook;
Quality assurance checks;
Respondent profiles; and
The sample removal procedure.

Conditional Deliverables

Ad hoc reports;
Project instrument codebook or data dictionary;
Respondent profile data;
Respondent payment data;
Respondent contact logging system data;
Consent form data; and
Interviewer verification and evaluation data.

Data Managers are heavily involved in all phases of most projects conducted by SRO. These include:

Testing (testing all technical systems used in the data collection process);
Data collection (performing quality checks of data coming in, helping with technical problems);
Data delivery (extracting data and codebook for analysts); and
Project archiving (documenting and archiving project files).

Figure 8.1 provides an overview of data management activities. Sections 8.2 through 8.11 describe Survey Research Operations (SRO) best practices for data management and cover the following activities or phases of a project lifecycle from a data management perspective:

8.2 Design and specification;
8.3 Testing;
8.4 Training;
8.5 Production setup;
8.6 Data collection support;
8.7 Data deliverables and reports;
8.8 Other systems data managers use;
8.9 Project closeout;
8.10 SRO applications; and
8.11 Resources.

The following link provides a summary of current data management tasks throughout the lifecycle of a project, estimates of the amount of time they take, and whether they are done during pre-production, production, or post-production.

Time Estimates by Task

This chapter provides an overview of these and other activities. The page linked below points to more detailed information on testing, training, production setup, data collection support, data deliverables, and project closeout.

Lifecycle of a Data Collection Project and Data Manager Tasks

Many of the processes described in this chapter relate to studies that use SurveyTrak, SRO’s sample management system for face-to-face data collection. Traditionally, SurveyTrak projects have been among the largest, most complex, and most common examples of survey data collection, and they require a high level of data management support. However, SRO is increasingly turning to other systems to collect data and manage sample, especially in mixed-mode studies (see Section 8.8). As we continue to develop and use other sample management systems, our documentation and best practices will evolve. Regardless of the specific technical systems used, the topics covered in this document can serve as a general framework to guide data management considerations.

For a detailed description of tasks and procedures, please visit the Data Management Wiki page. Contact a Data Manager supervisor for access.

8.2 Design & Specification

At the preliminary/technical planning stages of a project, a technical team is put together based on resources available in each department. Each team is composed of a Technical Leader, the Project Leader, Production Manager and Production Assistants, a Database Administrator (DBA), a Programmer for each technical system, a Data Manager, and an Interviewer Help Desk representative.

During the system and instrument design phase of development, the Project Leader and Technical Leader discuss issues related to quality assurance with their Data Manager. These discussions are reflected in the following documents that the Technical Leader provides the Data Manager:

Instrument technical specifications;
System technical specifications;
Description of the dataset’s content and format; and
Testing checklist.

8.3 Testing

The testing phase of a study is when the processes used for data collection are set up and the Data Manager confirms them to be working and collecting the correct data.

The Data Manager needs to be closely involved in the testing process, because this is the time to set up peripheral systems and processes, including the quality checks and dataset processing. The Data Manager’s testing-phase tasks are described in Sections 8.3.1 through 8.3.8.

8.3.1 Set up project directory structure

At the start of the project, the Data Manager creates a project folder in the data manager directory (L:\groups\TSG\dataops) and performs most tasks from there. This structure also initiates the project documentation process. Figure 8.2 shows the general folder structure. Some of the standard Data Manager processes are included within this template folder. To start a new project, this structure and content can be copied and renamed.

Figure 8.2 General Project Folder Structure
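As a rough illustration of copying and renaming the template structure, the following Python sketch uses only the standard library. The function name, the rule of refusing to overwrite an existing folder, and the example folder and project names are assumptions made for illustration and are not SRO procedure. The demo runs against temporary directories rather than the real data manager directory.

```python
import shutil
import tempfile
from pathlib import Path


def create_project_folder(template_dir: Path, dataops_root: Path, project_name: str) -> Path:
    """Copy the template tree to dataops_root/project_name and return the new path."""
    target = dataops_root / project_name
    if target.exists():
        # Assumption: never overwrite an existing project folder.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(template_dir, target)
    return target


# Demo on throwaway directories (hypothetical folder and file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    template = root / "TEMPLATE"
    (template / "Training").mkdir(parents=True)
    (template / "Training" / "readme.txt").write_text("placeholder")
    new_project = create_project_folder(template, root, "MYSTUDY_2024")
    print(sorted(p.relative_to(new_project).as_posix() for p in new_project.rglob("*")))
```

Raising an error when the target exists is a deliberately conservative choice, so that an existing project's files are never silently replaced.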

8.3.2 Create projects

If the project is using the SurveyTrak sample management system, the Data Manager can copy the most current general interviewer training (GIT) project, a previous wave production project, or a similar production project within the same community. The Data Manager works with the SurveyTrak Programmer to determine which project will be the basis for the new test project. Depending on how complex the project is, there may be multiple test projects. The Data Manager or SurveyTrak Programmer creates these projects as needed, usually through the copy project application in the ST Admin module. Copies of the testing project are then modified to serve as the training and certification projects. For further information and steps for creating projects, see the Data Management Wiki. Contact a Data Manager supervisor for access.

Often in panel studies, the previous wave’s production project must be copied from an older SurveyTrak database. For these projects, the Data Manager uses a SAS macro to copy the older project tables, update them with the new project name, and append the tables to the new database.

It is standard practice for the Data Manager to create the testing, training, and interviewer certification projects. The SurveyTrak Programmer creates the production and the production-testing projects.

8.3.3 Add users to or remove users from a project

Once the test project is created, the Data Manager adds project users—the staff who will do the testing. For most SurveyTrak projects, the testers are those on the technical team. To protect production data, the Data Manager confirms that testing users are not simultaneously designated as active on a study’s production project. For further information and steps for adding and removing users, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.3.4 Create test lines

The LineGenerator application helps create sets of testing, training, and certification lines. The Data Manager loads a template set of lines into the LineGenerator tables in SurveyTrak, either through the application or from an Access database. For the testing project, the application makes sequential copies of these lines for selected testers; for training and certification projects, it makes copies of these lines for selected interviewers.

When projects are straightforward, the Data Manager creates one set of lines, loading and sending them out to the testers. Complex projects require loading sets of lines one at a time, one set for each tester, so that testers get different sets of sample IDs (SIDs).

Once lines are set up for test projects, training, certification, and production testing, it is standard protocol for the Data Manager to test the lines before sending them to other testers. This includes checking SurveyTrak preload data on all tabs that have preload data, as well as launching the Blaise instrument and administering the questionnaire until the last preloaded question is reached (ensuring that the data lines up correctly). When a project has a complicated preload, the Blaise database can be queried after updating the preload to confirm that the preload is correct. This testing cannot be skipped and must be done on interviewer equipment — otherwise the Data Manager will not be able to ensure that the preload meets specifications.

This level of testing is especially important when: (1) lines are first set up on a project; (2) changes have been made to the structure of the Blaise preload; and (3) lines are being sent out to a larger group for testing. For a project that “spawns” or generates new lines, the Data Manager must be sure to go through the new lines to confirm that the spawning process is pushing through the correct preload.

8.3.5 Create profiles

A panel study may require the creation of respondent profiles (sample information from previous waves that the interviewer needs to reference). If a production project requires profiles, the testing phase is a good time to test setting up and loading profiles into the sample management system (see the profiles documentation in Section 8.4 (Training)).

Prior to training, the Project Leader must sign off, agreeing that the profiles meet specifications.

8.3.6 Create SurveyTrak data dictionary

Each project must have an up-to-date sample management system data dictionary. Decentralized SurveyTrak projects start with the standard SurveyTrak tDataDictionary table, while Survey Services Lab (SSL) studies do not generally have a standard data dictionary. Centralized telephone studies start with the computer assisted telephone interviewing (CATI) Sample Management System (SMS) tDDP_MappingDetails table.

When a new project is created by copying a project within SurveyTrak, the data dictionary is also copied. The standard named fields are already documented, so it is generally only the project-specific extension columns that need to be added and updated for each new project.

The Project Leader must sign off during testing, agreeing that the data dictionary meets specifications.

8.3.7 Test data extraction processes

Data fields are “pulled” from the Blaise instrument and put into appropriate SurveyTrak tables by the Blaise Component Pack (BCP, the Blaise Application Programming Interface or API). These fields are identified with the Blaise field name and “BCP” in the description field in the data dictionary.

While some projects have very standard testing protocols and scenarios, others do not. It is important to test all projects thoroughly to be certain that data is being correctly displayed and captured. Part of the standard testing that Data Managers do is to complete an interview and confirm that the BCP pulls are populating the SurveyTrak fields specified.

In this phase, the Data Manager generates an instrument testing dataset and performs standard and custom quality checks. The Data Manager may also generate an instrument codebook and ask the Project Leader to review it against the instrument specifications.

The Data Manager should set up the data extraction and merge process while in the testing phase. This allows the Data Manager to go through the data extraction procedures and become familiar with the data coming out of the instrument. The data merge setup is described in Section 8.5 (Production Setup).

8.3.8 Test consent and recording

It is common for projects to digitally record a subset of survey interviews for quality assurance and/or research purposes. Such projects ask at the beginning of the interview whether the respondent agrees to be recorded. The response to that question determines whether a recording is started or saved.

During the testing phase, the Data Manager works with the Interviewer Help Desk to confirm that there are quality recordings for SIDs where Consent was “Yes,” and that there are no saved recordings where Consent was “No.” The Data Manager may also be asked to manually set recording flags in the sample or test automated system flagging.
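The consent check described above amounts to comparing two lists of SIDs. The sketch below is a minimal illustration in Python; the function name, the input shapes (a mapping from SID to consent answer and a set of SIDs with saved recordings), and the result keys are all hypothetical, and it does not assess recording quality, which still requires listening to the files.

```python
def check_recordings(consent_by_sid: dict[str, str], recorded_sids: set[str]) -> dict[str, list[str]]:
    """Compare consent answers with the SIDs that have a saved recording."""
    # Consent was "Yes" but no recording was saved.
    missing = sorted(
        sid for sid, answer in consent_by_sid.items()
        if answer == "Yes" and sid not in recorded_sids
    )
    # Consent was "No" but a recording exists anyway.
    unexpected = sorted(
        sid for sid, answer in consent_by_sid.items()
        if answer == "No" and sid in recorded_sids
    )
    return {"missing_recording": missing, "unexpected_recording": unexpected}


result = check_recordings(
    {"1001": "Yes", "1002": "No", "1003": "Yes"},
    {"1001", "1002"},
)
print(result)
```

Both lists should be empty before testing is considered complete; any SID in the second list indicates a recording that should not have been saved.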

8.4 Training

8.4.1 Create training projects

In preparation for interviewer training, the Data Manager may create a number of projects. All newly hired interviewers go through a separate General Interviewer Techniques (GIT) training to learn standard interviewing procedures and to become familiar with the sample management system, SurveyTrak. The Data Manager adds the new interviewers to the most current GIT project and creates the project-specific projects for both training and certification. Because many studies involve more than one SurveyTrak project, some of the tasks in the following sections are duplicated for each project in the SurveyTrak database related to the study.

The Data Manager is notified that the staff list is available once it is finalized and posted on eRoom. Then the Data Manager adds the interviewers to the GIT, training, and certification projects and generates lines in each using LineGenerator. The Data Manager must work with the Interviewer Help Desk staff responsible for loading the interviewer laptops and ensure that the projects and sample lines are on the laptops and that quality checks are performed before training begins.

8.4.2 Prepare sample lines

Project Leaders provide scenarios for training. This is usually done in the testing phase, so they can thoroughly test the scenarios they will be using at training. There are two types of preload to prepare: (1) the sample management preload, which is the data to load into SurveyTrak (respondent names, addresses, and behind-the-scenes flags), and (2) the Blaise preload.

Often, the Blaise preload for the training lines will come in a long Excel file. This file must be transformed into the Blaise preload string, a step usually done in SAS, and then exported to a caret (^) delimited text file for loading into the SurveyTrak tCapi table.

It is necessary to run quality checks on the Blaise preload data. This could involve importing the preload into the datamodel and looking for import errors, running through testing lines to make sure the values line up correctly, and looking at the variable lengths and distributions in SAS to confirm that the preload fits the data model. Data Managers should understand that the preload is a critical piece of the data collection process, and ideally the Survey Director and Blaise programmer will work with the Data Manager to establish appropriate QC checks and follow-up.
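The two ideas above, building a caret-delimited string and checking value lengths against the datamodel, can be sketched as follows. This is an illustration in Python rather than the SAS code Data Managers normally use, and it makes several assumptions: the field names, field order, and maximum lengths are hypothetical, and the real layout of a preload string is defined by each project's datamodel.

```python
def build_preload_string(row: dict[str, str], field_order: list[str]) -> str:
    """Join one line's preload values with carets, in datamodel field order."""
    values = []
    for name in field_order:
        value = str(row.get(name, ""))
        if "^" in value:
            # A caret inside a value would shift every later field.
            raise ValueError(f"field {name!r} contains the delimiter '^'")
        values.append(value)
    return "^".join(values)


def find_length_violations(row: dict[str, str], max_lengths: dict[str, int]) -> list[str]:
    """Return the fields whose values are longer than the datamodel allows."""
    return [
        name for name, limit in max_lengths.items()
        if len(str(row.get(name, ""))) > limit
    ]


# Hypothetical training line.
line = {"SID": "1001", "RName": "Pat", "Age": "42"}
print(build_preload_string(line, ["SID", "RName", "Age"]))
print(find_length_violations(line, {"SID": 4, "RName": 2, "Age": 3}))
```

Rejecting values that contain the delimiter is one simple guard against misaligned preload; a check like this complements, and does not replace, running through the lines in the instrument.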

8.4.3 Attach RCLS calls to sample lines

The Respondent Contact Logging System (RCLS) is an application designed to track incoming contacts made or attempted. These are separate from interviewer attempts to contact respondents. Contacts logged in this system are incoming calls to the respondent toll-free line and voicemails, returned or undeliverable mail, and other forms of communication coming into the Ann Arbor office. Emails containing pertinent contact information are sent to the interviewer from this system, and the sample lines are flagged for the interviewer to see that there is information for them to review.

There is a section of the GIT training that covers RCLS and it is standard practice for the Data Manager to attach RCLS calls to two of the GIT lines. Because attaching RCLS calls has not been added to the LineGenerator application, this process is done through a standard set of MS Access queries. The Access database, GIT12_RCLS Prep ST11.mdb, is located in the Training folder in the Template project folder (\\SRC-Douvan\SRO\groups\TSG\dataops\TEMPLATE\NEWPROJECT\Training). For further information and steps to add RCLS calls to training lines, see the Data Management Wiki. Contact a Data Manager supervisor for access.

RCLS Call Structure

There are four lines in the tSampleLine_Supplemental table for each call. A view in the SurveyTrak Reporting database, DBA.vRCLS_view, organizes the RCLS data in single rows for each call. This is the best place to look at the RCLS data.

The Data Manager sometimes needs to close RCLS calls, set RCLS to send messages to the right people, or back up records when resetting lines. In addition, throughout the life of a SurveyTrak project, RCLS action requests come up from time to time for the Data Manager. These tasks are discussed in Section 8.6, Data Collection Support.

8.4.4 Create profiles

Data from previous waves of data collection, such as address or household information, can be displayed in SurveyTrak, WebTrak, and RCLS. These displays are called respondent profiles, and they are available for the interviewer to reference during data collection. Either the project staff or the Data Manager creates the profiles as text or HTML files. After the production sample is loaded into SurveyTrak, the Data Manager uses the application RProfile to load the profiles. If the Data Manager creates the profiles, the Project Leader provides the specifications and signs off on the profiles before they are loaded into SurveyTrak. If project staff create the profiles, the Data Manager should review and test them.

Data for the profiles generally comes from a wide range of sources, including: SurveyTrak data from past waves (call records, RCLS notes, and any BCP pulls from the instruments); address updates from other sources; Census data, etc. There is example SAS code to generate production profiles as well as profiles on training lines in the project production template folder (see Section 8.3.1). It is also standard practice to run the HTML code through an HTML validator such as that in UltraEdit.

HRS 2012 introduced the use of JavaScript to create profile sections that open and close, and imposed a maximum length on lines of text to increase readability. Due to their size, the profiles were generated into four files and run on the Linux server.

When creating large profiles, the Data Manager needs to work closely with the DBA team to coordinate the loading of profiles. Data Managers give the DBA team at least 24 hours’ notice that profiles need to be loaded. This advance notice is necessary even if only a subset of the profiles is being re-loaded.

8.5 Production Setup

As mentioned earlier, the Data Manager creates the testing, training, and certification projects, and the SurveyTrak Programmer creates the production and production-testing projects. This is because Data Managers are not familiar with some SurveyTrak tables. Once projects are created, the Data Manager is responsible for the tasks described in Sections 8.5.1 through 8.5.6.

8.5.1 Prepare sample lines

The Data Manager adds information to the sample file, reformats it, and may split it into multiple parts, based on the specified needs of the project.

Once reformatted, sample is loaded into SurveyTrak. For most projects, the Data Manager loads the sample into three tables: tSample_Line, tSample_Line_Address, and tCapi. If the project is using preload shared by related sample lines, the tShared_Preload needs to be populated. If the project is using the letters module and prenotification letters are sent, tLetters needs to be populated. The tRpay table also needs to be populated if checks are sent out with the prenotification letters.

8.5.2 Create assignment file

The purpose of the sample assignment procedure is to allocate production sample to the interviewing staff assigned to a production project. Some projects assign sample at the sample line level, while other projects assign sample at the Primary Sampling Unit (PSU) or PSU segment level. After cleaning and reformatting the sample file, the Data Manager creates an assignment file based on the project needs.

The Data Manager sends the file(s) to the Production Manager or designated assigner, who determines the assigned interviewer for each sample line. They add the Interviewer’s 8-digit University ID to the assignment file and return it to the Data Manager. The Data Manager then performs certain quality checks, and may need to work with the assigner to clarify or correct information. The assigned interviewer field is used to load sample into SurveyTrak. The Data Manager also may need this information to process respondent prepayments if they are to be sent to the interviewers. For further information and steps to create sample assignment files, see the Data Management Wiki. Contact a Data Manager supervisor for access.
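The quality checks on a returned assignment file are project-specific, but two checks follow directly from the description above: every sample line should carry an 8-digit interviewer ID, and no sample line should appear twice. The sketch below illustrates these in Python. The column names `sample_id` and `interviewer_id` and the message wording are hypothetical, and real checks would typically also confirm that each ID belongs to a staff member on the project.

```python
import re

# The source specifies an 8-digit University ID.
ID_PATTERN = re.compile(r"[0-9]{8}")


def check_assignments(rows: list[dict[str, str]]) -> list[str]:
    """Return a list of problems found in a returned assignment file."""
    problems = []
    seen = set()
    for row in rows:
        sid = row.get("sample_id", "")
        interviewer = row.get("interviewer_id", "").strip()
        if sid in seen:
            problems.append(f"{sid}: duplicate sample line")
        seen.add(sid)
        if not ID_PATTERN.fullmatch(interviewer):
            problems.append(f"{sid}: interviewer ID {interviewer!r} is not 8 digits")
    return problems


rows = [
    {"sample_id": "A1", "interviewer_id": "12345678"},
    {"sample_id": "A2", "interviewer_id": "1234"},
    {"sample_id": "A1", "interviewer_id": "87654321"},
]
for problem in check_assignments(rows):
    print(problem)
```

An empty result means the file passed these two checks only; any problems listed would go back to the assigner for clarification or correction.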

Sample Assignment Examples

The Data Manager works with the assigner, and if necessary the Production Manager, to minimize any delay in receiving and preparing the assignment file on schedule. At the Production Manager’s request, Data Managers may assign sample directly to a Production Coordinator or Team Leader, who will later transfer the lines to interviewers. This may happen if there is uncertainty about the availability of interviewers at the time sample assignments need to be made.

It is good practice to send the Project Managers and Production Manager (and Team Leaders in some cases) a summary of sample assignments after the lines have been loaded into SurveyTrak, but before they are activated. This gives the Managers another chance to easily make changes before the sample “goes live.”

Sometimes sample assignments change after the initial processing, for example, if an interviewer is terminated during training. If the sample is inactive (tSample_Line.sActiveStatus = “02” or “03”), the Data Manager reassigns the sample in tSample_Line to another interviewer or the Team Leader as requested. If the sample is active, it can be transferred from the interviewer’s laptop via SurveyTrak, or forced by the Data Manager if necessary. If there were prepayments, they may also need to be redirected.

8.5.3 Release sample

Once the assignment file has been approved, production sample can be activated, although the activation process differs from project to project. Data Managers coordinate with the Project Leaders and the Interviewer Help Desk on when and how to release the sample. Sample cannot be released for interviewers who have not yet passed certification. Typically, when sample is released, the Data Manager sends an email to Project Managers and the Interviewer Help Desk. To release (or activate) sample, tSample_Line.sActiveStatus is changed from “03” to “01.”
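The status change itself is a simple update, and the certification rule can be enforced in the same statement. The sketch below illustrates this in Python with an in-memory SQLite database standing in for the SurveyTrak database, which is an assumption made only so that the example is runnable. The table name and the sActiveStatus codes come from the text above; the columns SampleLineID and InterviewerID, the function names, and the list of certified IDs are hypothetical.

```python
import sqlite3


def release_sample(conn: sqlite3.Connection, certified_ids: list[str]) -> int:
    """Change held lines from "03" to "01" for certified interviewers only."""
    if not certified_ids:
        return 0
    placeholders = ",".join("?" for _ in certified_ids)
    cursor = conn.execute(
        "UPDATE tSample_Line SET sActiveStatus = '01' "
        f"WHERE sActiveStatus = '03' AND InterviewerID IN ({placeholders})",
        certified_ids,
    )
    conn.commit()
    return cursor.rowcount  # number of lines released


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny stand-in table with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tSample_Line ("
        "SampleLineID TEXT PRIMARY KEY, InterviewerID TEXT, sActiveStatus TEXT)"
    )
    conn.executemany(
        "INSERT INTO tSample_Line VALUES (?, ?, ?)",
        [("L1", "11111111", "03"), ("L2", "22222222", "03"), ("L3", "11111111", "02")],
    )
    return conn


demo = make_demo_db()
print(release_sample(demo, ["11111111"]))
print(demo.execute("SELECT SampleLineID, sActiveStatus FROM tSample_Line").fetchall())
```

Restricting the update to lines currently at “03” means that lines held at “02” are left alone, and returning the row count gives a number to compare against the approved assignment summary before notifying project staff.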

Sometimes Data Managers are asked to perform sample management tasks, such as releasing sample in batches or pulling sample back. These tasks should be discussed with the Project Manager and documented.

8.5.4 Set up data extraction

During overnight processing, the Interview Data Merge (IDM) takes all the completed interviews from the interviewers’ laptops and merges them into one master file with the SurveyTrak Reporting System Manager (SRSM). It is the Data Manager’s responsibility to set the merge criteria and monitor IDM’s progress. Once the IDM is set up for the project’s instruments, the criteria must be changed again whenever datamodels are updated, which is best done through STAdmin. For further information on setting up a merge, error codes, and merge criteria, see the Data Management Wiki.

The merge examines Blaise audit trail files (file extensions .adk and .adt) to determine the interview length and, for some projects, block lengths. Occasionally audit trail files are corrupt. If a file is corrupt, it cannot be relied on for determining interview length. The merge performs a number of checks to determine whether the file may be corrupt. So that interview data can successfully merge even if an audit trail file is corrupt, audit trail files are processed in batch after interview data is merged.

Depending on interviewers’ send and receive times, some interviews (mostly contact observations) get missed. Before producing any data set for delivery to the clients, it is necessary to set the date in the merge criteria back to either the beginning of the study, or another reasonable date, in order to identify any missed data.

For example, the National Survey of Family Growth (NSFG) sets the date back to the beginning of every quarter on Mondays, Wednesdays and Fridays. This is because there is ongoing data analysis throughout the quarter and this keeps the daily files up-to-date. In the NSFG example, the date in tmerge_criteria is updated in an Access macro, and then the IDM runs with error handling through a scheduled task on the processing computer.
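The date logic in the NSFG example can be sketched in a few lines of Python. This sketch assumes calendar quarters (starting January, April, July, and October), which may not match how NSFG defines its data collection quarters, and the function names are hypothetical. It only computes the date; writing that date into tmerge_criteria and running the IDM are handled by the Access macro and scheduled task described above.

```python
from datetime import date
from typing import Optional


def quarter_start(day: date) -> date:
    """Return the first day of the calendar quarter containing `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def merge_reset_date(day: date) -> Optional[date]:
    """Return the date to set the merge criteria back to, or None on other days."""
    if day.weekday() in (0, 2, 4):  # Monday, Wednesday, Friday
        return quarter_start(day)
    return None


print(merge_reset_date(date(2024, 5, 15)))  # a Wednesday
print(merge_reset_date(date(2024, 5, 14)))  # a Tuesday
```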

8.5.5 Set up datamodel migration

When a datamodel is updated during production, it is necessary to migrate the master data file to the new datamodel so that all interviews have the same data structure. The Blaise programmer should include the migration script (blaTobla.asc) in the datamodel. Sometimes the migration scripts are straightforward, and in other cases there is a complicated algorithm. Occasionally it is necessary to manually migrate to a newer or older datamodel. For further information and steps for datamodel migration, see the Data Management Wiki.

8.5.6 Set up daily reports archiving

Daily Reports, or Field Progress Reports (FPRs), are standard .pdf files that programmers set up for each project. They display data from tables in the SurveyTrak reporting database. These reports allow Project Managers and Supervisors to see what production (hours per interview, response and completion rates, etc.) looks like on a given day. They are archived daily, and are used to compare production across waves of data collection or during budget preparation for a similar study.

The Daily Report Archive System (DRAS) runs as a scheduled task twice a day, at 8:30 am and 3:00 pm. At each run, the system looks for new reports, archives previous reports that have been marked, and checks disk space.

DRAS sends an email at each run that reports:

The number of reports archived;
New reports in the WebTrak folder;
New reports not approved within DRAS; and
A warning when the archive folder has reached 90% of available space.
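The 90% disk space threshold can be illustrated with Python's standard library. This is a sketch only: how DRAS itself measures space is not documented here, the function names are hypothetical, and the demo measures the current directory's volume rather than the real archive share.

```python
import shutil


def archive_usage_fraction(path: str) -> float:
    """Return the fraction of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_space_warning(used_fraction: float, threshold: float = 0.90) -> bool:
    """True once usage reaches the warning threshold (90% by default)."""
    return used_fraction >= threshold


fraction = archive_usage_fraction(".")
print(f"{fraction:.0%} used; warning needed: {needs_space_warning(fraction)}")
```

Keeping the threshold test separate from the measurement makes the rule easy to check with fixed numbers, independent of the machine it runs on.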

The archive folder is located at \\SRC-Hess\data\CO_rpts. Each project has a separate subdirectory. There is also a subdirectory called “Archive.” When a project ends, the final reports are kept in the project folder under the Archive subdirectory. If there is a request for final daily reports, a Data Manager can pull reports from the Archive folder.

When the low disk space message comes from DRAS, a Data Manager moves the archived reports from the network drive to the SRO archive server. This practice ensures that reports are available whenever they are needed for review.

For further information and steps for archiving daily reports, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.6 Data Collection Support

8.6.1 Perform quality assurance checks

It is impossible to anticipate all problems that may occur during data collection, but it is important to identify and respond to problems quickly. The Data Manager is responsible for identifying errors in survey data and problems related to the procedures, processes, systems, and instruments used to create the data. This includes associated metadata (data about data) and paradata (data about processes). To be able to uncover problems that affect data quality as soon as possible and to take corrective action, the Data Manager creates a project quality assurance program. The goal of the program is to ensure that all data will fully meet the analysis requirements of the study and conform to SRO Standards.

The program has two components. The first outlines SRO standard checks that must be done for every project. This involves intensive monitoring of specific data items and study procedures and processes throughout the data collection period.

The second component of the program consists of any additional data quality checks the Principal Investigator(s) and project staff feel are necessary to ensure that the data will meet the research and analysis requirements of the study. Projects vary widely in requirements, and each project may enhance standard data quality control procedures, as well as define unique measures. The checks performed in this portion of the program are designed in close collaboration with the Technical Leader, Project Leader, and the Principal Investigator and project staff.

There are nine standard QC SAS programs called by a tenth program (_RunAll.sas) in the directory \\SRC-Douvan\SRO\groups\TSG\dataops\DOCUMENT\SAS\QC. The nine programs are:

QC Checks Addresses.sas
QC Checks Call Records.sas
QC Checks Contact Obs.sas
QC Checks Interviewer.sas
QC Checks Merge Track.sas
QC Checks Misc Standard.sas
QC Checks Preload.sas
QC Checks Rpay.sas
QC Checks Timings.sas

The _RunAll.sas program in the template folder points to the above standard SAS programs, so they do not need to be copied. If project-specific changes need to be made, they can be made in a separate SAS program that is called from the _RunAll.sas program.

For further information and steps of the QC process, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.6.2 Respond to help desk calls

Requests come to the Data Managers through the Footprints help desk system. The Data Manager is notified by email about Interviewer Help Desk calls that require their attention. There are a number of different types of Help Desk calls, ranging from adding a user to a project to an interviewer losing interview data. For further information and steps to address Help Desk calls, see the Data Management Wiki.

Remove interviewer from project (“closeout”)

The Data Manager’s role in closing out an interviewer’s project assignments is to identify what production data is left on the interviewer’s laptop when they leave a project. This includes sample lines, recording files, RCLS calls, and any other project-specific data.

The closeout request is initiated when an interviewer quits, is terminated, or is otherwise finished working on part or all of a project. A closeout request may come in for one project while the interviewer continues to work on another project. It can happen before a project begins production, in the middle of data collection, or as the project wraps up production.

Closeout requests are handled similarly for each of these scenarios, but can differ based on project needs. Data Managers approve an interviewer for closeout when no production data remain on the laptop and the interviewer has been deactivated on the project. It is best practice to flag and remove all lines and deactivate the interviewer from all GIT, training, certification, and production projects associated with the study. If the interviewer will remain on another project, then the GIT project and sample lines can remain active on their laptop. The Data Manager’s approval then goes through the Footprints system to either (1) Inventory, when the laptop has been returned to Ann Arbor, or (2) the Interviewer Help Desk, when the interviewer is being removed from one project but is still active on other projects.

Once the Data Manager sends approval for closeout through the Footprints system to Inventory, the closeout process continues through other channels: DBAs revoke replication, and Computing and Multimedia Technologies (CMT) removes interviewers who returned their laptops from the active directory.
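The approval rule for closeout (no production data left on the laptop, and the interviewer deactivated on the project) can be written down as a small checklist. The Python sketch below is illustrative only: the class, its field names, and the message wording are hypothetical, and the counts would in practice come from queries against the project's data.

```python
from dataclasses import dataclass


@dataclass
class LaptopInventory:
    """Counts of production data still on an interviewer's laptop for one project."""
    sample_lines: int
    recording_files: int
    rcls_calls: int
    other_project_data: int
    interviewer_active: bool


def closeout_blockers(inv: LaptopInventory) -> list[str]:
    """Return the reasons closeout cannot yet be approved (empty means approve)."""
    blockers = []
    if inv.sample_lines:
        blockers.append("sample lines remain on laptop")
    if inv.recording_files:
        blockers.append("recording files remain on laptop")
    if inv.rcls_calls:
        blockers.append("RCLS calls remain on laptop")
    if inv.other_project_data:
        blockers.append("other project-specific data remain on laptop")
    if inv.interviewer_active:
        blockers.append("interviewer is still active on the project")
    return blockers


print(closeout_blockers(LaptopInventory(2, 0, 1, 0, True)))
print(closeout_blockers(LaptopInventory(0, 0, 0, 0, False)))
```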

Remove sample from interviewer databases (“force transfer”)

Removal of sample from interviewers’ databases (a “force transfer”) is rare, but project staff can request it when a laptop is lost or stolen, an interviewer is terminated quickly, or a regular line transfer is otherwise not possible.

The Data Manager FIRST confirms with the DBAs that replication has been revoked for the interviewer who currently holds the lines.

Reset sample line

Interviewers sometimes screen at the wrong household, or persons are added or missed in the household rostering, affecting the sampling process. There are other reasons for line resets; however, these are uncommon and project-specific. Lines need to be thoroughly reviewed and approved, generally by Team Leaders, Project Coordinators, or Project Leaders, before resetting because this process can impact the sample selection. There have been times when an interviewer has asked for a line reset simply because the person selected through the household rostering was not home. Resetting this kind of case would be considered data manipulation and is grounds for termination.

Before production starts, discuss the protocol for handling line resets with the project manager. For example, NSFG Cycle 8 requires the interviewer to get approval from their FOC and project staff before submitting a help desk call. Before resetting the line, the Data Manager confirms that there has been approval from one of three project staff.

Recover lost interview

Sometimes interviews get lost, or appear to be missing. Notification comes when the Interview Data Merge sends an error saying “interview data is null.”

Often the interview can be recovered from the SurveyTrak Incremental Backup logs. The SurveyTrak database is backed up in 15-minute increments. The Data Manager uses a .sql file to translate the log file. If the interview is found in this file, then it can be loaded back into tCAPI using iSQL.

8.6.3 Remove sample from interviewer laptop (“scoop”)

Data Managers are expected to keep the “Removing lines from laptops – Current procedures” document for ongoing projects. This document is available to all on \\SRC-Douvan\SRO\share\Data Ops.

Current Sample Removal Procedures

After the interview is completed, sample lines will be pulled from the Interviewer’s laptop at a pre-defined, regular interval. This process helps secure the data against unexpected incidents (e.g., a stolen laptop). Standard protocol is to run this process at least once a week, although some projects, such as NSFG, run this process once a quarter.

Because each project is unique, the Project Leader defines what constitutes a completed sample line. Some projects want to keep sample lines active in the field until Self-Administered Questionnaires (SAQs) are returned and logged in Ann Arbor, while other projects wait for incentive checks to be processed.

To be removed from the laptop, the sample line must meet the following conditions:

The sample line must have a final result code;
There must be consistency between the final result code and interview status according to project specifications;
The interview must have been merged into the main master file(s) at least 7 calendar days prior;
All contact observations and post-interview observations for the sample line must be completed according to project specifications;
Housing unit observations must be completed, if housing unit observations are required for the project;
If payments are being recorded in SurveyTrak, a payment record must exist according to project specifications; and
If the project is using digital recording, the digitally recorded interview (DRI) files must be logged as received by the Ann Arbor office, and files removed from the interviewer’s laptop.
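
The conditions above can be read as a single eligibility check. The following Python sketch is illustrative only: the SampleLine record, its field names, and the helper name removal_blockers are hypothetical and do not reflect the actual SurveyTrak schema. The project-specific checks (result code consistency, observations, payments) are reduced to booleans that each project would compute from its own specifications.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SampleLine:
    """Hypothetical summary of one sample line; not the SurveyTrak schema."""
    final_result_code: Optional[str]
    result_consistent_with_status: bool   # per project specifications
    merged_on: Optional[date]             # date merged into the master file(s)
    observations_complete: bool           # contact and post-interview observations
    hu_observations_required: bool
    hu_observations_complete: bool
    payments_in_surveytrak: bool
    payment_record_exists: bool
    uses_digital_recording: bool
    dri_logged_and_removed: bool          # DRI files logged in Ann Arbor, off laptop


def removal_blockers(line: SampleLine, today: date) -> List[str]:
    """Return the reasons a line cannot be removed yet (empty list = eligible)."""
    blockers = []
    if not line.final_result_code:
        blockers.append("no final result code")
    if not line.result_consistent_with_status:
        blockers.append("result code inconsistent with interview status")
    if line.merged_on is None or today - line.merged_on < timedelta(days=7):
        blockers.append("interview not merged at least 7 calendar days ago")
    if not line.observations_complete:
        blockers.append("contact or post-interview observations incomplete")
    if line.hu_observations_required and not line.hu_observations_complete:
        blockers.append("housing unit observations incomplete")
    if line.payments_in_surveytrak and not line.payment_record_exists:
        blockers.append("payment record missing")
    if line.uses_digital_recording and not line.dri_logged_and_removed:
        blockers.append("DRI files not logged as received or still on laptop")
    return blockers
```

Returning the list of blockers, rather than a single yes/no value, makes it easier to report why a line was left on the laptop.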

Sample is also removed from laptops when the interviewer leaves the project. This process is triggered when the Data Manager is assigned a help desk call notifying them that an interviewer is leaving a project, regardless of whether they are keeping the laptop or not. In this case, all lines associated with the study need to be cleared from the laptop. This process is described in the Interviewer Close Out section under Help Desk Calls.

For further information and steps for sample removal, see the Data Management Wiki.

8.6.4 Document lost or stolen laptop

In the rare circumstance of a lost or stolen laptop, actions are taken to document what was on the laptop.

The Data Manager sends a report to the Director of Data Collection Services (DCS) and copies the Interviewer Help Desk Supervisor and the Project Leader(s). The Data Manager then stays available in case any further questions come up and to help facilitate communication with the DBAs if the interviewer’s database replication needs to be revoked and/or sample needs to be forced to another interviewer. Data Managers should not force sample without first receiving confirmation that replication has been revoked.

The Help Desk Supervisor, the DBAs, DCS, and Inventory usually step in to handle the rest of the process of trying to track down the laptop. Eventually, the Data Manager may need to force transfer that interviewer’s sample to someone else. (See force transfer section under Section 8.6.2, Respond to Help Desk Calls.)

For further information and steps to address lost laptops, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.6.5 Maintain documentation

Throughout the development, testing, and production stages of a project, it is standard practice to maintain documentation for each task that is performed on the project. This also includes retaining emails relevant to decision making and process flow. The documentation should be easy to follow for another Data Manager who may serve as backup if the primary Data Manager is ill, on vacation, or otherwise indisposed. The SAS code written for processes should also be well commented. If there are ever any questions on how or why a task is performed or where the data is, it is necessary for the documentation to be able to answer these questions.

The Data Manager also keeps a problem database to track helpdesk calls or issues from other sources. The project data dictionary must also be kept up-to-date, as this is used by many within and outside of SRO.

Much of our communication is through email. It is standard practice to maintain a project email folder within your project email and to archive it to the project folder.

8.6.6 Maintain automated tasks

There are many tasks that Data Managers perform regularly. Once the task has been set up and has been thoroughly tested, it is efficient to put the task on the task scheduler. There are processing computers set aside to perform these tasks that are monitored by the DBA on duty. It is our best practice to use these processing computers rather than a personal computer to run the scheduled tasks.

Some common tasks that Data Managers perform are running Access queries or macros, SAS programs, and Blaise Manipula scripts. These tasks are set up via batch files on the processing computer and then run from the Task Scheduler program. The processing computers may have different versions of the programs than those on your personal computer; therefore, Data Managers should always check that a scheduled task actually runs on the scheduler.

Passwords are not to be written into batch files. If a password is needed to pass into an application, it can be included within the Scheduled Task. It is the Data Manager group’s responsibility to maintain passwords in the ODBC DSNs and the SAS autoexec used on the processing computers.

For further information and steps to set up scheduled tasks, see the Data Management Wiki.

8.6.7 Set up telephone study reporting fields

Most studies conducted in the Survey Services Lab (SSL) use a Computer-Assisted Telephone Interviewing (CATI) Sample Management System (SMS). SSL Project Leaders use SMS for their real-time reports and production monitoring. Generally, the project Data Manager is less actively involved in SSL projects, with major tasks including setting up the SMS-RT (SMS-Real Time) process, report archiving, and processing data for delivery.

SMS-RT transfers interviewing paradata from SMS to the SurveyTrak Reporting Database via an overnight process. This allows SSL Project Leaders to review reports on data collection effort that are similar to those generated for decentralized (field) studies. This paradata includes information in the SurveyTrak tSample_Line and TCall_Information tables.

The Data Manager is responsible for setting up telephone real-time reporting standard and project-specific fields in SurveyTrak tables.

For further information and steps for setting up SMS-RT, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.7 Data Deliverables & Reports

There are many types of data that we collect: paradata, metadata, and interview data. Many different groups within and outside of SRO use these data. The Data Manager is responsible for providing data in a useful form specified by the project staff. Data delivery content and timeline are outlined by the project staff at the beginning of the project. Generally there is at least one interim data delivery and a final delivery. Some projects have outlined a weekly or monthly delivery timeline. The format and the location to post the data are determined by the project staff. Standard data delivery is in ASCII, SAS, or SPSS format.

8.7.1 Extract survey data and create documentation

The standard method for preparing the interview datasets is to use the Blaise to SAS application in MQDS. This process reads the Blaise files into Extensible Markup Language (XML) and then outputs a series of SAS and text files: a main data file, a data dictionary with variable information, and open-end responses, as well as the interviewer comments (Blaise remarks) from the questionnaire. MQDS is also used to create codebooks with frequencies in the dataset. MQDS and its documentation are available here: M:\SROapplications\MQDS\V4\

Another method for transforming the Blaise data into SAS is through a Blaise Chameleon Script. NSFG Cycle 8 is currently using this method. SRO uses a variety of systems for web survey documentation.

Before running any process with the survey master files, copy the necessary files to the data management (DataOps) project folder and run the processes using the copies.

For further information on extracting survey data from Illume or with MQDS, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.7.2 Extract coded data

Many studies have a few questions that are not coded by the interviewer at the time of the interview or that have paper Self-Administered Questionnaire (SAQ) supplements with open-ended questions. Often these types of questions are coded by the coding staff in the Survey Services Lab (SSL) after data collection. The Data Manager is responsible for providing the coders with the Blaise data (the master.bdb) along with the supporting datamodel files. After data are coded, the Data Manager then retrieves the coded Blaise data from a specified output folder and copies the files to the data management project folder to process the same way as described for delivering other survey data with MQDS.

8.7.3 Clean and recode data

After the Blaise data is in SAS, the datasets need to be further manipulated or cleaned before delivery. The basic cleaning that Data Managers do for external clients includes:

Remove Personally Identifying Information (PII) from Blaise datasets
– Title, First Name, Middle Name, Last Name, Suffix
– Address, City, State, Zip
– Phone number and extension
Remove extraneous Household Rosters used for sorting
Recode variables as needed (project-specific)
Create open-end dataset (project-specific, can contain PII)
Create comments/remarks dataset (project-specific, can contain PII)
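
As a rough illustration of the PII removal step, the Python sketch below drops a fixed set of identifying fields from each record. It is only a sketch under stated assumptions: the field names in PII_FIELDS and the function name strip_pii are hypothetical, real Blaise variable names are project-specific, and production cleaning at SRO is described in this chapter as being done in SAS.

```python
# Hypothetical field names; actual Blaise variable names vary by project.
PII_FIELDS = {
    "title", "first_name", "middle_name", "last_name", "suffix",
    "address", "city", "state", "zip",
    "phone", "phone_ext",
}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without the fields listed in PII_FIELDS.

    Matching is case-insensitive so that 'First_Name' and 'first_name'
    are treated the same way.
    """
    return {
        name: value
        for name, value in record.items()
        if name.lower() not in PII_FIELDS
    }


def strip_pii_from_dataset(records):
    """Apply strip_pii to every record in an iterable of dicts."""
    return [strip_pii(record) for record in records]
```

Keeping the list of PII fields in one place makes it straightforward to review with the Project Leader and to extend for project-specific identifiers.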

If further cleaning or manipulation of the datasets before delivery is required, this should be discussed with the Project Leader. Sometimes projects will request that some SurveyTrak or observation data are concatenated with the main data.

Some projects, typically Center projects like the Health and Retirement Study (HRS) and the Panel Study of Income Dynamics (PSID), have exceptions to stripping the PII from the datasets. This is not our standard, and there must be an approved contractual exception with IRB approval to not strip the PII.

Projects will sometimes request the data files in a format other than SAS. This is easily accomplished by using the StatTransfer utility, available here: M:\StatTransfer10.

For further information on data cleaning, coding and delivery, see the Data Management Wiki.

8.7.4 Timing reports

For any project that produces a Blaise Audit Data Trail (ADT) as part of the paradata, Data Managers work in many ways with the ADT data. This includes extracting them to a specific folder in the merged data set deliverable files, analyzing them on a case-by-case basis for troubleshooting interview field problems, and analyzing them en masse for the creation of standardized timings reports.

Two tools currently exist that Data Managers use to produce timings reports for project staff. One of these tools, ATReport, is being gradually phased out as of 2012. The other tool, the ADT Database, is newer and allows more flexibility. Regardless of the tool that is used, the Data Manager works with the Project and Technical Leader to define what is needed for interview timings reports. At a minimum, such reports should include the total interview time by case and by interviewer (also by block or section if applicable). The report should also provide an overall average interview time across all interviews, and should provide statistics for the distribution of timings (e.g., minimum, maximum, and standard deviation).
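
The minimum report contents described above can be sketched in a few lines of Python. This is not how ATReport or the ADT Database work internally; the input layout (one row per interview session with a case ID, an interviewer ID, and minutes) and the function name timing_report are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def timing_report(rows):
    """Summarize interview timings.

    rows: iterable of (case_id, interviewer_id, minutes) tuples, one per
    interview session. This layout is an assumption for illustration.
    """
    by_case = defaultdict(float)
    by_interviewer = defaultdict(float)
    for case_id, interviewer_id, minutes in rows:
        by_case[case_id] += minutes                # total time per case
        by_interviewer[interviewer_id] += minutes  # total time per interviewer

    totals = list(by_case.values())
    if not totals:
        raise ValueError("no timing rows supplied")

    return {
        "by_case": dict(by_case),
        "by_interviewer": dict(by_interviewer),
        "mean": mean(totals),   # overall average interview time
        "min": min(totals),
        "max": max(totals),
        # Sample standard deviation needs at least two interviews.
        "std_dev": stdev(totals) if len(totals) > 1 else 0.0,
    }
```

Summing sessions per case before computing the distribution means that an interview completed over several sittings counts once, with its full length.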

For further information on timing reports, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.7.5 Tracking reports

Project staff on longitudinal or panel studies may also request reports on tracking efforts. A program supplementary to SurveyTrak, WebLog, is often used for project-specific tracking. There is SAS code that creates Excel reports from WebLog for project staff to review.

For further information on tracking reports, see the Data Management Wiki.

8.7.6 Ad hoc reports

It is common for Data Managers to receive requests for a wide variety of reports and summaries related to the data our organization collects (e.g., preload data, survey data, or paradata). They also receive requests for data related to a project that is not currently in production. Data Managers need to be available to assist in responding to these requests. However, it is standard practice to ask for:

The priority of the task;
A deadline, along with an “ideal” time to deliver the output; and
The format for output.

8.7.7 Perform quality checks on field progress report

Data Managers may be involved with the project’s field progress report (FPR) in several ways. At a minimum, the Data Manager should be aware of who programmed the FPR(s), how many related reports the project contains in addition to the FPR, the timeline for the report lifecycle (i.e., when reports start and stop being generated), and the process for archiving the reports. Ideally, the Data Manager will also be heavily involved in providing quality checks on the FPR during the testing phase of the project. This involves working with the programmer to validate calculated fields and groupings against the underlying data.
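
One simple form of this validation is to recompute a grouped count from the underlying sample lines and compare it with the figure shown on the report. The Python sketch below illustrates the idea only; the "result_group" key, the group labels, and the function name check_reported_counts are hypothetical and are not taken from any actual FPR.

```python
from collections import Counter


def check_reported_counts(sample_lines, reported):
    """Compare reported group counts with counts recomputed from the data.

    sample_lines: iterable of dicts, each with a 'result_group' key
                  (a hypothetical field name).
    reported:     dict mapping group label -> count shown on the report.
    Returns a dict of {group: (reported_count, actual_count)} for every
    group where the two disagree.
    """
    actual = Counter(line["result_group"] for line in sample_lines)
    mismatches = {}
    for group in set(actual) | set(reported):
        reported_count = reported.get(group, 0)
        actual_count = actual.get(group, 0)
        if reported_count != actual_count:
            mismatches[group] = (reported_count, actual_count)
    return mismatches
```

Checking the union of groups catches both miscounted groups and groups that appear in the data but are missing from the report.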

For smaller projects, and increasingly even for larger, more established projects, the Data Manager may also be directly involved in programming the FPR and related reports using SAS. This has the advantage of reducing iterations between the programmer and Data Manager, as well as allowing more flexibility in report customization. However, project budgets and allocation must be adjusted for this option. For further information on reports, see the Data Management Wiki. Contact a Data Manager supervisor for access.

8.8 Other Systems Data Managers Use

Most of this chapter has focused on data management activities related to decentralized data collection studies and SurveyTrak and related systems, with some references to centralized telephone studies and CATI SMS.

However, SRO often conducts studies that do not use our two major sample management systems. For example, web studies use specialized software (e.g., Illume). Data Managers for these web studies will need to download the Web data for quality control checks and delivery, as well as generate reports, using the Web software sample management system.

In addition, SRO frequently finds that study requirements cannot be met using our current systems, and new systems or subsystems are developed. In such studies, the Programmer and Data Manager activities may change when it comes to data delivery and reporting requirements. Data Managers will often be asked to produce daily reports in the nonstandard systems. When this is the case, the Data Manager needs to know about it early on in the project development phase.

The Data Manager should be thinking about the questions related to data output and analysis, and should be especially vigilant about the data dictionary for brand new systems. It is important to discuss the data delivery requirements with the Project Leader and Technical Leader, or in some cases, with the client directly. This includes the data format, variable labels, and value preferences. The Data Manager will often be communicating with the client about any issues they have with the data files, such as unexpected values, missing variables, etc.

The following link provides an extreme example of the complexity of a Data Manager’s responsibilities on a project that required custom systems.

Example of a Project’s Nonstandard System Requirements

8.9 Project Closeout

As a project comes to the close of data collection, there are standard tasks that Data Managers perform. Some of these have been previously discussed, such as removing sample lines from laptops and delivering the datasets, while others are specific to finalizing and archiving the project. Projects do not usually end with the completion of data collection. There is a time lag as checks are sent out and cashed, paper materials are returned to Ann Arbor and logged, and non-finalized lines are recoded.

Once the Project Leader officially declares that the project is completed, then the Data Manager can close the project. The End of Project Checklist is helpful in confirming that the project has been closed and archived. This process can take months after data collection has been completed.

End of Project Checklist

As the project comes to an end, it is important to add to the project documentation a “Lessons Learned” section. This can be extremely useful when coming back to the project for another wave of data collection, or when looking to launch a similar project. The “Lessons Learned” should be reviewed when the new project is being budgeted.

Finally, time should be devoted to archiving project files that the Data Manager has used during production. The Data Manager should work with SRO’s Archivist and DBAs to ensure that folders, files, and databases are being compressed, archived, and, if needed, deleted, when a project closes. In addition, Data Managers should try to anticipate what kind of future data requests may come, even potentially years down the road, and try to maintain a core of usable data files, consistent with SRO’s archiving protocols.

8.10 SRO Applications

There are both SRO and commercial tools that Data Managers use in their work. When a new tool is available or designed and implemented by SRO, typically a Data Manager is responsible for testing and documenting how the Data Management Team will interact with the new tool.

Here are some links to applications and user documentation commonly used by Data Managers.

STAdmin:

Exe: stadmin8.exe, in L:\groups\TSG\share\STAdmin.
User Doc: SurveytrakAdmin8_UserGuide.doc, in L:\groups\TSG\share\ST Admin\documentation.

Line Generator:

Exe: line_generator.exe, in L:\groups\TSG\share\Line Generator.
User Doc: LineGen Documentation, in L:\groups\TSG\share\Line Generator.

DRAS:

Exe: DailyReportArchiveSystem.exe, in L:\groups\TSG\share\DailyReportsArchiveSystem\Production.
User Doc: Daily Reports and How to Archive Them.doc, in L:\groups\TSG\share\DailyReportsArchiveSystem\Documents.

ProfileLoader:

Exe: RProfile.exe, in L:\groups\TSG\share\Rprofiles (the .exe in this folder does not use Unicode).
User Doc: RProvileLoader.doc, in L:\groups\TSG\share\Rprofiles.

MQDS:

Exe: MQDS.exe, in M:\SROapplications\MQDS\V4\4.0.3.
User Doc: MQDS User Guide V4.doc, in M:\SROapplications\MQDS\V4\4.0.3.

AT-Reports:

Exe: ATReport.exe, in M:\SROapplications\ATReport.
User Doc: User Documentation-2011.doc, in M:\SROapplications\ATReport\doc.

Interview Data Merge:

Exe: iw_data_merge.exe, in L:\groups\TSG\share\ST Iw Data Merge.
User Doc: several documents in L:\groups\TSG\share\ST Iw Data Merge\documentation.

STAT Transfer:

Exe: Installation file, StatTransfer.OK.msi, in M:\StatTransfer10.

8.11 Technical Support

ISR and SRC offer a wide variety of staff career development and staff training programs. Some of these include SAS training, Microsoft Office Suite trainings, and Human Resources and Development (HRD) classes.

http://sites.isr.umich.edu/DNN/Default.aspx?alias=sites.isr.umich.edu/dnn/srctraining

The University of Michigan has a contract with Safari Technical Books Online:
\\SRC-Douvan\SRO\groups\TSG\share\!_Helpful_Resources\UMs-SafariTechBooksOnlineContract.doc