The graduation and retention rates produced by the PTC will never quite match those published on OIRA's website.

This is mainly due to differences in cohort definition. The published numbers are based on two variables in the table HISTORY_FACTS: headcount and new student code.

To understand these codes (especially headcount), it is necessary to know how the data in HISTORY_FACTS are structured. A record in HISTORY_FACTS is defined by the unique combination of a student, college, and term. A student who went to BMCC for four semesters will have four records in HISTORY_FACTS. However, if a student takes classes in more than one college in a given semester, they will have more than one record for that term. These records are distinguished by headcount. Headcount is measured in each semester and indicates whether or not a record is for a student's "college of record." Headcount is either 1 or NULL.

New student code is measured when a student first enters a college and has the following possible values:

To create the published numbers, OIRA defines a cohort as the group of students with a new student code of 1 (first-time freshmen) or 2 (transfer students). Enrollment numbers are the sum of headcounts in a term where new student code meets the criteria under study.

This works well for published numbers, but there are a small number of cases where a student has multiple headcount = 1 records in a given semester (at different schools). In addition, a student can have multiple records with the same new_student_code (in different terms). This might be less of an issue for transfer students, but for first-time freshmen it is not ideal. The PTC deals with both of these issues in the process of extracting student records.
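As a rough illustration of the published-style count described above, the following Python sketch sums headcount over records whose new student code matches the cohort under study. The rows, field names, term format, and the use of `None` for non-entering records are all assumptions made for the example; they are not the actual HISTORY_FACTS schema or OIRA's code.

```python
from collections import defaultdict

# Hypothetical HISTORY_FACTS rows: one per (student, college, term).
# Field names and values are illustrative assumptions only.
history_facts = [
    {"student_id": "A", "college": "BMCC", "term": "2015-09", "headcount": 1, "new_student_code": 1},
    {"student_id": "A", "college": "BMCC", "term": "2016-02", "headcount": 1, "new_student_code": None},
    {"student_id": "B", "college": "BMCC", "term": "2015-09", "headcount": 1, "new_student_code": 1},
    # B also takes a class at Hunter, which is not the college of record (headcount is NULL).
    {"student_id": "B", "college": "Hunter", "term": "2015-09", "headcount": None, "new_student_code": None},
    {"student_id": "C", "college": "Hunter", "term": "2015-09", "headcount": 1, "new_student_code": 2},
    # D is one of the rare cases with headcount = 1 at two schools in one term.
    {"student_id": "D", "college": "BMCC", "term": "2015-09", "headcount": 1, "new_student_code": 1},
    {"student_id": "D", "college": "Hunter", "term": "2015-09", "headcount": 1, "new_student_code": 1},
]

def cohort_enrollment(rows, code):
    """Sum headcount per (term, college) over rows with the given new student code."""
    totals = defaultdict(int)
    for row in rows:
        if row["new_student_code"] == code:
            # A NULL headcount contributes nothing to the sum.
            totals[(row["term"], row["college"])] += row["headcount"] or 0
    return dict(totals)

freshmen = cohort_enrollment(history_facts, 1)
transfers = cohort_enrollment(history_facts, 2)

# Distinct students behind the freshman counts, for comparison.
distinct_freshmen = {r["student_id"] for r in history_facts if r["new_student_code"] == 1}
```

In this toy data the freshman counts sum to 4 across the two colleges even though only 3 distinct students are involved, because student D is counted at both schools.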

When we extract the data for the PTC, we include a Rank Over command. This ranking attempts to identify the single term/college record for each student that (1) is earliest and (2) has the most credits, with some tie-breakers if there still isn't a unique record. Note that all students in the PTC are first-time freshmen in the term the PTC identifies as their first.
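The effect of this ranking can be sketched in Python as a "keep the top-ranked record per student" step. This is only an approximation of the idea: the field names are hypothetical, and because the actual tie-breakers are not listed here, the sketch falls back on college name purely as a placeholder.

```python
# Hypothetical extract rows; field names and the final tie-breaker are
# illustrative assumptions, not the actual PTC extraction logic.
records = [
    # Student A: flagged as a new freshman in two different years.
    {"student_id": "A", "college": "BMCC", "term": "2015-09", "credits_attempted": 12},
    {"student_id": "A", "college": "Hunter", "term": "2016-09", "credits_attempted": 15},
    # Student B: records at two colleges in the same entering term.
    {"student_id": "B", "college": "BMCC", "term": "2015-09", "credits_attempted": 3},
    {"student_id": "B", "college": "Hunter", "term": "2015-09", "credits_attempted": 12},
]

def rank_key(rec):
    # Earliest term first, then most credits attempted, then college name
    # (college name stands in for the unspecified tie-breakers).
    return (rec["term"], -rec["credits_attempted"], rec["college"])

def pick_entry_records(rows):
    """Keep the single best-ranked record for each student."""
    best = {}
    for rec in rows:
        sid = rec["student_id"]
        if sid not in best or rank_key(rec) < rank_key(best[sid]):
            best[sid] = rec
    return best

entry = pick_entry_records(records)
```

Here student A is assigned to BMCC in the earlier term, and student B is assigned to Hunter, where they attempted more credits, so each student ends up with exactly one entering record.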

This works well for the analytic purposes to which we put the PTC, but it means that we cannot always reproduce the published numbers, for two reasons related to the points above:
  1. The student has multiple records across years with NEW_STUDENT_CODE = 1. The PTC assigns them to the earlier cohort, while the published numbers count them in both years.
  2. The student has multiple records in the same semester with NEW_STUDENT_CODE = 1. The PTC picks the college record where they attempted more credits, while the published numbers count them in both schools.

Each time we extract the PTC, we compare its graduation and retention rates to those published online. While there are differences, the numbers in the PTC are reasonably close to the published numbers for all rates except the later institutional associate rates. Specifically, the 6- and 8-year associate graduation rates for students graduating from the same CUNY college that they entered are off by more than our tolerance (1%). The biggest difference is slightly over 5%. The vast majority of PTC rates differ from the published numbers by well under 1%.
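A comparison of this kind could look something like the sketch below. It assumes the 1% tolerance is an absolute difference in percentage points, which is not stated explicitly here, and the rate names and values are made up for illustration; they are not actual CUNY figures.

```python
TOLERANCE = 1.0  # percentage points; assumes the 1% tolerance is an absolute gap

def flag_rate_differences(ptc_rates, published_rates, tolerance=TOLERANCE):
    """Return {rate_name: PTC minus published} for gaps larger than the tolerance."""
    flagged = {}
    for name, published in published_rates.items():
        diff = ptc_rates[name] - published
        if abs(diff) > tolerance:
            flagged[name] = round(diff, 2)
    return flagged

# Made-up rates in percent, keyed by hypothetical rate names.
ptc = {"assoc_inst_3yr": 20.4, "assoc_inst_6yr": 30.2, "bacc_system_6yr": 55.1}
published = {"assoc_inst_3yr": 20.1, "assoc_inst_6yr": 27.9, "bacc_system_6yr": 55.6}

flagged = flag_rate_differences(ptc, published)
```

With these invented inputs only the 6-year institutional associate rate is flagged; the other two fall within the tolerance.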