alumni_lookup

Phase 18: Alumni Education Data Model Migration

Status: Complete (18.1-18.9 shipped; released in v1.0.65). Degree-model deprecation + dropping the legacy degrees table are intentionally held for a separate post-import cleanup tag once Education coverage ≥ 99% — colleges/majors are retained as active reference tables.
Priority: High
Estimated Sub-Phases: 9


Completion Summary (18.2)

Implemented

Files Created (18.2)

Tests

Deferrals


Overview

Migrate from the current academic structure:

To a new, person-centered model:

This phase introduces a richer representation of educational history while preserving existing downstream integrations during rollout.

Proposed Target Structure

Education (0..n per Alumni)

EducationAreaOfStudy (0..n per Education)


Why This Phase Exists

The current Degree -> Major -> College model assumes a single major and current-college context attached to each degree row. It is insufficient for:


Impact Inventory (Current Codebase)

The following systems are directly impacted because they query degrees, majors, and/or colleges:

1. Import and data management

2. Lookup Portal search and profile display

3. Degree stats and engagement analytics

4. Alumni Portal (Cp::*) profile, community, and recommendation logic

5. Exports, check-in tools, and batch utilities

6. Lookup API and downstream contract dependencies

7. Reference data admin and configuration

8. News, content, and seeded questions

9. Routes and URL surface


Migration Strategy (High-Level)

Contract-first and backwards-compatible

Phase 18 is designed so downstream consumers can continue reading existing API fields while internals migrate.

  1. Add new tables and models
  2. Dual-write from import pipelines into old + new structures
  3. Import full datasets from new awarded-degrees and areas-of-study exports (no legacy backfill)
  4. Pre-seed and maintain school reference data from the granted-school source list before first import
  5. Introduce a compatibility presenter/service to produce legacy fields (ug_*, gr_*, pref_college) with per-record fallback to legacy degrees when educations are missing
  6. Migrate reads across controllers/services to the presenter/service
  7. Track migration coverage and cut over fully only after agreed completeness threshold
  8. Keep API payload shape unchanged until explicit deprecation window is approved
  9. Optionally retire legacy degree-major paths in a later phase

Sub-Phase Plan

Phase 18.1: Data Contract, Mapping Rules, and Resolved Decisions ✅

Goal: Finalize non-ambiguous mapping and data ownership before schema work.

Deliverables

Resolved Decisions

# Topic Decision Rationale
1.1 Source feed shape Two CSVs: awarded-degrees + areas-of-study, joined via Education: Education natural key from source Matches actual exports the data team will provide
1.2 Legacy degrees/majors/colleges fate Plan toward decommission. Phase 18 keeps them readable for transition, Phase 19+ retires after dependents migrate Data team is maintaining the new model going forward
1.3 Backfill strategy No backfill. New exports include all records. Missing/mismatched data flows into a gap report so the CRM team can fix source Source-of-truth is the CRM; gaps are signals, not data we should fabricate
1.4 Granting school setup Pre-seed all unique granting-school codes/names from source list before 18.3 import runs Avoids null school-code mappings for known schools and supports icons/short names immediately
2.1 degree_level storage Hybrid: physical column auto-derived from degree_code on save Indexable for stats joins; never drifts because regenerated from canonical field
2.2 granting_school vs current_school Both stored as *_name (free text from source, never lossy) + *_code (resolved via lookup, nullable when no match) on the same educations row Lets us preserve historical attribution AND join to the colleges table when known
2.3 department_name storage Free text on Education for Phase 18. No departments reference table yet. Reserve normalization for a follow-up phase Source provides free text; existing majors.dept_desc usage is small (4 sites) and routes through compatibility presenter
2.4 concentration_level enum major, minor, concentration (lowercased on ingest) Matches source exactly
3.1 API V1 contract (c) additive: keep all current fields stable; add optional educations: [...] block behind ?include=educations opt-in Zero-break for downstream apps; new richness is opt-in
3.2 UG/GR derivation Most recent Education by date_issued per level, focused on concentration_level: "major" areas of study Simple, deterministic, matches user intent
3.3 Cp::Community college qualification Match on current_school_code only Aligns with how the CRM frames the “where it lives now” school
3.4 “Same major” recommendation Match on normalized area_of_study_name where concentration_level: "major" Future-proof against legacy major_code rot
3.5 Compatibility fallback If an alum has zero educations, presenter falls back to legacy degrees for all legacy fields during migration window Prevents empty API/profile output when source coverage is temporarily incomplete
3.6 Coverage tracking Add migration coverage stats and make full cutover contingent on threshold Provides objective readiness signal and protects downstream behavior
4.1 Current-student fields Stay on Alumni unchanged. No Education row created until graduation Educations represent awarded degrees, not in-progress study
4.2 Career Clusters / Welcome Packs YAML Phase 18: unchanged. Compatibility presenter exposes legacy keys (primary_major_desc, primary_college_code) so YAMLs keep working. Phase 19 backlog: rethink YAMLs against new vocabulary Keeps Phase 18 finite; data won’t structurally change again so YAML refactor is safe to defer
4.3 majors/colleges admin pages Hide/remove from Settings nav after Phase 18.7 They become legacy reference data; data team owns updates via the new feeds
4.4 Phase 18 scope All Impact Inventory items in scope. Single-shot transition. No carve-outs into Phase 19 except YAML refactor (4.2) User wants one decisive transition

Field Mapping Contract

Source: awarded-degrees.csveducations

Source column Target column Type Notes
Contact: BUID educations.buid string, NOT NULL, indexed FK-style to alumni.buid
Contact: BruinQuest - Contact ID (matching only, not stored) Used for alumni resolution / gap report
Education: Education educations.source_education_id string, NOT NULL Natural key from CRM. Anchors idempotency + areas-of-study join
Degree Code educations.degree_code string, NOT NULL e.g., BBA, BS, MA
(derived) educations.degree_level string, indexed undergraduate / masters / doctorate / unknown, auto-set from degree_code on save
Granting School educations.granting_school_name string, NOT NULL Preserved verbatim
(resolved) educations.granting_school_code string, nullable, indexed Lookup against colleges.college_name + alias map; nullable on miss
Current School educations.current_school_name string, NOT NULL Preserved verbatim
(resolved) educations.current_school_code string, nullable, indexed Lookup as above
Department educations.department_name string, nullable Free text; populates legacy ug_program/gr_program API field
Date Issued educations.date_issued date, indexed  
Institutional Unit (ignored in Phase 18) Frequently blank; redundant with Current School in samples. Revisit if data team confirms semantics
Institutional Units (ignored in Phase 18) Duplicate of Institutional Unit

Uniqueness: (buid, source_education_id) unique index → idempotent reimports.

Source: areas-of-study.csveducation_areas_of_study

Filter: only ingest rows where Degree Includes this Concentration? = 1.

Source column Target column Type Notes
Education: Education (join key) Resolves to educations.id via source_education_id
Area of Study: Area of Study Name (col 1, e.g. AS-196770) education_areas_of_study.person_area_of_study_id string, unique Natural key for idempotency
Area of Study: Area of Study Name (col 2, e.g. Business Administration) education_areas_of_study.area_of_study_name string, NOT NULL Free text, preserved verbatim
(derived) education_areas_of_study.area_of_study_name_normalized string, indexed Lowercased + trimmed; powers “same major” matching and future grouping
Concentration Level education_areas_of_study.concentration_level string enum major / minor / concentration (lowercased)
Current Institutional Unit education_areas_of_study.current_institutional_unit_name string, nullable Free text
(resolved) education_areas_of_study.current_institutional_unit_code string, nullable, indexed Lookup against colleges.college_name

School Name Resolution

A short alias map handles known mismatches between source naming and colleges.college_name. Known examples from samples:

Source name Maps to colleges.college_code
Jack C. Massey College of Business CB
College of Business (granting, legacy) CB
Mike Curb College of Entertainment and Music Business CE
College of Entrmnt/Musc Busnes (legacy) CE
College of Sciences and Mathematics CM
College of Sciences & Math (legacy) CM
College of Music and Performing Arts MP
College of Vis/Performing Arts (legacy) VP
College of Pharmacy & Health Sciences PH
University College UC

In addition to alias mapping, pre-seed the school reference list from source before migration imports. Source codes provided:

OM, WC, CA, CB, ED, CE, CH, CL, CS, MC, MP, CN, CP, PH, CM, CT, CV, CI, CR, HO, HU, MU, NU, RE, SC, SM, 00, UC, WA.

This pre-seed step should include canonical long name + short name + icon metadata where available so profile and stats surfaces are ready at cutover.

The importer flags these in a gap report (no row inserted into colleges; Education row inserted with *_name populated and *_code NULL). Staff/data team triage from the gap report.

API V1 Compatibility Contract (LOCKED)

These keys in Api::V1::AlumniSearchController#serialize_alumni MUST remain present and semantically equivalent post-Phase 18. Internal data source migrates to educations via a presenter; downstream consumers see no change.

Frozen keys: buid, contact_id, first_name, last_name, email, phone_number, pref_college, ug_college, ug_program, ug_degree, ug_graduation_year, ug_college_desc, gr_college, gr_program, gr_degree, gr_graduation_year, gr_college_desc, current_student, current_school, current_school_desc, current_program, intended_degree, expected_graduation_year, student_status, district, district_code, is_faculty, is_staff, category, company, position.

Derivation rules (presenter):

Coverage and Cutover Gate (LOCKED)

Track migration readiness continuously during 18.3-18.5:

Cutover expectation:

Additive (opt-in ?include=educations): Full educations: [...] array with nested areas_of_study per education.

Sample Source Data (committed for reference)

Sample CSVs reviewed during 18.1 are now committed in docs/planning/champion-portal/phases/phase-18/samples/:

These are the baseline input files for 18.3 import preview/commit tests and should remain stable unless the source extract shape changes.

Phase 18.2: Schema Foundation and Models ✅

Goal: Add new education schema with safe coexistence alongside legacy tables.

Deliverables

What Was Implemented

Phase 18.3: Import Pipeline Migration (Dual-Write)

Status: Complete

Goal: Stand up a new-CSV import pipeline (awarded-degrees + areas-of-study) that writes to educations and education_areas_of_study with idempotent upserts and a downloadable gap report. Imports run as background jobs with a polled status page so production uploads cannot time out on Heroku.

Deliverables

Spec Deviation

Files Created (18.3)

Files Modified (18.3)

Flow

  1. Admin uploads awarded-degrees.csv (or areas-of-study.csv) via the form on settings/alumni/upload_educations.
  2. Controller creates an EducationImportBatch (Zlib-compressed CSV in csv_content) and enqueues EducationImportScanJob.
  3. User is redirected to settings/alumni/education_import_batches/:id. The status page polls every 5 seconds.
  4. Scan job inflates the CSV to a Tempfile, runs the importer’s preview, stores the manifest via EducationImportManifestStore (Zlib-compressed JSON in manifest_data), builds the gap CSV (gap_csv_data), and transitions the batch to scanned.
  5. The status page now shows the preview table + commit button. Clicking commit enqueues EducationImportApplyJob.
  6. Apply job loads the manifest, calls the importer’s commit, transitions the batch to completed, and clears csv_content + manifest_data (gap CSV is preserved for the audit trail).

Tests

Phase 18.4: Read-Path Compatibility Layer

Goal: Centralize education derivation so old and new data can be served consistently.

Status: ✅ Complete

Deliverables

What Was Implemented (18.4)

Deferred (carries to BACKLOG)

Tests

Phase 18.5: Lookup API Backwards Compatibility

Goal: Keep API response contract stable while reading from the new model.

Deliverables

What Was Implemented (18.5)

Deferred (carries to BACKLOG)

Tests

Phase 18.6: Stats & Engagement Aggregations Migration

Goal: Move all stats-page aggregation logic from degrees → majors → colleges joins to a shared Education::AggregateScope query object that is Education-first with Degree fallback per BUID. Includes a side-by-side ?source=legacy|education toggle so staff can validate parity before legacy is removed.

Deliverables

Explicitly Deferred

Phase 18.7: CSV Exporters, Filter Service, and Model Scope Cleanup

Goal: Migrate the data-export surfaces (CSVs, CRM converters) from degrees → majors → colleges to the Education::AggregateScope + Alumni::EducationProfile stack established in 18.6. Stats pages are already cut over by this point; this phase is about contracts that flow to downstream consumers (Advancement Services CSVs, CRM event mapping).

Deliverables

Explicitly Deferred

Phase 18.8: Current Student Data Consolidation (Educations as Source of Truth)

Goal: Move per-enrollment current-student data (school, program, intended degree, expected graduation year, per-record student status) out of alumni denormalized columns and into the educations table where it naturally belongs. Retire the separate Csv::CurrentStudentImporter in favor of a single education-feed pipeline.

Why: Today an alumni who completed a BBA and is enrolled in an MBA has two educations rows but only one alumni.student_status column. The single-column model can’t represent “awarded + currently enrolled” simultaneously. Worse, the separate current-student CSV importer writes denormalized fields to alumni that get overwritten or go stale relative to the education records. Moving per-enrollment data to educations makes each row self-describing and eliminates a whole class of stale-data bugs.

Schema Changes

Add to educations:

Remove from alumni (after all read paths migrated):

Per-Education Student Status Rules (Ratchet)

When processing each education row, update alumni.student_status per this table:

Row student_status date_issued Effect on alumni.student_status
awarded present Always upgrade to awarded
awarded NULL Always upgrade to awarded
pending NULL Set pending only if not already awarded
withdrawn NULL Set withdrawn only if not already awarded

The contact importer’s student_status write path follows the same rule (never downgrades awarded).

Deliverables

Build Order (Safe / Additive-First)

  1. Migration: add new columns to educations (purely additive)
  2. Education importer changes (write to new columns + ratchet)
  3. Contact importer one-way ratchet
  4. Model + display layer migration (read from educations)
  5. View updates (show + search)
  6. Retire current_student_importer + UI
  7. Separate deploy: migration to drop 4 columns from alumni

Explicitly Deferred

Completion Summary (Deploy 1 of 2)

Phase 18.9: UI Rollout, Legacy Decommission, and Cleanup

Goal: Complete UI migration, retire legacy dependencies, and ship the public-facing UX changes (data-source filter, search columns) that depend on the new model being fully primary.

Decision (freeze, not drop): Per direction, Phase 18.9 freezes the legacy degrees table — all read paths move to the Education model exclusively, but the Alumni::EducationProfile→degrees fallback and the physical degrees table remain until a later cleanup tag once Education coverage is ≥ 99%. colleges and majors are retained as active reference tables (colleges is the canonical college-code→label source; majors backs the major dropdown, community naming, and banner-import validation) and are not slated for removal. Work is grouped: B (remove toggle/banner/source plumbing), C (migrate per-record + filter reads), A (profile/areas-of-study display polish), D (freeze importer write paths).

Completion Summary — Groups B + C (complete)

Completion Summary — Groups A + D (complete)

Completion Summary — Major filter migrated off Degree (complete)

Deliverables


Testing Strategy


Risks and Mitigations

  1. Silent behavior drift in UG/GR derivation
    • Mitigation: shared compatibility presenter with contract tests
  2. Duplicate/partial data from dual-write period
    • Mitigation: uniqueness constraints + idempotent import logic
  3. API downstream breakage
    • Mitigation: preserve V1 fields, stage V2 enrichments separately
  4. Reporting performance regressions
    • Mitigation: early indexing and query benchmarking on realistic datasets
  5. Historical college mapping ambiguities
    • Mitigation: allow both code and name fields, preserve source text

Dependencies and Ordering


Documentation Updates Required During Implementation


Planning Checkpoint (Required Before 18.1 Implementation)

Before implementation starts, complete the Sub-Phase Planning Checkpoint: