Per-column provenance classification for the Tulsa Housing Manifold
Each of the 26 columns in the TTAS manifold is classified into one of five data natures: Observed, Calibrated, Derived, Modeled, or Synthetic. This page documents where each value comes from, how it was transformed, and the confidence level assigned to its provenance.
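The classification can be held in code as one small record per column. The sketch below is hypothetical: the `Nature` enum, the `ColumnProvenance` record, and the `mean_confidence_by_nature` and `lineage` helpers are illustrative names, not taken from the project's source. Only the row values are copied from the provenance table on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Nature(Enum):
    """The five data natures used to classify columns."""
    OBSERVED = "Observed"
    CALIBRATED = "Calibrated"
    DERIVED = "Derived"
    MODELED = "Modeled"
    SYNTHETIC = "Synthetic"


@dataclass(frozen=True)
class ColumnProvenance:
    """Hypothetical record mirroring one row of the provenance table."""
    column: str
    nature: Nature
    primary_source: str
    confidence: float  # table percentage divided by 100
    transformations: tuple[str, ...] = ()


# A few rows transcribed from the table.
ROWS = [
    ColumnProvenance("mortgage_rate_30y", Nature.OBSERVED, "FRED MORTGAGE30US", 0.95,
                     ("api_fetch", "monthly_resample", "interpolate_gaps")),
    ColumnProvenance("median_listing_price", Nature.CALIBRATED, "Realtor.com Research Data", 0.70,
                     ("synthetic_generation", "real_aggregate_calibration",
                      "monthly_median_scaling_to_realtor_com")),
    ColumnProvenance("ownership_cost_monthly", Nature.DERIVED,
                     "Computed from median_listing_price, mortgage_rate_30y, property_tax_rate", 0.80,
                     ("mortgage_annuity_formula", "tax_and_insurance_estimate")),
    ColumnProvenance("regime_hint", Nature.MODELED,
                     "Deterministic regime classifier based on the month index", 0.55,
                     ("month_index_threshold",)),
    ColumnProvenance("dti_max", Nature.SYNTHETIC, "Beta distribution draw", 0.30,
                     ("beta_sample", "scale_to_range", "clip")),
]


def mean_confidence_by_nature(rows):
    """Average provenance confidence for each data nature present in rows."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row.nature].append(row.confidence)
    return {nature: sum(values) / len(values) for nature, values in buckets.items()}


def lineage(row):
    """Render a column's transformation chain in the table's arrow notation."""
    return " → ".join(row.transformations)
```

Keeping confidence as a number rather than a label makes it easy to roll up per nature or to flag any column that falls below a chosen threshold.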
Upstream fields, grouped by source:
- FRED: mortgage_rate_30y, cpi_all_urban, unemployment_rate, median_sales_price_us
- Census ACS: median_household_income, owner_occupied_share, median_home_value, rent_burden_count
- Realtor.com Research Data: median_listing_price, median_rent, active_listing_count, median_days_on_market
- street_centrality, amenity_density
- parcel_values, property_class, sale_dates
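The FRED series (mortgage_rate_30y, cpi_all_urban, unemployment_rate, median_sales_price_us) arrive at different native frequencies: MORTGAGE30US is weekly, MSPUS is quarterly, and CPIAUCSL and UNRATE are monthly. The transformation chains list monthly_resample, interpolate_gaps, and quarterly_to_monthly_interpolation to align them. The stdlib-only sketch below assumes a monthly value is the mean of that month's observations and that gaps are filled by straight-line interpolation; the actual aggregation and interpolation methods used in the pipeline are not stated on this page.

```python
from datetime import date
from statistics import mean


def monthly_resample(observations):
    """Collapse (date, value) observations to one value per calendar month.

    Assumption: a month's value is the mean of the observations dated in it,
    e.g. the four or five weekly MORTGAGE30US readings.
    """
    by_month = {}
    for day, value in observations:
        by_month.setdefault((day.year, day.month), []).append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def _to_index(year_month):
    """Map (year, month) to a running month count so gaps are easy to measure."""
    year, month = year_month
    return year * 12 + (month - 1)


def _from_index(index):
    """Inverse of _to_index."""
    return (index // 12, index % 12 + 1)


def interpolate_gaps(monthly):
    """Linearly fill every missing month between the first and last observed month.

    Applied to quarterly MSPUS values dated to the first month of each quarter,
    this is one plausible reading of quarterly_to_monthly_interpolation.
    """
    keys = sorted(monthly)
    if not keys:
        return {}
    filled = {}
    for start, end in zip(keys, keys[1:]):
        i0, i1 = _to_index(start), _to_index(end)
        for i in range(i0, i1):
            t = (i - i0) / (i1 - i0)  # fraction of the way from start to end
            filled[_from_index(i)] = monthly[start] + t * (monthly[end] - monthly[start])
    filled[keys[-1]] = monthly[keys[-1]]
    return filled


weekly = [(date(2024, 1, 4), 6.0), (date(2024, 1, 18), 7.0), (date(2024, 2, 1), 8.0)]
print(monthly_resample(weekly))  # {(2024, 1): 6.5, (2024, 2): 8.0}
```

Chaining the two functions (resample, then fill gaps) covers both the weekly and the quarterly series with the same code path.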
| Column | Nature | Primary Source | Confidence | Transformations |
|---|---|---|---|---|
| median_listing_price | Calibrated | Realtor.com Research Data | 70% | synthetic_generation → real_aggregate_calibration → monthly_median_scaling_to_realtor_com |
| inventory_velocity | Calibrated | Realtor.com Research Data (days on market) | 70% | synthetic_generation → dom_to_velocity_conversion → real_aggregate_calibration |
| monthly_rent_estimate | Calibrated | Realtor.com Research Data (rent) + synthetic rent-to-price model | 65% | synthetic_rent_to_price_ratio → real_aggregate_calibration → monthly_median_scaling |
| annual_income_estimate | Calibrated | Census ACS (ZIP median income) + synthetic per-property noise | 65% | synthetic_generation → zip_income_calibration → property_noise |
| ownership_cost_monthly | Derived | Computed from median_listing_price, mortgage_rate_30y, property_tax_rate | 80% | mortgage_annuity_formula → tax_and_insurance_estimate |
| rent_margin_monthly | Derived | Computed from annual_income_estimate, dti_max, monthly_rent_estimate | 75% | budget_less_rent |
| buy_margin_monthly | Derived | Computed from annual_income_estimate, dti_max, ownership_cost_monthly | 75% | budget_less_ownership_cost |
| affordability_index | Derived | Computed in preprocess.py as ownership_cost / max_affordable_payment | 75% | affordability_gap_normalization |
| rent_to_price_ratio | Derived | Computed from monthly_rent_estimate / median_listing_price | 70% | annualized_rent_to_price_ratio |
| rent_vs_buy | Modeled | Decision rule over buy_margin_monthly, rent_margin_monthly, and an opportunity score | 60% | threshold_rule → buy_margin_gt_-0.10*rent → opportunity_gt_0.42 |
| regime_hint | Modeled | Deterministic regime classifier based on the month index | 55% | month_index_threshold: <24 Stable, <57 Overheated, <72 Rate Shock, >=72 Opportunity |
| mortgage_rate_30y | Observed | FRED MORTGAGE30US | 95% | api_fetch → monthly_resample → interpolate_gaps |
| cpi_all_urban | Observed | FRED CPIAUCSL | 95% | api_fetch → monthly_resample |
| unemployment_rate | Observed | FRED UNRATE | 95% | api_fetch → monthly_resample |
| median_sales_price_us | Observed | FRED MSPUS | 90% | api_fetch → quarterly_to_monthly_interpolation |
| median_household_income | Observed | Census ACS 5-Year Estimates (Table S1901) | 85% | api_fetch → zip_code_join |
| owner_occupied_share | Observed | Census ACS 5-Year Estimates (Table S2502) | 85% | api_fetch → zip_code_join |
| median_home_value | Observed | Census ACS 5-Year Estimates (Table S2502) | 85% | api_fetch → zip_code_join |
| school_rating | Synthetic | Tulsa ZIP profiles (hardcoded anchors in fetch_data.py) | 40% | profile_lookup → micro_location_noise → clip_0_1 |
| street_centrality | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 40% | profile_lookup → noise → clip_0_1 |
| amenity_density | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 40% | profile_lookup → seasonal_adjustment → noise → clip_0_1 |
| crime_index | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 35% | profile_lookup → trend_adjustment → noise → clip_0_1 |
| flood_risk_score | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 35% | profile_lookup → seasonal_cos → noise → clip_0_1 |
| walk_transit_score | Synthetic | Computed from synthetic walk and transit components | 35% | weighted_combination → clip_0_1 |
| economic_mobility_index | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 35% | profile_lookup → trend_adjustment → noise → clip_0_1 |
| dti_max | Synthetic | Beta distribution draw (a=2.2, b=3.0), scaled to [0.28, 0.47] | 30% | beta_sample → scale_to_range → clip |
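The Derived and Modeled rows, together with the dti_max draw, can be read as a handful of small formulas. The sketch below is one reading of the transformation chains, not the code in preprocess.py. Several points are assumptions: the down payment, loan term, and insurance rate are hypothetical defaults; property_tax_rate is treated as an annual fraction of price; max_affordable_payment is taken to be monthly income times dti_max; the "rent" in `buy_margin_gt_-0.10*rent` is assumed to mean rent_margin_monthly; the two rent_vs_buy thresholds are assumed to be combined with AND; and the "Buy"/"Rent" labels are assumed.

```python
import random


def ownership_cost_monthly(price, annual_rate_pct, property_tax_rate,
                           down_payment_frac=0.20, term_months=360,
                           insurance_annual_frac=0.0035):
    """Mortgage annuity payment plus monthly property tax and insurance.

    down_payment_frac, term_months and insurance_annual_frac are
    hypothetical defaults, not values taken from the pipeline.
    """
    principal = price * (1.0 - down_payment_frac)
    r = annual_rate_pct / 100.0 / 12.0  # monthly interest rate
    if r == 0:
        payment = principal / term_months
    else:
        payment = principal * r / (1.0 - (1.0 + r) ** -term_months)
    tax = price * property_tax_rate / 12.0
    insurance = price * insurance_annual_frac / 12.0
    return payment + tax + insurance


def max_affordable_payment(annual_income, dti_max):
    """Assumed housing budget: monthly income times the debt-to-income cap."""
    return annual_income / 12.0 * dti_max


def rent_margin_monthly(annual_income, dti_max, monthly_rent):
    """budget_less_rent."""
    return max_affordable_payment(annual_income, dti_max) - monthly_rent


def buy_margin_monthly(annual_income, dti_max, ownership_cost):
    """budget_less_ownership_cost."""
    return max_affordable_payment(annual_income, dti_max) - ownership_cost


def affordability_index(ownership_cost, annual_income, dti_max):
    """ownership_cost / max_affordable_payment; values above 1.0 exceed the budget."""
    return ownership_cost / max_affordable_payment(annual_income, dti_max)


def rent_vs_buy(buy_margin, rent_margin, opportunity_score):
    """Threshold rule; treating 'rent' as rent_margin_monthly is an assumption."""
    if buy_margin > -0.10 * rent_margin and opportunity_score > 0.42:
        return "Buy"
    return "Rent"


def regime_hint(month_index):
    """Month-index thresholds exactly as listed in the table."""
    if month_index < 24:
        return "Stable"
    if month_index < 57:
        return "Overheated"
    if month_index < 72:
        return "Rate Shock"
    return "Opportunity"


def draw_dti_max(rng):
    """Beta(2.2, 3.0) sample scaled to [0.28, 0.47], then clipped."""
    x = rng.betavariate(2.2, 3.0)
    return min(max(0.28 + x * (0.47 - 0.28), 0.28), 0.47)
```

Because every helper takes plain floats, each rule can be checked against individual rows (for example, a 200,000 loan at 6% over 360 months gives a payment of about 1,199.10) before being vectorized over the full manifold.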