
Data Lineage

Per-column provenance classification for the Tulsa Housing Manifold

Each of the 26 columns in the TTAS manifold is classified into one of five data natures. This page documents where every value comes from, how it was transformed, and the confidence level of its provenance.

Data natures: Observed, Calibrated, Derived, Modeled, Synthetic

Data Sources

FRED (Federal Reserve Economic Data)

Endpoint: api.stlouisfed.org/fred/series/observations

Pulled: 2026-05-14

Nature: Observed

Confidence: High (0.90–0.95)

953 rows covering 1947–2026. The mortgage rate series is used directly; CPI and unemployment serve as national-level context variables.

Fields: mortgage_rate_30y, cpi_all_urban, unemployment_rate, median_sales_price_us
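As a minimal sketch of the FRED pull, the snippet below builds a request URL for the observations endpoint and reduces a weekly series to one value per month. The helper names (`build_fred_url`, `monthly_means`), the placeholder API key, the sample payload, and the choice of a monthly mean as the resampling rule are assumptions for illustration; the pipeline's actual fetch code is not shown on this page.

```python
from collections import defaultdict
from urllib.parse import urlencode

FRED_ENDPOINT = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id: str, api_key: str) -> str:
    """Return a JSON observations URL for one FRED series."""
    query = urlencode(
        {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    )
    return f"{FRED_ENDPOINT}?{query}"


def monthly_means(payload: dict) -> dict:
    """Average observations per calendar month, skipping FRED's '.' missing marker."""
    buckets = defaultdict(list)
    for obs in payload["observations"]:
        if obs["value"] == ".":
            continue  # FRED encodes missing values as a single dot
        buckets[obs["date"][:7]].append(float(obs["value"]))
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Hypothetical payload in the shape the endpoint returns (values are made up).
sample = {
    "observations": [
        {"date": "2024-01-04", "value": "6.60"},
        {"date": "2024-01-11", "value": "6.70"},
        {"date": "2024-01-18", "value": "."},
        {"date": "2024-02-01", "value": "6.50"},
    ]
}

url = build_fred_url("MORTGAGE30US", "YOUR_API_KEY")
monthly = monthly_means(sample)
```

Keeping URL construction separate from parsing lets the parsing step be checked against a saved payload without network access.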

Census ACS 5-Year Estimates

Endpoint: api.census.gov/data/2023/acs/acs5

Pulled: 2026-05-14

Nature: Observed

Confidence: Medium-High (0.85)

24 Tulsa ZIPs were returned with null metric values. ACS 5-year estimates carry a 1-year lag; the 2023 estimates reflect data collected 2019–2023.

Fields: median_household_income, owner_occupied_share, median_home_value, rent_burden_count
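The null-value count above could be reproduced with a check along these lines. The Census API returns a JSON array of arrays whose first row is the header; everything else here is an assumption for illustration: the helper names, the placeholder metric columns `METRIC_A` and `METRIC_B`, the geography column name, and the rule that negative values are ACS placeholder codes rather than real estimates.

```python
def rows_to_records(rows):
    """Convert the header-plus-rows response into a list of dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def is_missing(value) -> bool:
    """Treat None, empty strings, and negative placeholder codes as missing."""
    if value is None or value == "":
        return True
    return float(value) < 0  # assumed: real estimates are non-negative


def zips_with_nulls(records, metric_fields, zip_field="zip code tabulation area"):
    """Return ZIPs where at least one metric field is missing."""
    return sorted(
        rec[zip_field]
        for rec in records
        if any(is_missing(rec.get(field)) for field in metric_fields)
    )


# Hypothetical response rows (values are made up).
rows = [
    ["NAME", "METRIC_A", "METRIC_B", "zip code tabulation area"],
    ["ZCTA5 74103", "45000", "0.31", "74103"],
    ["ZCTA5 74117", None, None, "74117"],
    ["ZCTA5 74120", "52000", "-666666666", "74120"],
]

null_zips = zips_with_nulls(rows_to_records(rows), ["METRIC_A", "METRIC_B"])
```

Flagging these ZIPs explicitly, rather than silently dropping them at the join, keeps the gap visible in downstream columns.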

Realtor.com Research Data

Endpoint: realtor.com/research/data (CSV download, not API)

Pulled: 2026-05-14

Nature: Observed (aggregates) / Calibrated (per-property)

Confidence: Medium-High (0.70–0.85)

Tulsa metro data at monthly frequency. Listing prices and rents are observed aggregates; per-property values are synthetic and calibrated to those aggregates. The data cover 2016–2026; pre-2016 values in the time series are interpolated.

Fields: median_listing_price, median_rent, active_listing_count, median_days_on_market
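One plausible reading of calibrating synthetic per-property values to an observed aggregate is a multiplicative rescale so that the synthetic median for a month matches the published median. The sketch below assumes that reading; the function name and all numbers are hypothetical, and the actual calibration step may differ.

```python
from statistics import median


def scale_to_observed_median(synthetic_values, observed_median):
    """Rescale synthetic values so their median equals the observed median."""
    synth_median = median(synthetic_values)
    if synth_median <= 0:
        raise ValueError("synthetic median must be positive to rescale")
    factor = observed_median / synth_median
    return [value * factor for value in synthetic_values]


# Hypothetical synthetic listing prices for one month (values are made up).
synthetic_prices = [180_000.0, 220_000.0, 260_000.0, 310_000.0, 400_000.0]

# Hypothetical observed monthly median for the metro.
calibrated = scale_to_observed_median(synthetic_prices, observed_median=286_000.0)
```

A multiplicative rescale preserves the ordering and relative spread of the synthetic values, which is why the result stays Calibrated rather than Observed: only the aggregate is anchored to real data.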

OSMnx Street Network + POI

Endpoint: OpenStreetMap via osmnx Python library (on-demand)

Pulled: n/a (opt-in)

Nature: Observed (when enabled) / Synthetic (default)

Confidence: Medium (0.55) when enabled; Low (0.40) when synthetic

When enabled, this source replaces the synthetic street_centrality and amenity_density values with real OSM data. It is disabled by default because it requires network access.

Fields: street_centrality, amenity_density
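The opt-in behavior can be sketched as a toggle that only imports osmnx when requested, so the default path has no network dependency. The function names, the noise level, the place string, and the use of mean degree centrality as the observed measure are all assumptions for illustration; only the nature labels and confidence values come from this page.

```python
import random


def synthetic_centrality(anchor: float, rng: random.Random, noise_sd: float = 0.05) -> float:
    """profile_lookup -> noise -> clip_0_1, starting from a hardcoded ZIP anchor."""
    return min(1.0, max(0.0, anchor + rng.gauss(0.0, noise_sd)))


def osm_centrality(place: str) -> float:
    """Mean degree centrality of the drivable street graph for a place."""
    # Imported lazily so the default path needs neither osmnx nor a network.
    import networkx as nx
    import osmnx as ox

    graph = ox.graph_from_place(place, network_type="drive")
    scores = nx.degree_centrality(graph)
    return min(1.0, sum(scores.values()) / len(scores))


def street_centrality(anchor, rng, use_osm=False, place="Tulsa, Oklahoma, USA"):
    """Return (value, nature, confidence) for the street_centrality column."""
    if use_osm:
        return osm_centrality(place), "Observed", 0.55
    return synthetic_centrality(anchor, rng), "Synthetic", 0.40


# Default path: synthetic value from a hypothetical anchor of 0.6.
value, nature, confidence = street_centrality(anchor=0.6, rng=random.Random(7))
```

Returning the nature and confidence alongside the value keeps the provenance label tied to whichever path actually produced the number. A single place-wide mean is coarser than a per-ZIP measure and is used here only to keep the sketch short.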

Tulsa County Assessor / City of Tulsa Open Data

Endpoint: planned — not yet integrated

Pulled: n/a

Nature: Planned (Observed)

Confidence: n/a

Would provide parcel-level ground truth for property values and transaction dates. Currently not integrated.

Fields: parcel_values, property_class, sale_dates

Per-Column Provenance

Column | Nature | Primary Source | Confidence | Transformations
median_listing_price | Calibrated | Realtor.com Research Data | 70% | synthetic_generation → real_aggregate_calibration → monthly_median_scaling_to_realtor_com
inventory_velocity | Calibrated | Realtor.com Research Data (days on market) | 70% | synthetic_generation → dom_to_velocity_conversion → real_aggregate_calibration
monthly_rent_estimate | Calibrated | Realtor.com Research Data (rent) + synthetic rent-to-price model | 65% | synthetic_rent_to_price_ratio → real_aggregate_calibration → monthly_median_scaling
annual_income_estimate | Calibrated | Census ACS (ZIP median income) + synthetic per-property noise | 65% | synthetic_generation → zip_income_calibration → property_noise
ownership_cost_monthly | Derived | Computed from median_listing_price, mortgage_rate_30y, property_tax_rate | 80% | mortgage_annuity_formula → tax_and_insurance_estimate
rent_margin_monthly | Derived | Computed from annual_income_estimate, dti_max, monthly_rent_estimate | 75% | budget_less_rent
buy_margin_monthly | Derived | Computed from annual_income_estimate, dti_max, ownership_cost_monthly | 75% | budget_less_ownership_cost
affordability_index | Derived | Computed in preprocess.py as ownership_cost / max_affordable_payment | 75% | affordability_gap_normalization
rent_to_price_ratio | Derived | Computed from monthly_rent_estimate / median_listing_price | 70% | annualized_rent_to_price_ratio
rent_vs_buy | Modeled | Decision rule from buy_margin, rent_margin, opportunity score | 60% | threshold_rule → buy_margin_gt_-0.10*rent → opportunity_gt_0.42
regime_hint | Modeled | Deterministic regime classifier based on month number | 55% | month_index_threshold: <24 Stable, <57 Overheated, <72 Rate Shock, >=72 Opportunity
mortgage_rate_30y | Observed | FRED MORTGAGE30US | 95% | api_fetch → monthly_resample → interpolate_gaps
cpi_all_urban | Observed | FRED CPIAUCSL | 95% | api_fetch → monthly_resample
unemployment_rate | Observed | FRED UNRATE | 95% | api_fetch → monthly_resample
median_sales_price_us | Observed | FRED MSPUS | 90% | api_fetch → quarterly_to_monthly_interpolation
median_household_income | Observed | Census ACS 5-Year Estimates (Table S1901) | 85% | api_fetch → zip_code_join
owner_occupied_share | Observed | Census ACS 5-Year Estimates (Table S2502) | 85% | api_fetch → zip_code_join
median_home_value | Observed | Census ACS 5-Year Estimates (Table S2502) | 85% | api_fetch → zip_code_join
school_rating | Synthetic | Tulsa ZIP profiles (hardcoded anchors in fetch_data.py) | 40% | profile_lookup → micro_location_noise → clip_0_1
street_centrality | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 40% | profile_lookup → noise → clip_0_1
amenity_density | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 40% | profile_lookup → seasonal_adjustment → noise → clip_0_1
crime_index | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 35% | profile_lookup → trend_adjustment → noise → clip_0_1
flood_risk_score | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 35% | profile_lookup → seasonal_cos → noise → clip_0_1
walk_transit_score | Synthetic | Computed from synthetic walk and transit components | 35% | weighted_combination → clip_0_1
economic_mobility_index | Synthetic | Tulsa ZIP profiles (hardcoded anchors) | 35% | profile_lookup → trend_adjustment → noise → clip_0_1
dti_max | Synthetic | Beta distribution draw (a=2.2, b=3.0), scaled to [0.28, 0.47] | 30% | beta_sample → scale_to_range → clip
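To make the Derived, Modeled, and Synthetic rows in the table above more concrete, here is a minimal sketch of four of them: the regime_hint thresholds, the dti_max draw, the annuity-based ownership cost, and the two margins. The thresholds, beta parameters, and range come from the table. The function names, the 20% down payment, the 0.5% insurance rate, the 30-year term, the reading of mortgage_rate_30y as a percentage, and the use of the DTI budget as max_affordable_payment are assumptions, not confirmed details of preprocess.py.

```python
import random


def regime_hint(month_index: int) -> str:
    """Deterministic regime label from the month-index thresholds in the table."""
    if month_index < 24:
        return "Stable"
    if month_index < 57:
        return "Overheated"
    if month_index < 72:
        return "Rate Shock"
    return "Opportunity"


def sample_dti_max(rng: random.Random) -> float:
    """Beta(2.2, 3.0) draw scaled to [0.28, 0.47], then clipped to that range."""
    lo, hi = 0.28, 0.47
    value = lo + (hi - lo) * rng.betavariate(2.2, 3.0)
    return min(hi, max(lo, value))


def mortgage_payment(principal: float, annual_rate_pct: float, years: int = 30) -> float:
    """Fixed-rate annuity payment covering principal and interest only."""
    r = annual_rate_pct / 100.0 / 12.0  # monthly rate from an annual percentage
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1.0 - (1.0 + r) ** -n)


def ownership_cost_monthly(price, annual_rate_pct, property_tax_rate,
                           down_payment_share=0.20, insurance_rate=0.005):
    """Annuity payment plus a tax and insurance estimate."""
    # down_payment_share and insurance_rate are assumed values, not from the source.
    principal = price * (1.0 - down_payment_share)
    taxes_and_insurance = price * (property_tax_rate + insurance_rate) / 12.0
    return mortgage_payment(principal, annual_rate_pct) + taxes_and_insurance


def margins(annual_income, dti_max, rent, ownership_cost):
    """Monthly housing budget less rent, and less ownership cost."""
    budget = annual_income / 12.0 * dti_max
    return {
        "rent_margin_monthly": budget - rent,
        "buy_margin_monthly": budget - ownership_cost,
        # Assumes max_affordable_payment equals the DTI budget.
        "affordability_index": ownership_cost / budget,
    }


# Hypothetical property and household (all inputs are made up).
cost = ownership_cost_monthly(250_000, 6.5, property_tax_rate=0.011)
result = margins(annual_income=60_000, dti_max=0.36, rent=1_400, ownership_cost=cost)
```

Because regime_hint depends only on the month index, it carries no market information of its own, which is consistent with its 55% confidence in the table.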