Parcoursup Simulator

October 2025 · Solo project
Web · Data Science · Simulation
Live demo · Private repository

Tool for estimating admission chances on Parcoursup, based on open data and explainable statistical modeling. The application combines a FastAPI backend and a React interface to provide an interactive and transparent experience.

1. Data Collection

a) Sources and Input Files

  • Parcoursup Open Data (statistics): fr-esr-parcoursup.csv. Headcounts, honors mentions, rank of the last admitted candidate, breakdown by baccalaureate type, scholarship holders, etc.
  • Training Cartography: fr-esr-cartographie_formations_parcoursup.csv. Metadata (institution type, apprenticeship, boarding, geolocation, links).
  • Specialties (general baccalaureate holders): fr-esr-parcoursup-enseignements-de-specialite-bacheliers-generaux.csv. Specialty pairs and admitted counts.
  • Geographical supplements: departements-france.csv. Department code normalization.

Geolocation fields are kept for display and diagnostic purposes and have no direct effect on the model (except through academy-level aggregations).

b) Enrichment through Scraping

  • Parcoursup sheets (training programs): retrieval of the presentation, expectations (as percentages when available), baccalaureate distribution, milestones (candidate and admission counts), tips, and competition detection.
  • “Competition” consolidation: competition_label labeling and, when available, the dossier_coeff/competition_coeff coefficients.
  • High school sheets (L’Étudiant): five “key figures” metrics (success rate, honors rate, number of students, final-year enrollment, average grade out of 20), collected with anti-blocking strategies (retries, throttling, User-Agent rotation).
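The anti-blocking strategy described above (retries, throttling, User-Agent rotation) can be sketched as follows. This is a minimal illustration, not the project's scraper: the `fetch` callable, the User-Agent pool, and the backoff and throttle values are all assumptions, and no particular HTTP client is implied.

```python
import itertools
import random
import time
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical User-Agent pool; the project's actual list is not published.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

# (url, headers) -> page body; expected to raise on failure or blocking.
Fetch = Callable[[str, Dict[str, str]], str]


def fetch_with_retries(url: str, fetch: Fetch, user_agents: Iterable[str] = USER_AGENTS,
                       max_retries: int = 4, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Try up to `max_retries` times, switching User-Agent on every attempt."""
    ua_cycle = itertools.cycle(user_agents)
    for attempt in range(max_retries):
        try:
            return fetch(url, {"User-Agent": next(ua_cycle)})
        except Exception:
            if attempt < max_retries - 1:
                # Exponential backoff with a little jitter before the next attempt.
                sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return None  # give up: the row simply stays un-enriched


def scrape_many(urls: List[str], fetch: Fetch, throttle: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> Dict[str, Optional[str]]:
    """Fetch pages one by one with a fixed pause between them (throttling)."""
    results: Dict[str, Optional[str]] = {}
    for i, url in enumerate(urls):
        if i:
            sleep(throttle)
        results[url] = fetch_with_retries(url, fetch, sleep=sleep)
    return results
```

Injecting `fetch` and `sleep` keeps the retry policy independent of the HTTP library and lets it be tested offline; returning `None` instead of raising fits an incremental enrichment where a failed page just leaves the row as it was.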

2. Data Processing

Goal: produce “model-ready” tables through schema alignment, typing, imputation, and derived indicators.

a) Cleaning and Merging — Training Programs

  1. Derived Indicators

    • Average grade estimated from the honors-mention distribution (weights 11/13/15/17/19).
    • Distance to the [8, 20] scale and a dispersion proxy (from the honors distribution).
    • Gender bias $B_f$ and scholarship bias $B_b$, bounded and normalized.
    • Selectivity $S = 1 - R_d/N \in [0,1]$ (imputed when missing).
  2. 2025 Cartography & Geography

    Normalization (apprenticeship → boolean, boarding categories, institution types), column renaming, completion of department and city fields, and a join with the department table.

  3. Training Family Harmonization

    Exhaustive formation_type → formation mapping (e.g., “Selective License” → “License – STS”), coverage checks.

  4. Adding “Competition”

    Add the competition flag and the dossier_coeff and competition_coeff coefficients for training programs that belong to one of the following competitions: Sésame, Accès, Geipi-Polytech, Avenir, Advance, Puissance Alpha, IEP-Sciences Po.
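The first derived indicators of step 1 can be sketched as below. This assumes the honors distribution is available as five per-bracket admitted counts and that some fallback value (for instance a family-level median, which the text does not specify) replaces missing selectivity data. The σ floor `min_std` is a hypothetical safeguard, not something stated in the pipeline.

```python
import math
from typing import Optional, Sequence, Tuple

# Midpoint grade assumed for each honors bracket, matching the 11/13/15/17/19 weights:
# no honors, "assez bien", "bien", "très bien", "très bien avec félicitations".
HONORS_WEIGHTS = (11, 13, 15, 17, 19)


def honors_mean_std(counts: Sequence[float], min_std: float = 0.5) -> Tuple[float, float]:
    """Estimate the mean grade (mu) and a dispersion proxy (sigma) of admitted
    students from the number of admitted students in each honors bracket."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("empty honors distribution")
    mean = sum(w * c for w, c in zip(HONORS_WEIGHTS, counts)) / total
    var = sum(c * (w - mean) ** 2 for w, c in zip(HONORS_WEIGHTS, counts)) / total
    # Hypothetical floor so that later z-scores never divide by zero.
    return mean, max(math.sqrt(var), min_std)


def selectivity(rank_last: Optional[float], n: Optional[float], fallback: float) -> float:
    """S = 1 - R_d / N clipped to [0, 1]; `fallback` stands in for missing data."""
    if rank_last is None or not n:
        return fallback
    return min(max(1.0 - rank_last / n, 0.0), 1.0)
```

The mean and dispersion returned here play the role of $\mu$ and $\sigma$ in the grade multipliers of section 3.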

b) Specialties

  1. Open Data

    Filtering by year, encoding specialty pairs (doublets) with stable specialty IDs (1..13), computing admitted_specialty_share per training program, and ranking pairs within each program.

  2. Scraping

    Combination columns are converted to doublet_i values in id1,id2 format (via specialite2id).
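A possible shape for the pair encoding and the per-program shares, sketched in plain Python. The `SPECIALITE2ID` excerpt, the row layout, and the choice to sort the two IDs (so that both orders of a pair map to the same key) are assumptions for illustration; only the id1,id2 format and the admitted_specialty_share concept come from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical excerpt of the specialty -> ID table (the project uses IDs 1..13).
SPECIALITE2ID = {
    "Mathématiques": 1,
    "Physique-Chimie": 2,
    "Sciences de la vie et de la Terre": 3,
    "Sciences économiques et sociales": 4,
}


def doublet_key(spe_a: str, spe_b: str) -> str:
    """Encode an unordered pair as 'id1,id2' with id1 < id2, so both orders match."""
    a, b = sorted((SPECIALITE2ID[spe_a], SPECIALITE2ID[spe_b]))
    return f"{a},{b}"


def specialty_shares(
    rows: Iterable[Tuple[str, str, str, float]],
) -> Dict[str, List[Tuple[str, float, int]]]:
    """rows: (program_id, specialty_a, specialty_b, n_admitted).

    Returns, per program, (doublet, admitted_specialty_share, rank) sorted by
    decreasing share; rank 1 is the most frequent pair among admitted students.
    """
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for program, spe_a, spe_b, n in rows:
        counts[program][doublet_key(spe_a, spe_b)] += n
    result = {}
    for program, by_pair in counts.items():
        total = sum(by_pair.values())
        ranked = sorted(by_pair.items(), key=lambda kv: kv[1], reverse=True)
        result[program] = [(pair, n / total if total else 0.0, rank)
                           for rank, (pair, n) in enumerate(ranked, start=1)]
    return result
```

The ranked shares are exactly what the doublet multiplier needs later: the candidate's own share $p_{\text{spe}}$ and the top-$N$ shares used for $\bar p_{\text{top}}$.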

c) High Schools

  1. Robust Scraping

    Incremental enrichment, typing, anti-blocking.

  2. Feature Engineering & Multiplier

    Rates converted to decimals (success, honors), an aggregated metric $x \in [0,1]$, and an infos indicator flagging whether key figures were found.

    Bounded transformation into $M_{\text{highschool}} \in [0.85, 1.15]$ via z-score → normal CDF → exponential; if infos = 0, then $M_{\text{highschool}} = 1.0$.

    Figure 1: Data Processing Pipeline

3. Establishing Metrics

Idea: convert each factor into a multiplier $M_i$ centered at 1 and bounded to avoid extremes, then aggregate the multipliers.

  • Demographics (bounded, symmetric)

    $$M_{\text{gender}} = \exp\!\big(k_{\max}\,(2B_f - 1)\cdot s\big), \qquad M_{\text{scholarship}} = \exp\!\big(k_{\max}\,(2B_b - 1)\cdot t\big)$$

    with $B_f, B_b \in [0,1]$, $s, t \in \{-1,+1\}$, and $k_{\max} = \ln(1.025)$ (about ±2.5%).

  • General Grades (calibrated to cohort)

    Let $g$ be the candidate’s average, and $\mu, \sigma$ the mean and dispersion of the training program’s admitted students (estimated from the honors distribution). With $\Phi$ the standard normal CDF:

    $$z = \dfrac{g-\mu}{\sigma}, \qquad p = \Phi(z)$$

    $$M_{\text{grades}} = \begin{cases} \exp\!\big(1.25\,\ln(m_{\max})\,(2p-1)\big) & \text{if } g < \mu \\ \exp\!\big(\ln(m_{\max})\,(2p-1)\big) & \text{otherwise} \end{cases}$$

    with $m_{\max} = 1.5$, so $M_{\text{grades}}$ ranges from $1.5^{-1.25} \approx 0.60$ to $1.5$.

  • Specialty Grades (consistency)

    $n = \dfrac{(s_1+s_2)/2 - \mu}{5}$, truncated to $[-1, 1]$, where $s_1, s_2$ are the candidate’s two specialty grades.

    $$M_{\text{specialty-grades}} = \begin{cases} \exp\!\big(2.5\,\ln(1.5)\,n\big) & \text{if } n < 0 \\ \exp\!\big(\ln(1.5)\,n\big) & \text{otherwise} \end{cases}$$

  • Baccalaureate Type (representativeness)

    For non-general baccalaureates, $M_{\text{bac}}$ depends on the observed share $p_b$ of that baccalaureate type among admitted students, with a floor $m_{\min} = 0.6$:

    $$M_{\text{bac}} = \begin{cases} 0 & \text{if } p_b \le 0 \\ \min\big(\max(3p_b,\, m_{\min}),\, 1\big) & \text{otherwise} \end{cases}$$

  • Specialty Pair (adequacy/rarity)

    With $p_{\text{spe}}$ the share of admitted students holding the candidate’s pair and $\bar p_{\text{top}}$ the average share of the $N$ most frequent pairs (often $N = 3$), set $r = \min\!\big(p_{\text{spe}}/\bar p_{\text{top}},\, 1\big)$.

    $$M_{\text{doublet}} = \begin{cases} \exp\!\big(2\,\ln(1.15)\,(2r-1)\big) & \text{if } r < 0.5 \\ \exp\!\big(\ln(1.15)\,(2r-1)\big) & \text{otherwise} \end{cases}$$

  • High School of Origin (academic context)

    For the aggregated score $x \in [0,1]$: $z = \dfrac{x-\mu_x}{\sigma_x}$, $p = \Phi(z)$, and

    $$M_{\text{highschool}} = \operatorname{clamp}\!\big(\exp(k(2p-1)),\, m_{\min},\, m_{\max}\big)$$

    with $\mu_x = 0.548$, $\sigma_x = 0.182$, $m_{\min} = 0.85$, $m_{\max} = 1.15$, and $k = \ln(m_{\max})$.

    If infos = 0, then $M_{\text{highschool}} = 1$.
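The multipliers above can be sketched as plain Python functions that transcribe the formulas as written. This is not the project's compute_admission code: function names and argument layouts are hypothetical, `statistics.NormalDist` stands in for $\Phi$, and how $s$ and $t$ are derived from a profile is not specified, so they are passed in directly. Two neutral defaults are assumptions not stated in the text: general-baccalaureate candidates get $M_{\text{bac}} = 1$, and a program without pair data gets $M_{\text{doublet}} = 1$.

```python
import math
from statistics import NormalDist
from typing import List

PHI = NormalDist().cdf  # standard normal CDF, i.e. Φ


def clamp(v: float, lo: float, hi: float) -> float:
    return min(max(v, lo), hi)


def m_demographic(bias: float, s: int, k_max: float = math.log(1.025)) -> float:
    """M_gender / M_scholarship: exp(k_max (2B - 1) s), B in [0, 1], s in {-1, +1}."""
    if s not in (-1, 1):
        raise ValueError("s must be -1 or +1")
    return math.exp(k_max * (2 * clamp(bias, 0.0, 1.0) - 1) * s)


def m_grades(g: float, mu: float, sigma: float, m_max: float = 1.5) -> float:
    """Overall average vs. the program's cohort; the penalty is 1.25x steeper below mu."""
    p = PHI((g - mu) / sigma)
    slope = 1.25 if g < mu else 1.0
    return math.exp(slope * math.log(m_max) * (2 * p - 1))


def m_specialty_grades(s1: float, s2: float, mu: float) -> float:
    """Mean of the two specialty grades vs. mu, scaled by 5 points and truncated."""
    n = clamp(((s1 + s2) / 2 - mu) / 5, -1.0, 1.0)
    slope = 2.5 if n < 0 else 1.0
    return math.exp(slope * math.log(1.5) * n)


def m_bac(p_b: float, general: bool, m_min: float = 0.6) -> float:
    """Baccalaureate type; the general bac is treated as neutral (assumption)."""
    if general:
        return 1.0
    if p_b <= 0:
        return 0.0
    return min(max(3 * p_b, m_min), 1.0)


def m_doublet(p_spe: float, pair_shares: List[float], n_top: int = 3) -> float:
    """Rarity of the candidate's specialty pair relative to the top-N pairs."""
    top = sorted(pair_shares, reverse=True)[:n_top]
    p_top = sum(top) / len(top) if top else 0.0
    if p_top <= 0:
        return 1.0  # no pair data: neutral (assumption)
    r = min(p_spe / p_top, 1.0)
    slope = 2.0 if r < 0.5 else 1.0
    return math.exp(slope * math.log(1.15) * (2 * r - 1))


def m_highschool(x: float, infos: int, mu_x: float = 0.548, sigma_x: float = 0.182,
                 m_min: float = 0.85, m_max: float = 1.15) -> float:
    """High-school context; neutral when no key figures were found (infos == 0)."""
    if not infos:
        return 1.0
    p = PHI((x - mu_x) / sigma_x)
    return clamp(math.exp(math.log(m_max) * (2 * p - 1)), m_min, m_max)
```

Every factor except $M_{\text{bac}}$ is an exponential and therefore strictly positive, so $M_{\text{bac}} = 0$ is the only way a single factor can zero the aggregated score.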

4. Chosen Model

a) Aggregated Score

The global score for a training program $f$ is:

$$M = M_{\text{gender}} \cdot M_{\text{scholarship}} \cdot M_{\text{bac}} \cdot M_{\text{grades}} \cdot M_{\text{specialty-grades}} \cdot M_{\text{doublet}} \cdot M_{\text{highschool}}$$

Each $M_i$ is bounded so that the product stays stable and interpretable.
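A sketch of the aggregation, assuming the multipliers are collected in a dictionary keyed by factor name (the key names and the example values are hypothetical):

```python
import math
from typing import Mapping


def aggregate(multipliers: Mapping[str, float]) -> float:
    """Global score M: the plain product of the bounded factors M_i."""
    return math.prod(multipliers.values())


# Hypothetical profile: slightly above the cohort on grades, neutral elsewhere.
example = {
    "gender": 1.0, "scholarship": 1.0, "bac": 1.0,
    "grades": 1.2, "specialty_grades": 1.1, "doublet": 1.0, "highschool": 1.05,
}
```

Because the aggregation is a product, a factor equal to 1 is neutral, and $M_{\text{bac}} = 0$ overrides every other factor.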

b) Conversion to Calibrated Percentile

We assume $M$ is centered around 1 and choose $\sigma_M$ so that $M_{\max}$ falls at the 97.5th percentile:

$$\sigma_M = \dfrac{M_{\max} - 1}{\Phi^{-1}(0.975)}, \qquad z_M = \dfrac{M - 1}{\sigma_M}, \qquad P = \Phi(z_M) \in [0,1].$$

The displayed value is $100 \times P$.
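The calibration step can be sketched with `statistics.NormalDist` from the Python standard library, which provides both $\Phi$ (`cdf`) and $\Phi^{-1}$ (`inv_cdf`). $M_{\max}$ is not given in the text, so it is a parameter here. With $M_{\max} = 2.5$, a value chosen only for illustration, $M = 1.440$ gives $100P \approx 71.7$, close to the worked example in e); this suggests a value in that neighborhood, but it is an inference, not a stated setting.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def calibrated_percentile(m: float, m_max: float) -> float:
    """Map the aggregated score M to 100 * P, placing M_max at the 97.5th percentile."""
    sigma_m = (m_max - 1) / _STD_NORMAL.inv_cdf(0.975)  # Φ^{-1}(0.975) ≈ 1.96
    z_m = (m - 1) / sigma_m
    return 100 * _STD_NORMAL.cdf(z_m)
```

A neutral profile ($M = 1$) always lands at 50, which is exactly the acceptance threshold of the three-level decision.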

c) Three-Level Decision

  • Rejected if $100P < 20$
  • Waiting List if $20 \le 100P < 50$
  • Accepted if $100P \ge 50$

The thresholds are deliberately simple and can be adjusted per training program and per year to refine local calibration.
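The three-level rule reduces to two comparisons. Exposing both thresholds as parameters (the names are hypothetical) mirrors the per-program, per-year adjustment described above:

```python
def decide(score: float, reject_below: float = 20.0, accept_from: float = 50.0) -> str:
    """Three-level decision on the displayed score 100 * P."""
    if score < reject_below:
        return "Rejected"
    if score < accept_from:
        return "Waiting List"
    return "Accepted"
```

Both boundaries are half-open, so a score of exactly 20 is on the waiting list and exactly 50 is accepted, matching the inequalities in the list.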

d) Explainability

In addition to $P$, the service returns the main factor (the dominant $M_i$) to explain the result (grades, doublet, baccalaureate type, high school, etc.). Demographic effects are bounded and symmetric.
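The text does not define "dominant" precisely. One reasonable reading, assumed in this sketch, is the factor farthest from 1 in log space, so that a ×0.8 penalty and a ×1.25 bonus weigh the same; a zero multiplier is treated as decisive.

```python
import math
from typing import Mapping, Tuple


def dominant_factor(multipliers: Mapping[str, float]) -> Tuple[str, bool]:
    """Return (factor name, is_positive) for the M_i farthest from 1 in log space."""
    def strength(m: float) -> float:
        # A zero multiplier (e.g. M_bac = 0) decides the outcome on its own.
        return math.inf if m <= 0 else abs(math.log(m))

    name = max(multipliers, key=lambda k: strength(multipliers[k]))
    return name, multipliers[name] > 1.0
```

The boolean lets the interface phrase the explanation as either a strength or a weakness of the profile.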

e) “Competition” Case (optional)

If competition = 1, a post-dossier weighting is applied, combining a competition grade entered by the user with the dossier_coeff/competition_coeff coefficients (when available).

Example: $M \simeq 1.440 \Rightarrow 100P \simeq 71.8$ → Accepted.

5. Backend Architecture

  • Framework: FastAPI (main.py), with configuration in core/config.py, middlewares (CORS, UserIdMiddleware), and dependencies in core/deps.py.
  • Search Infrastructure: Integration of Typesense, a typo-tolerant and fast search engine, for instant indexing and querying of training programs and institutions.
  • Routers:
    • routers/simulate.py: Simulation API (model Profile, call compute_admission).
    • routers/formations.py: Training search & stats (BM25, geo filters, distributions).
    • routers/profiles.py: Profile CRUD (Profile, Wish).
    • routers/motive.py: Letter generation and sending (OpenAI API + Brevo).
    • routers/lycees.py: High school search (department/type weights).
    • routers/share.py: Sending results by email (HTML rendering, Brevo).
  • Database & ORM: db/database.py (SQLAlchemy / SQLite), models/profile.py.

Figure 2: API Endpoints Documentation

6. Backend Attention Points

  • Mailing: /share/simulation → HTML rendering via _render_motive_body_html, sending via Brevo (keys in .env.local: BREVO_API_KEY, BREVO_EMAIL_SENDER).
  • Motivation Letter: /motive/generate (generation), /motive/email (sending).
  • Search Engine: Migration to Typesense enabling synonym management, typo tolerance, and relevant result sorting.
  • Academic Filtering: frontend/constants/acad_map.json (department → academy/territory).

7. Frontend Architecture

  • Stack: React (Vite), with main.jsx as the entry point and App.jsx as the root component.
  • Advanced Features:
    • Dynamic Theming: detection of the origin URL to automatically adapt branding (logo, color palette, links), enabling multi-site white-label deployment.
    • Sharing and Public Profiles: Creation of public profile pages accessible via unique URL, allowing vendors to easily access students’ simulations and wish lists.
    • Training Comparator: Interactive tool allowing side-by-side comparison of multiple training programs on key criteria (selectivity, career prospects, expectations).

Figure 3: Accessible Profile Sheet

Figure 4: Training Program Comparator

  • Key Folders:
    • api/api.js: endpoint calls (simulation, training programs, profiles, motivation, etc.).
    • components/: details/, motivation/, registration/, search/.
    • constants/: complete_form.json, concours.json, confidence_levels.json, factorExplains.js, thresholds.js.
    • context/: global state, theme (styles/theme.css, utils/ThemeLogo.jsx).
    • pages/: home, training programs, simulator, profile, letter.
    • utils/: pagination, conversions, helpers.

Figure 5: Search Page Graphical Interface

8. Deployment

a) Railway Hosting

Both backend and frontend are hosted on Railway, together with a connected PostgreSQL database and a Typesense instance (Docker).

Figure 6: Railway Infrastructure

b) n8n Automation

  • Trigger: Scheduled (Schedule Trigger) for periodic execution.
  • Process: Daily synchronization of data (Profiles and Wishes) from the database to Google Sheets segmented by client.
  • Purpose: Automatic feeding of tracking data and triggering sub-workflows for dashboard updates.

Figure 7: n8n Automation Workflow
