
In the blog “Economic Power of Entity Propensity Models Are Transforming the Game”, I talked about how my childhood fascination with the board game of Strat-O-Matic baseball fueled my interest in analytics and mastering the power of player-level Entity Propensity Models (EPMs). Since then, I have learned the critical role of EPMs as modern-day AI-driven assets that could transform the organization’s economic value curve. EPMs provide a data-driven framework that enhances strategy, efficiency, and performance by predicting individual entities’ behavioral and performance propensities.
For example, in sports, player EPMs enable coaches to anticipate how athletes might react in specific game situations. This allows for more precise decisions about player development, injury and fatigue/workload management, lineup and roster decisions, player trade strategies, and in-game decisions.

Figure 1: Baseball Batter Entity Propensity Model (EPM)
But wait, there’s more to my childhood journey!
I also regularly listened to Minnesota Twins radio announcer Herb Carneal during those same years. His play-by-play narration brought the games to life as I huddled with my friend Mike O’Neill to listen. Carneal shared insights into each player’s history, batting and pitching tendencies, and the current game situation, providing explanations that deepened our understanding of the game.
What if we could integrate the extensive knowledge and experience of play-by-play announcers about the players and coaches to reveal hidden performance patterns, situational tendencies, and psychological factors that could be used to make the player EPMs more relevant and accurate? How much more precise and relevant would it make the EPMs used to drive player development and in-game coaching situations?
The Evolution of Sports Analytics: From Box Scores to Subject Experts
Traditional baseball statistics found in box scores, such as batting average, slugging percentage, and ERA, provide valuable quantitative insights into player performance. However, they often overlook the situational and contextual factors influencing player performance and game outcomes. These box score statistics present a one-dimensional view of a player’s performance propensities and fail to account for in-game variables such as pitch sequencing, defensive shifts, weather conditions, and high-pressure moments. Teams relying solely on traditional stats will miss critical insights into why a player excels in specific scenarios but struggles in others.
Play-by-play announcer transcripts offer a crucial piece of the puzzle. Whether it’s Vin Scully (baseball), Chick Hearn (basketball), or Jack Brickhouse (football), these experts provide invaluable insights that can capture:
- Situational pressure – (“Big moment here, bases loaded, two outs… how does this batter usually perform?”)
- Mental and emotional state – (“You can see the frustration in his body language after that strikeout.”)
- Defensive shifts and in-game strategy changes – (“The outfield has moved deep, anticipating a power swing.”)
- Fatigue and physical condition indicators – (“He’s been throwing more balls high to the last few batters—could be getting tired.”)
These announcers have seen and analyzed thousands of games, building a wealth of sports and player-specific tribal knowledge readily available for inclusion in your AI-driven Entity Propensity Models. By extracting insights from narrated game descriptions and merging them with statistical data, teams can create more advanced entity propensity models that reflect statistical performance plus qualitative factors to optimize player and in-game decision-making.
How Game Transcripts Enhance Predictive Player Models
Using natural language processing (NLP) and Generative AI tools like Cockatoo.com and ChatGPT, teams can analyze thousands of historical game transcripts to uncover deeper insights into how players perform in specific scenarios, including:
Situational Performance Analysis. Assess how players respond in high-pressure moments:
- Clutch Performance Index: Does a batter thrive or struggle with runners in scoring position?
- Late-Inning Pitcher Effectiveness: Does a pitcher maintain velocity and command in the 7th, 8th, and 9th innings?
- Comeback Resilience: How does a team perform when trailing in the 9th compared to leading?
For example, the announcer might say, “He hasn’t had a hit in his last 12 at-bats with runners on third.” Your EPM could use that insight to lower the batter’s High-Leverage Performance (HLP) score, influencing lineup and pinch-hitting decisions.
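To make the score adjustment concrete, here is a minimal sketch of how one announcer observation might nudge a batter’s HLP score. Everything in it is an assumption for illustration and not part of any existing EPM tooling: the `PlayerEPM` class, the `nudge` method, the 0.0 to 1.0 score scale, the neutral 0.5 starting value, and the `weight` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerEPM:
    """Hypothetical container for one player's propensity scores (0.0 to 1.0)."""
    name: str
    scores: dict[str, float] = field(default_factory=dict)

    def nudge(self, score_name: str, evidence: float, weight: float = 0.1) -> float:
        """Move a score toward `evidence` by `weight` (an exponential moving average)."""
        current = self.scores.get(score_name, 0.5)  # assumed neutral starting value
        updated = (1 - weight) * current + weight * evidence
        self.scores[score_name] = min(1.0, max(0.0, updated))  # clamp to the 0-1 scale
        return self.scores[score_name]


batter = PlayerEPM("Example Batter", {"HLP": 0.60})

# "He hasn't had a hit in his last 12 at-bats with runners on third."
# Assumption: treat this as evidence of 0.0 success in a high-leverage spot.
new_hlp = batter.nudge("HLP", evidence=0.0, weight=0.2)
print(round(new_hlp, 2))  # 0.48
```

A small `weight` keeps a single comment from swinging the score too far; how much weight an announcer’s remark deserves relative to the recorded statistics is a modeling choice this sketch does not settle.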
Mental & Psychological Readiness. Assess how players handle adversity:
- Frustration-Induced Errors: Does a player’s defensive performance suffer after striking out?
- Umpire Influence Analysis: Do certain players perform worse after arguing with an umpire?
- Post-Injury Confidence Index: How does a pitcher perform in the first games back from injury?
For example, the announcer might say, “You can see the frustration in his body language after that bad call.” Your EPM might reduce the hitter’s composure score, signaling a greater likelihood of chasing bad pitches.
Fatigue & Injury Risk Assessment. Detect fatigue indicators before they become injuries:
- Pitcher Fatigue Detection: Does a pitcher’s command worsen after 90 pitches?
- Base Running Wear & Tear: Are sprint speeds slowing in late innings?
- Workload Stress Score: Does performance drop after back-to-back games without rest?
For example, the announcer might say, “He’s been missing high with his fastball more often.” The pitcher’s EPM might increase his Fatigue Risk score, prompting earlier bullpen consideration.
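As a rough illustration of how announcer phrases could be sorted into the three categories above, the sketch below uses hand-written cue phrases. This is a deliberately simple rule-based stand-in for the GenAI analysis described in this blog, not the method itself; the category names, the cue phrases, and the `tag_sentence` helper are all hypothetical.

```python
import re

# Hypothetical cue phrases per signal category. A real system would more likely
# use an LLM or a trained classifier than hand-written patterns.
SIGNAL_PATTERNS = {
    "situational_pressure": re.compile(
        r"bases loaded|two outs|runners? on|scoring position", re.I
    ),
    "composure": re.compile(r"frustrat|body language|bad call|arguing", re.I),
    "fatigue": re.compile(
        r"getting tired|missing high|losing velocity|throwing more balls", re.I
    ),
}


def tag_sentence(sentence: str) -> list[str]:
    """Return every signal category whose cue phrases appear in the sentence."""
    return [name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(sentence)]


commentary = [
    "He hasn't had a hit in his last 12 at-bats with runners on third.",
    "You can see the frustration in his body language after that bad call.",
    "He's been missing high with his fastball more often.",
]

for line in commentary:
    print(tag_sentence(line), "<-", line)
```

Keyword rules like these miss paraphrases and sarcasm, which is one reason to hand the tagging step to a language model once the categories are agreed on.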
Classroom Exercise: Mining Game Play-by-Play Transcripts
I used the following process to download radio announcers’ play-by-play narratives, convert them into text, and analyze them. This is not intended to replace the batting and pitching statistics captured during the game. Instead, consider it an opportunity to leverage Subject-Matter Experts who provide small, critical nuances about an athlete’s performance, effectiveness, and behaviors during the game (Figure 2).
- Step 1: Download the play-by-play audio track. I downloaded an audio recording (mp3 format) of the August 8, 1936, St. Louis Cardinals at Chicago Cubs game (the Cardinals beat my Cubbies 6 to 1).
- Step 2: Convert the MP3 into a .txt file (I used Cockatoo).
- Step 3: Upload the .txt file into ChatGPT (or my Dean of Big Data ChatGPT4).
- Step 4: Upload relevant data (articles, blogs, statistics, etc.) to establish ChatGPT Context Continuity. Note: I train my students on the contextual continuity technique on the first day of my university class.
- Step 5: Upload Socratic Method documentation to guide ChatGPT’s analytical exploration narrative.
- Step 6: Prompt ChatGPT to assess the .txt file to uncover player and coaching predictive propensities and tendencies.
- Step 7: Update the player Entity Propensity Model (analytic scores) based on the player insights ChatGPT uncovered.

Figure 2: Mining Subject Matter Expert (SME) Conversations
And the results…
Here are the actual prompts and results my “trained” ChatGPT yielded:
ChatGPT Prompt: I have uploaded a 1936 baseball game transcript between the Chicago Cubs and St. Louis Cardinals. Using the concept of player entity propensity models (EPM), see what player-specific insights you can uncover from this transcript that might impact the development of each player’s respective EPM. Please provide your assessment in table format.
ChatGPT Response: Based on the 1936 baseball game transcript, I have extracted key player-specific insights that could be used to develop their Entity Propensity Models (EPMs). Below is a table outlining these insights:
Player | Key Observations from Transcript | Propensity Score Updates
Terry Moore (Cardinals CF) | Struck a hard-hit grounder to first base that was caught on a line. Later popped out to the catcher. | – Expected Contact Quality (xCQ) (tendency to hit line drives or grounders) |
Frankie Frisch (Cardinals 2B) | Worked a walk, showing strong plate discipline. Later hit a line drive caught by the CF. | – Hot/Cold Zone Effectiveness (H/CZ) (likely good at recognizing balls vs. strikes) |
Pepper Martin (Cardinals RF) | Grounded into a fielder’s choice, showed aggressive base running. Hit a deep fly ball to left-center. | – Momentum-Adjusted Performance (MAP) (strong performance under pressure) |
Joe Medwick (Cardinals LF) | Hit a long fly that was caught in the outfield due to wind. Later doubled off the wall. | – High-Leverage Performance (HLP) (tendency to hit well in key moments) |
Johnny Mize (Cardinals 1B) | Hit a double to left field despite the defensive shift. Later drove in a run with a well-placed single. | – Platoon Advantage Score (PAS) (how well he performs against different pitching matchups) |
Leo Durocher (Cardinals SS) | Hit a strong single but was held up at third due to a quick throw from the outfielder. | – Base-State Performance (BSP) (effectiveness at advancing runners in scoring position) |
Billy Herman (Cubs 2B) | Made a leaping one-handed catch that saved a run. Later had multiple singles to left. | – Defensive Efficiency Rating (DER) (ability to make key defensive plays) |
Gabby Hartnett (Cubs C) | Hit a deep fly ball that was held by the wind. Showed visible frustration with the umpire’s strike zone. | – Expected Outcome Score (xOUTS) (predicting whether a plate appearance results in an out) |
Augie Galan (Cubs CF) | Successfully executed a sacrifice bunt to move a runner into scoring position. | – Small Ball Efficiency (SBE) (success rate for bunts, hit-and-run plays, and situational hitting) |
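To feed a response like this into Step 7, the pipe-delimited rows first need to become structured records. The following is a minimal sketch of that parsing step, using two rows copied from the table above. The `parse_rows` function, the regular expressions, and the record field names are assumptions made for illustration; a different response layout would need different parsing.

```python
import re

# Header plus two rows copied from the ChatGPT response table.
response_table = """\
Player | Key Observations from Transcript | Propensity Score Updates |
Joe Medwick (Cardinals LF) | Hit a long fly that was caught in the outfield due to wind. Later doubled off the wall. | – High-Leverage Performance (HLP) (tendency to hit well in key moments)
Augie Galan (Cubs CF) | Successfully executed a sacrifice bunt to move a runner into scoring position. | – Small Ball Efficiency (SBE) (success rate for bunts, hit-and-run plays, and situational hitting)
"""

# "Name (Team POS)" in the first column, e.g. "Joe Medwick (Cardinals LF)".
PLAYER = re.compile(r"^(?P<player>.+?) \((?P<team>\w+) (?P<position>\w+)\)$")
# A score acronym is a parenthesized token with no spaces, e.g. "(HLP)" or "(H/CZ)".
SCORE = re.compile(r"\(([A-Za-z/]+)\)")


def parse_rows(table: str) -> list[dict[str, str]]:
    """Turn pipe-delimited table text into one record per player row."""
    records = []
    lines = [ln for ln in table.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header line
        # Split on pipes and drop the empty cell a trailing pipe would leave.
        cells = [c.strip() for c in line.split("|") if c.strip()]
        if len(cells) != 3:
            continue  # ignore rows that do not have exactly three columns
        who, observation, update = cells
        player_match = PLAYER.match(who)
        score_match = SCORE.search(update)
        if not player_match or not score_match:
            continue
        records.append({
            **player_match.groupdict(),
            "observation": observation,
            "score": score_match.group(1),
        })
    return records


records = parse_rows(response_table)
for rec in records:
    print(rec["player"], rec["team"], rec["position"], "->", rec["score"])
```

Each record names the player and the score to revisit, but not by how much; the size and direction of the adjustment still has to come from the analyst or from a follow-up prompt.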
This is a unique way to quantify the insights of play-by-play announcers who have witnessed thousands of games. As a sports team, imagine the range and depth of insights gained from analyzing the play-by-play commentary of your games, not to mention the play-by-play commentary from the opposing team’s announcers.
NOTE: I have trained my ChatGPT using critical analytical concepts from my “Thinking Like a Data Scientist” methodology. I upload conversational techniques like the Socratic Method to prepare my GPT to think critically about key aspects of players’ performance and behavior. I direct GPT with a clear understanding of the desired outcomes and KPIs/metrics that measure effectiveness. There’s no magic here; it’s just hard work.
Mining SME Narratives Beyond Baseball
Other industries can also leverage SME conversations to build more context-aware entity models that can transform their economic value curve, such as:
- Healthcare: Doctor-Patient Conversations → Patient, Provider, and Treatment EPMs.
GenAI analyzes physician notes, patient interviews, and diagnostic discussions to refine models that predict patient treatment response, provider effectiveness, and optimal care pathways. This ensures personalized and efficient healthcare delivery.
- Education: Teacher-Student Interactions → Student, Educator, and Curriculum EPMs.
GenAI can analyze classroom discussions, student feedback, and teacher notes to refine models that predict student learning styles, identify at-risk students, and personalize lesson plans to optimize educational outcomes.
- Cybersecurity: Security Analyst Reports → Threat, Vulnerability, and Response EPMs.
GenAI interprets incident reports, system logs, and threat intelligence briefings to refine models predicting cyberattack likelihood, system vulnerabilities, and optimal threat response strategies for proactive defense.
- Manufacturing: Engineer and Technician Problem-Solving → Equipment, Process, and Maintenance EPMs.
AI analyzes technician diagnostic discussions, repair logs, and equipment performance data to predict machine failures, optimize maintenance schedules, and improve manufacturing efficiency.
- Banking: Financial Advisor Consultations → Customer, Risk, and Investment EPMs.
AI-driven insights from client-advisor meetings can refine customer financial profiles, predict investment propensities, and optimize risk-based lending decisions, ensuring tailored financial products and risk mitigation.
- Energy & Utilities: Field Technician Logs → Equipment, Grid, and Maintenance EPMs.
GenAI analyzes technician reports, sensor data, and outage logs to optimize maintenance schedules, predict equipment failure, and improve energy distribution efficiency across grids and infrastructure.
- Logistics & Supply Chain: Dispatcher Communications → Route, Demand, and Delivery EPMs.
GenAI processes real-time driver updates, weather reports, and traffic conditions to refine models predicting optimal delivery routes, demand fluctuations, and supply chain disruptions.
The only thing that limits the integration of conversation-driven insights to enhance the effectiveness of EPMs is your imagination and curiosity to explore!
Summary: Exploiting SME Insights
The future of analytics isn’t just about numbers—it’s about understanding the reasons behind them. Organizations that use AI to tap into expert conversational data will gain a significant competitive edge. This will allow for more precise decision-making and asset optimization, which can transform the organization’s economic value curve.
So, the next time you hear a game announcer making an insightful observation, just imagine—somewhere, a leading-edge sports organization might be using an AI model to listen, learn, and prepare to make the next great prediction.