A strong understanding of math is one skill commonly needed to be a successful data scientist. Some of the most successful data scientists have a vast array of mathematical skills they can use — along with programming expertise — to run a data science project.
Often, when Cyberans discuss machine learning and data science with business leaders and members of the start-up community, we try to offer real-world applications as examples. Over the last few years, finding examples has become easier. Data science has become more common as computing power and cost-effective cloud infrastructure have become more widely available.
For instance, in January of this year, Alex Tennant, one of our data scientists, asked himself, “When discussing data science and machine learning, what is a relatable example for Canadians and Albertans?” The answer: hockey!
With machine learning, we frequently hear how automation will replace humans in decision making. So, to put this to the test, Alex programmed an algorithm that would choose the best fantasy hockey picks. For those who aren’t familiar with fantasy hockey, it essentially all boils down to math. You select players, referred to as “picks”, with the purpose of putting together a team of NHL players that will give you the most “fantasy points” in each game during that season. Points are awarded based on real-life player wins, goals, assists, +/-, saves by a goalie, etc. This was an especially interesting challenge for Alex to take on, because he doesn’t know a lot about hockey, but is good at math.
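To make the arithmetic concrete, here is a minimal sketch of how a pool might turn a player's stat line into fantasy points. The weights and the `fantasy_points` function are hypothetical and chosen only for illustration; they are not the scoring rules of our pool or the ones Alex's bots used.

```python
# Hypothetical scoring weights, for illustration only; real pools set their own.
SCORING_WEIGHTS = {
    "goals": 3.0,
    "assists": 2.0,
    "plus_minus": 1.0,
    "wins": 4.0,    # goalie wins
    "saves": 0.25,  # goalie saves
}


def fantasy_points(stat_line, weights=SCORING_WEIGHTS):
    """Return the weighted sum of a player's stats for one game."""
    return sum(
        weights[stat] * value
        for stat, value in stat_line.items()
        if stat in weights  # ignore stats the pool does not score
    )


# Made-up stat lines for a skater and a goalie.
skater = {"goals": 1, "assists": 2, "plus_minus": -1}
goalie = {"wins": 1, "saves": 32}

print(fantasy_points(skater))  # 3*1 + 2*2 + 1*(-1) = 6.0
print(fantasy_points(goalie))  # 4*1 + 0.25*32 = 12.0
```

Because the score is just a weighted sum, changing the weights changes which kinds of players are most valuable, which is part of why picking a team is a math problem.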
The Draft and an Abrupt End
After much trial and error, Alex trained six “bots” using different selection parameters to test several theories on how to make efficient picks. He then recruited a couple of people at Cybera to test these theories: one who knows quite a bit about hockey and is good at math; and myself, who watches hockey (but is nowhere near an expert) and is okay at math. I guess that made me the control.
We did a mock fantasy draft where we humans and the bots all picked separate players. Each week, all participants would manage their team lineups, based on which players they thought would garner more points. The bots would take into consideration things like injuries, previous years’ points, and present performances. The humans would do the same (in my case, admittedly, very poorly), with a pinch of gut instinct and humanity playing a role in our choices.
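One way to picture a bot's weekly lineup decision is a projection that blends last season's scoring rate with recent form and benches injured players. The sketch below is an assumption made for illustration, not the logic of Alex's bots; the `Player` fields, the `recent_weight` parameter, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    last_season_ppg: float  # fantasy points per game last season
    recent_ppg: float       # fantasy points per game over recent games
    injured: bool = False


def projected_points(player, recent_weight=0.5):
    """Blend past and present performance; injured players project to zero."""
    if player.injured:
        return 0.0
    return (1 - recent_weight) * player.last_season_ppg + recent_weight * player.recent_ppg


def pick_lineup(roster, slots):
    """Start the `slots` players with the highest projections."""
    return sorted(roster, key=projected_points, reverse=True)[:slots]


# Made-up roster for illustration.
roster = [
    Player("Player A", last_season_ppg=4.0, recent_ppg=2.0),
    Player("Player B", last_season_ppg=3.0, recent_ppg=5.0),
    Player("Player C", last_season_ppg=6.0, recent_ppg=6.0, injured=True),
]

lineup = pick_lineup(roster, slots=2)
print([p.name for p in lineup])  # ['Player B', 'Player A']
```

A rule like this only sees the numbers it is given. A human manager can adjust the same kind of ranking using context the numbers do not capture.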
Our test abruptly ended when COVID-19 struck, leading to the remainder of the NHL season being cancelled. Despite this, the test became a relatable tool for introducing people to the world of data science and machine learning. It was used in several introductory workshops that Cybera presented last winter across the province.
COVID Ended the Demo…What’s Next?
What did we learn after the abrupt end? One of Cybera’s employees (who went by the name “Bron Tiiu” in the fantasy pool) fared very well against the bots. They beat five out of the six machines, and came awfully close to being victorious over the AI. This human participant had a fundamental understanding of the factors that might make a hockey player successful. They took into consideration whether a player had momentum (a hot streak), or whether the player’s team was facing challenges. They could also take the performance numbers into consideration, and look at how those come into play against other extenuating circumstances.
Frequently, when discussing machine learning and data science, the topic of “subject matter expertise” arises. You can have the best data scientist in the world. However, if the data scientist doesn’t have background knowledge about the subject they’re working on, they won’t understand the context around the data, which can sometimes lead to the project failing. So, as “Bron Tiiu” showed us, having context for the numbers was vitally important. If Alex had some of that expertise, his bots might have performed better. This is a perfect example of why you should leverage your data in tandem with a subject matter expert.
So, what’s next? Alex has used the data that was collected to refine the bots. With the NHL’s 24-team postseason having begun this month, we’re now testing two bots against a wider group of Cybera employees. This will be a very interesting test, because the way the NHL is conducting the postseason (running all games in just two locations) has never been done before, which makes using “context” to determine outcomes challenging. There are other factors that may drive outcomes, including fluke wins, players not having played for months, and previously hurt players having had months to recuperate. Either way, we will find out if a machine can fully overcome the knowledge and experience of a human.
If you want to follow along with the results of our internal hockey pool, visit Sportsnet’s Fantasy Hockey Pool Page and select the “Cybera” group. You can also check out the hockey draft AI through this GitHub page.
Feel free to visit Cybera’s Data Science for Albertans site to learn more about the program.