Last month’s Open Data Science Conference (ODSC) started with some hands-on problem solving for attendees to mull over: How long will it take 3,000 people to filter through one registration desk, given that there are four possible streams they could register through, sorted alphabetically by last name? You might be tempted to reference last year’s registration pace, but last year there were only 1,000 registrants and the year before that 500, so how would those rates scale?
And just like that, attendees were immersed in the rapidly expanding field of data science and an energetic conference. Organizational imperfections aside, the ODSC in Boston was engaging and featured a number of high-profile speakers from the data science community, including JJ Allaire, Stefan Karpinski, and Owen Zhang (the founder of RStudio, a co-creator of Julia, and the #1 ranked data scientist on Kaggle, respectively). Here are some of my main takeaways from the conference:
There’s a Lot of Buzz Around Data Science Careers
Interest in data science is growing rapidly, and many companies and people are trying to take advantage of the buzz around the field. The ODSC held a career fair with representation from organizations as diverse as Harvard University, Allstate, and Facebook. To meet the demands of the job market and interest from the community, post-secondary institutions in Canada are beginning to offer dedicated data science programs. For example, Carleton University and the University of British Columbia both offer master’s degrees in data science. There are also several training programs available, offering either introductory experiences or an internship/fellowship. Insight offers one of the best-known programs, and organizations such as Merit and the NYC Data Science Academy provide training as well (both had a physical presence at the ODSC).
And there is clearly a lot of interest from prospective job seekers. Many attendees were at the ODSC to learn more about data science and what it takes to get into the field. Aspiring data scientists came from a broad spectrum of backgrounds, ranging from math and chemistry to design. With regard to experience, employers still seem to prefer candidates with graduate degrees, although a strong portfolio of data science work can at times offset the lack of one.
Hybrid Artificial Intelligence (AI)
One of the biggest epiphanies I had at the ODSC was the utility of hybrid AI. In data science, we often focus on automatically classifying datasets into discrete bins and getting black-and-white answers. For example, did passenger #123 survive or die in the Titanic disaster? However, there is a high probability that somewhere along the decision path of a data science product, a point of uncertainty will arise. At those points, a human in the loop can provide incredible value, and leveraging this human intervention can vastly improve a data product. In those cases, machine learning and AI do not replace humans; they augment them and improve the decision-making process. This represents a change in business logic, as it makes viable business cases that previously would not have been possible. This is being practiced in the real world through products such as Facebook’s assistant M and the travel assistant Pana. Entertainingly, a hybrid AI approach was also just used to power a teaching assistant bot for a class at Georgia Tech.
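To make the human-in-the-loop idea concrete, here is a minimal Python sketch. It is an illustration only, not how M, Pana, or the Georgia Tech bot actually work. It assumes a model that outputs a survival probability, and the names `hybrid_decide`, `ask_human`, and the 0.8 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    label: str          # "survived" or "died"
    confidence: float   # how sure the model was, between 0.5 and 1.0
    decided_by: str     # "model" or "human"


def hybrid_decide(
    probability: float,
    ask_human: Callable[[float], str],
    threshold: float = 0.8,  # assumed cutoff; tune per product
) -> Decision:
    """Let the model answer when it is confident, otherwise defer to a person.

    `probability` is the model's estimated probability of the positive
    class ("survived"). `ask_human` is a hypothetical callback that returns
    a label chosen by a human reviewer.
    """
    confidence = max(probability, 1.0 - probability)
    if confidence >= threshold:
        label = "survived" if probability >= 0.5 else "died"
        return Decision(label, confidence, "model")
    # Point of uncertainty: route the case to a human instead of guessing.
    return Decision(ask_human(probability), confidence, "human")


def reviewer(probability: float) -> str:
    # Stand-in for a real person reviewing the case.
    return "survived"


print(hybrid_decide(0.95, reviewer))  # confident case, handled by the model
print(hybrid_decide(0.55, reviewer))  # uncertain case, handed to the reviewer
```

The threshold is the main design choice: raising it sends more cases to people and costs more, while lowering it trusts the model with more borderline calls.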
Other Tools & Trends
- Many technologies and products were on display at the ODSC, and a few of them made a particular impression. For one, everyone seems to be using notebooks for their analysis. Notebooks are web applications that combine live code, its output, and explanatory text in a single document that is easy to share. Jupyter in particular was prominently featured, and tools and development with R Markdown were on display as well.
- Stefan Karpinski made a strong case for Julia as the language for data science: it combines the ease of prototyping found in scripting languages such as Python or R with the ability to fine-tune code for performance, which typically requires a systems language such as C (Karpinski’s 10-minute presentation is available online).
- As data science becomes more popular, software vendors are trying to open up the world of analytics to the masses. A number of tools provide drag-and-drop interfaces to minimize both the amount of coding and the machine learning knowledge required. RapidMiner, Dataiku, and Azure ML all provide drag-and-drop interfaces, while DataRobot runs a systematic sweep over learning algorithms and their parameters in an attempt to find the best-performing model for users.
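For readers unfamiliar with the idea of an algorithm and parameter sweep, the toy Python sketch below shows the general pattern: try every combination of model family and setting, score each on held-out data, and keep the best. This is not DataRobot's implementation. The dataset is made up, and the two "model families" (`fit_threshold`, `fit_knn`) are deliberately tiny stand-ins written for this example.

```python
from itertools import product

# Made-up 1-D dataset for illustration: (feature, label) pairs.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
VALID = [(2.5, 0), (4.0, 0), (5.5, 1), (7.5, 1)]


def fit_threshold(train, cutoff):
    # Fixed rule: predict 1 at or above the cutoff (ignores the training data).
    return lambda x: int(x >= cutoff)


def fit_knn(train, k):
    # k-nearest neighbours by absolute distance, majority vote.
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > k)
    return predict


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical search space: each family maps to a fit function and a grid.
SEARCH_SPACE = {
    "threshold": (fit_threshold, {"cutoff": [2.0, 4.5, 7.0]}),
    "knn": (fit_knn, {"k": [1, 3, 5]}),
}


def sweep(search_space, train, valid):
    """Score every (family, parameters) combination on the validation set."""
    results = []
    for name, (fit, grid) in search_space.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            model = fit(train, **params)
            results.append((accuracy(model, valid), name, params))
    best = max(results, key=lambda r: r[0])
    return best, results


best, results = sweep(SEARCH_SPACE, TRAIN, VALID)
for score, name, params in results:
    print(f"{name:10s} {params} accuracy={score:.2f}")
print("best:", best)
```

Real tools add cross-validation, many more model families, and smarter search strategies, but the loop above captures the basic idea.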
So, how long did it take to get everyone registered? We will never know, as the queue soon dispersed and attendees registered asynchronously. Given the growth curve data science is on and the breadth of its applications, it’s safe to say next year’s lineups won’t be any shorter.