New Canadian data mining tool analyzes how public submissions affect regulatory decisions

When the CRTC sought submissions to its 2015 consultation on “Basic Telecommunications Services,” thousands of people offered their opinions on what the future of Canada’s internet should look like. For the first time ever, these submissions have been made searchable via a single, simple-to-use platform, and a surprisingly varied narrative of struggles and concerns has emerged.

The 2015 CRTC consultation set out to determine whether broadband should be considered a basic need for Canadians (the CRTC ultimately decided that it should be). Submissions ranged from telecommunication companies arguing about the impact on their bottom lines, to low-income Canadians describing having to choose between paying for internet or food.

Collating and analyzing the 65,000+ pages of material submitted to the 2015 consultation required building a new data mining tool. This work was carried out by the data science team at Cybera (Alberta’s not-for-profit technology accelerator), with funding from the Canadian Internet Registration Authority’s Community Investment Program. The goal of Cybera’s Policy Browser tool is to clarify whether, and how, the public’s input shapes government decisions.

Building this tool was not an easy task: the CRTC submissions were delivered in a wide variety of formats (from PDFs and Word documents to spreadsheets), from which formatted text and data are traditionally very difficult to extract. Cybera’s team used machine learning techniques to extract the text from these submissions and to group related phrases.

Using the Policy Browser tool, the team uncovered a broad spectrum of internet access issues and priorities across Canada. For example, when discussing the concept of “affordability,” larger telecom companies and industry groups tended to use words like “market demand” and “economic benefits,” while the nearly 3,000 individual Canadians who submitted feedback tended to use more personal terms, like “jobs,” “home” and “food.” A search of negative words used by individuals frequently came back with “pay,” “greed,” “ridiculous,” “poor” and “worry,” whereas network operators tended to use more neutral language.
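To illustrate the kind of comparison described above, here is a minimal Python sketch that counts the most frequent terms per submitter group. It is not Cybera's code: the sample submissions, the stopword list, and the function names (`tokenize`, `top_terms_by_group`) are all invented for illustration, and the real tool may work quite differently.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (submitter group, text) pairs invented for
# illustration. These are NOT real CRTC submissions.
SUBMISSIONS = [
    ("individual", "I worry I cannot pay for internet and food"),
    ("individual", "We need internet at home to find jobs"),
    ("telecom", "Market demand drives economic benefits"),
    ("telecom", "Economic benefits follow market demand"),
]

# A tiny, assumed stopword list; a real analysis would use a fuller one.
STOPWORDS = {"i", "we", "to", "for", "and", "at", "the", "a"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def top_terms_by_group(submissions, n=3):
    """Return the n most frequent terms for each submitter group."""
    counts = defaultdict(Counter)
    for group, text in submissions:
        counts[group].update(tokenize(text))
    return {group: counter.most_common(n) for group, counter in counts.items()}


if __name__ == "__main__":
    for group, terms in top_terms_by_group(SUBMISSIONS).items():
        print(group, terms)
```

Comparing the per-group lists side by side is one simple way to surface differences in vocabulary between, say, individuals and telecom companies.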

Bigram plot of frequently used words (and how they were associated with other words) in submissions to the 2015 CRTC “Basic Telecommunications” consultation.
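As a rough idea of what sits behind a bigram plot like this one, the sketch below counts adjacent word pairs across a set of texts. It is an assumption about the general technique, not a description of how the Policy Browser computes its plot; the function name `bigram_counts` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def bigram_counts(texts):
    """Count adjacent word pairs (bigrams) across a list of texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        # zip(words, words[1:]) pairs each word with the one that follows it.
        counts.update(zip(words, words[1:]))
    return counts


if __name__ == "__main__":
    # Invented example sentences, not real submissions.
    sample = ["Basic service should include broadband",
              "Broadband is a basic service"]
    for pair, count in bigram_counts(sample).most_common(3):
        print(pair, count)
```

In a plot, each frequent pair would typically become an edge between two word nodes, with the count determining how prominently the edge is drawn.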

Breaking the submissions down into groups (advocacy agencies, government, telecom companies, individuals, etc.) revealed interesting patterns in what issues were emphasized. Advocacy groups focused on affordability and telecom operating costs, while government groups discussed minimum speeds and how government funding should be allocated. The telecom companies focused their responses on the definition of “basic service,” and described how their revenue systems work and what plans they currently offer to address “affordability” needs.

“What is really interesting is how various parties are focusing on their own specific (and often unrelated) problems in these submissions, rather than presenting solutions to the same problems,” notes Barton Satchwill, Vice President of Technology at Cybera. “You get a good sense of the complex landscape the CRTC has to navigate when addressing different needs and demands.”

“Growing the awareness and understanding of the Canadian internet ecosystem is an important step in helping to improve it,” says David Fowler, Vice President of Marketing and Communications at CIRA. “We do this in several ways, including presenting data from Canada’s Internet Factbook, and we also act as a catalyst by supporting others. We’re proud to have funded Cybera’s project, which takes information that is often vast and dense, and turns it into a narrative that all Canadians can access and more easily understand. Growing this knowledge is a positive way for Canadians to engage in building a better online Canada.”

Adds Satchwill: “It’s still early days for the Policy Browser tool and the data analytics it can run. One of our biggest accomplishments so far was making the submissions more accessible! The CRTC made the consultation documents available on its website for individual download, which is a cumbersome process that would take one person a lifetime to go through. Our hope with this platform is that it can be adapted for other government consultations. This will make it easier for researchers to study the role that public submissions play on how regulations and policies are created.”

Cybera built the Policy Browser using open source tools, and is making the source code available to anyone who would like to apply it to other data mining applications involving large numbers of text files. For further information, visit the Policy Browser or contact datascience@cybera.ca.


Background

About Cybera

Cybera is a not-for-profit technology-neutral organization responsible for driving Alberta’s economic growth through the use of digital technology. Its core role is to oversee the development and operations of Alberta’s cyberinfrastructure — the advanced system of networks and computers that keeps government, educational institutions, not-for-profits, business incubators and entrepreneurs at the forefront of technological change.

Working with government, education, and private sectors, Cybera is creating a community that champions vital networking and computing services and utilities for everyone, everywhere. We also provide member organizations with unbiased, highly skilled expertise on technology products, processes or services, and access to shared IT tools.

About CIRA

CIRA is building a better online Canada through the Community Investment Program by funding innovative projects led by charities, not-for-profits and academic institutions that are making the internet better for all Canadians. CIRA is best known for our role managing the .CA domain on behalf of all Canadians. While this remains our primary mandate, as a member-based not-for-profit ourselves, we have a much broader goal to strengthen Canada’s internet. The Community Investment Program is one of our most valuable contributions toward this goal and funds projects in infrastructure and access, digital literacy, online services, and research. Every .CA domain name registered or renewed contributes to this program.

To date, CIRA has supported 102 projects with over $4.2 million in contributions.
