The Challenges Ahead
The decline of print textbooks is forcing every publisher to move towards digital content. Once content is digitized and delivered, however, it opens vast possibilities to collect and analyze data. While data can be used for good, there are also manifold challenges that institutions have only begun to grapple with.
Digital products include not only textbooks, but also homework systems, assessment tools, adaptive content customized to the learning profiles of students, standalone platforms, Learning Management Systems, lecture capture, and more. Anecdotal evidence suggests that these systems, built and maintained by publishers, capture massive amounts of data about student and faculty behavior that go beyond what is necessary for accomplishing their core objective (i.e., improving student outcomes). Institutions, faculty, and students should think carefully about the accumulation and use of data collected and retained by schools and commercial vendors.
Student and Faculty Privacy
Digital tools collect and analyze data in a wide variety of ways, including to establish a student’s learning profile, where and when students access content or complete homework, what resources they use to complete tasks, how long it takes to complete individual exercises, which digital library materials they have accessed, and so on. While publishers may justify collecting this information for the purpose of improving educational outcomes, there are also serious questions about the potential risks. This data, if hacked, re-sold, or surrendered to governments without judicial review, can be used to classify students, screen them for employment or access to graduate education, infer their political views, and even map their network of friends, mentors, and followers. While there are federal and state regulations concerning student privacy, some (such as FERPA) have not been updated in decades and cannot be assumed to cover all possible uses.
While many students today have a choice between acquiring a textbook in print or digital form, the trend towards digital-only products is unmistakable: digital content lowers costs for publishers, enables the collection of data, and helps universities increase productivity and slow their cost inflation. For example, digital study guides supplementing digital textbooks allow colleges to reduce the number of teaching assistants required, particularly for large introductory classes. Are students better off if these gains in productivity are inadvertently purchased with vast amounts of their data?
The risks to student and faculty privacy are significant: they range from hacking to unmonitored re-sale of data to third parties. Could commercial vendors find themselves selling student data, even inadvertently, to the next Cambridge Analytica? Would commercial vendors resist government requests for data? Would universities resist requests for selective data from prospective employers, possibly dangling a greater number of hires from the institution if they could (for example) only know how students answered a specific set of questions or which students have desirable collaboration patterns? Would students read the fine print before sharing their data with an “app” offering to predict their dream job or lifetime earning potential?
Algorithms and Analytics
The algorithms used by publishers are themselves notoriously opaque, raising a spectrum of ethical questions. For example, how do adaptive learning algorithms conclude that an individual should be served one of several types of customized content? Are all student profiles considered and valued equally, or are systems effectively classifying students on the basis of perceived abilities and tendencies, handicapping some even before they complete a class? Numerous examples in recent news stories illustrate how algorithms can be influenced by the unconscious bias of the humans who design them, which can manifest as unintended discrimination. In a higher education context where algorithms are trusted with increasingly important decisions, the lack of transparency raises not only ethical concerns, but also potential legal exposure.
It is worth noting that publishers are not the only vendors of data analytics to universities. Many (if not most) colleges use data analytics to varying degrees in the recruitment process, and non-transparent algorithms raise many questions of fairness there as well. Are algorithms perpetuating, even involuntarily, biases based on ethnicity, geography, occupation, or the likelihood that students or their families will become donors?
It is important to emphasize once again that this report is not intended to take an adversarial view of the deployment of data analytics in academic institutions. We acknowledge that the issues posed by data are here to stay. We strongly recommend that academic institutions analyze the issues posed by metrics (“what is being measured”) separately from those posed by algorithms (“how it is being measured”). Of course, the two categories feed into each other. For example, it is tempting to measure performance using whatever vendors make readily available and easy to procure, instead of devoting resources to elements of performance that are hard to quantify (or simply hard to collect). But metrics and algorithms pose very different issues and should be addressed separately.
Academic institutions need to take control of metrics. It is their responsibility – and theirs alone – to ensure, for example, that faculty are evaluated on the basis of multiple factors. These factors may include the impact factor of the journals that published their research, but may also extend to, and appropriately weigh, collaboration, collegiality, mentorship of junior staff, and teamwork. Of course, these other elements may be complex or expensive to gather and analyze, can be ambiguous, and leave room for criticism. We are not advocating that academic institutions choose any specific metric over another – just that they deliberately decide which metrics should be used in the evaluation of faculty, rather than defaulting to those that are easily available through commercial vendors.
Algorithms, on the other hand, do not need to be developed by each academic institution, so long as they are transparent and can be analyzed and properly understood. “Black box” algorithms used in academic settings may contain any number of issues that are incompatible with the values of the institution. But as long as the algorithms remain hidden, there is no way to know, and – as many anecdotal reports indicate – biases can easily be built into them, even inadvertently.
We do see some academic institutions actively opposing the use of data analytics. For example, the University of Ghent in Belgium announced in December 2018 that it would change how it evaluates its faculty. In the announcement, Rector Rik Van de Walle wrote – among other things:
“No more procedures and processes with always the same templates, metrics and criteria which lump everyone together” and “The model must provide a response to the complaint of many young professors that quantitative parameters are predominant in the evaluation process. The well-known and overwhelming ‘publication pressure’ is the most prominent exponent of this. Ghent University is deliberately choosing to step out of the rat race between individuals, departments and universities. We no longer wish to participate in the ranking of people”.
This is a courageous and daring course, and we acknowledge that most North American academic institutions may not be ready to abandon outright the use of data and data analytics to inform decisions. Our goal is to ensure that institutions approach these decisions deliberately and in a manner consistent with their values.