Abstract
The time is ripe for more case-by-case analyses of “big data”, “machine learning” and “algorithmic management”. A significant portion of current discussion on these topics occurs under the rubric of Automation (or, artificial intelligence) and in terms of broad political, social and economic factors said to be at work. We instead focus on identifying sociotechnical concerns arising out of software development in the topic areas. In so doing, we identify trade-offs and at least one longer-term system safety concern not typically included alongside notable political, social and economic considerations. This is the system safety concern of obsolescence. We end with a speculation on how skills in making these trade-offs might be noteworthy when system safety has been breached in emergencies.
3.1 Roadmap and Introduction
After this section’s background preliminaries, we briefly examine the consequences of treating Automation/AI as an overarching rubric under which to frame discussions of algorithmic management, machine learning and big data. The bulk of the chapter then identifies and discusses the three key topics and their associated trade-offs within the sociotechnical context of hardware and software developers working on highly distributed systems such as Google, Netflix, Facebook and Amazon’s technical infrastructure. We conclude by discussing how our case- and topic-specific perspective helps reframe discussions of algorithmic management, machine learning and big data, with a special emphasis on system safety management implications.
Algorithmic management, machine learning and big data are fairly well-defined concepts. In contrast, the popularised term “AI” is in some respects more a hype-driven marketing term than a meaningful concept for discussing the real-world digital issues focused on in this chapter. We will not discuss AI further, nor, for brevity’s sake, will we discuss other key concepts the reader might expect to be included alongside algorithmic management, machine learning and big data.
In particular, this chapter does not discuss “expert systems” (expert-rule systems). This omission is important to note because expert systems can be thought of as, in some respects, the opposite of big data/machine learning. In the medical context, an expert system may be developed by having medical professionals define rules that can identify a tumour. A big data/machine learning approach to the same problem might instead start with the medical professionals marking numerous images of potential tumours as either malignant or benign. The machine learning algorithm would then apply statistical techniques to create its own classification rules for deciding whether unseen tumours were malignant or benign. In case it needs saying, expert systems are also found in other automated fields, e.g., in different autonomous marine systems [10].
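The contrast between the two approaches can be sketched in a few lines of Python. Everything below is invented for illustration: the single feature (a lesion diameter), the expert threshold and the training values. The rule-based path encodes a hand-written expert rule, while the data-driven path derives its own rule (here, a minimal decision stump) from labelled examples.

```python
# Two approaches to the same classification problem, reduced to a
# single invented feature ("lesion diameter in mm") for illustration.

def expert_rule_classify(diameter_mm):
    """Expert system: a clinician-authored rule, fixed in advance."""
    return "malignant" if diameter_mm > 10.0 else "benign"

def learn_threshold(labelled_examples):
    """Machine learning (a minimal decision stump): derive the
    decision rule from labelled data instead of writing it by hand."""
    candidates = sorted(d for d, _ in labelled_examples)
    best_t, best_correct = None, -1
    for t in candidates:
        correct = sum(
            ("malignant" if d > t else "benign") == label
            for d, label in labelled_examples
        )
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Labelled training images, reduced here to one measured feature each.
training = [(3, "benign"), (5, "benign"), (8, "benign"),
            (12, "malignant"), (15, "malignant"), (20, "malignant")]

threshold = learn_threshold(training)  # derived, not hand-written
print(expert_rule_classify(14))        # rule authored by experts
print("malignant" if 14 > threshold else "benign")  # rule learned from data
```

Both paths classify the same unseen case; the difference lies in where the decision rule comes from, which is precisely the contrast drawn above.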
These differences matter because sociotechnical systems differ. In autonomous marine systems (among others), there are components (including processes and connections) that must never fail and events that must never happen (e.g., irreversible damage to the rig being repaired by the remotely operated vehicle) in order for the autonomous system to be reliable. Redundant components may not be readily available in rough seas. In highly distributed systems, by way of contrast, each component should be able to fail in order for the system to be reliable. Here redundancy or fallbacks are essential.
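The point that each component "should be able to fail" can be illustrated with a common fallback pattern. The function and exception names below are hypothetical; the shape of the logic (try the primary, then a replica, then degrade gracefully) is the general idea.

```python
# A minimal sketch of component-level failure tolerance: the call site
# assumes its dependency can fail and has a fallback ready.

class Unavailable(Exception):
    """Raised by a component that cannot currently serve requests."""
    pass

def fetch_with_fallback(primary, replica, default):
    """Try the primary component; on failure, the replica; as a last
    resort, serve a degraded default rather than failing the request."""
    for source in (primary, replica):
        try:
            return source()
        except Unavailable:
            continue
    return default

def broken():   # a component that has failed
    raise Unavailable()

def healthy():  # a redundant component still serving
    return "fresh data"

print(fetch_with_fallback(broken, healthy, "cached data"))  # "fresh data"
print(fetch_with_fallback(broken, broken, "cached data"))   # "cached data"
```

The system stays reliable precisely because no single component is required to stay up, which is the inverse of the never-fail components in the marine case.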
3.2 Limitations of “Automation” as a Covering Concept
To get to what needs to be said about algorithmic management, machine learning and big data, we must usher one elephant out of this chapter (albeit very much still found elsewhere in this volume), that of capital-A “Automation”. This very large topic is the subject of broad political, social and economic debates (for much more detailed discussions of the interrelated debates over “automation” writ large, see Benanav [2, 3]; McCarraher [5, 6]):
- Economic: It is said that Automation poses widespread joblessness for people now employed or seeking employment in the future.
- Social: It is said that Automation poses huge, new challenges to society, not least of which is answering the existential question, “What is a human being and the good life?”
- Political: It is said that Automation poses new challenges to the Right and Left political divide, e.g., some free-market visionaries on the Right are just as much in favour of capital-A Automation as some elements on the Left, e.g., “Fully Automated Luxury Communism” (for more on the possible political, social and economic benefits, see Prabhakar [7]).
This chapter has nothing to add to or clarify for these controversies. We however do not see why these concerns must be an obstacle to thinking more clearly about the three topic areas.
The sociotechnical context, this chapter seeks to demonstrate, is just as important. Large-scale sociotechnical systems, not least of which are society’s critical infrastructures (water, energy, telecommunications….), are not technical or physical systems only; they must be managed and operated reliably beyond inevitably baked-in limitations of design and technology [8, 9]. The sociotechnical context becomes especially important when the real-time operational focus centres on the three subject areas of algorithmic management, machine learning and big data in what are very different large sociotechnical systems that are, nevertheless, typically conflated together as “highly automated”.
If we are correct—the wider economic, political and social contexts cannot on their own resolve key concerns of the sociotechnical context—then the time is ripe for addressing the subjects of concern from perspectives typically not seen in the political, social and economic discussions. The section that follows is offered in that spirit.
3.3 Developers’ Perspective on a New Software Application
We know software application developers make trade-offs across different evaluative dimensions. The virtue of the dimensions is that each category can be usefully defined from the developer’s perspective and that each fits into a recognisable trade-off faced by software developers in evaluating different options (henceforth, “developers” being a single engineer, team or company).
This section focuses on a set of interrelated system trade-offs commonly understood by software developers, including their definitions and some examples. Many factors will be familiar to readers, albeit perhaps not as organised below. No claim is made that the set is an exhaustive list. These well-understood dimensions are abstracted for illustrative purposes in Fig. 3.1.
1. Comprehensibility/Features Dimension
   - Comprehensibility (Left Side): The ability of the developer to understand the system, bounded by human cognitive limits. Highly distributed systems are often beyond the ability of one team, let alone one individual, to fully know and understand as a system.
   - Features (Right Side): The capabilities of the system. Additional features provide value to users but increase the system’s sociotechnical complexity,Footnote 1 thereby reducing comprehensibility.
2. Human Operated/Automated Dimension
   - Human Operated (Left Side): Changes to the system configuration are carried out by human operators. For example, in capacity planning, servers may be manually ordered and provisioned to address forecasted demand.
   - Automated (Right Side): The system may dynamically change many aspects of its operation without human intervention. For instance, it may automatically provision or decommission servers without the intervention of human operators.
3. Stability/Improvements Dimension
   - Stability (Left Side): The system operates at full functionality without failure. Beyond strict technical availability, stability may also include the accessibility of the system to operators trained on an earlier version without requiring retraining.
   - Improvements (Right Side): Changes to the system are made to provide new and enhanced features, or other enhancements such as decreased latency (response time).
4. Redundancy/Efficiency Dimension
   - Redundancy (Left Side): The ability of the system to experience the failure of one or more components (including processes and connections) and still have the capacity to support its load. An example is a system provisioned with a secondary database ready to take over if the primary one fails.
   - Efficiency (Right Side): The ability of the system to provide service with a minimum of cost or resource usage: paying only for what you use.
These four dimensions are relied upon by software builders, and the trade-offs can be explicitly codified as part of the software development and application process. Consider Google’s Site Reliability Engineer (SRE) “error budget”, where applications are given a budget of allowed downtime or errors within a quarter. If the budget is exceeded—the application is down for longer than budgeted—additional feature work on the product is halted until the application is brought back within budget.Footnote 2 This is an explicit trade-off on the Stability/Improvements dimension.
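The arithmetic behind such an error budget is simple enough to make concrete. A minimal sketch, assuming a hypothetical 99.9% availability objective over a 90-day quarter (the numbers are illustrative, not Google’s actual policy):

```python
# Error-budget arithmetic for a quarterly availability objective.

def error_budget_minutes(slo, days=90):
    """Allowed downtime per period for a given availability SLO."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo)

def feature_work_allowed(slo, downtime_minutes, days=90):
    """The control mechanism: new feature work halts once observed
    downtime exceeds the budget."""
    return downtime_minutes <= error_budget_minutes(slo, days)

budget = error_budget_minutes(0.999)    # 129.6 minutes per quarter
print(round(budget, 1))
print(feature_work_allowed(0.999, 60))  # within budget -> True
print(feature_work_allowed(0.999, 200)) # budget blown -> False
```

The budget quantifies how much stability the organisation is willing to trade for improvements, turning the Stability/Improvements dimension into an operational rule.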
For each of the four dimensions, current technology and organisation processes occupy one or more segments along the dimension. These respective segments expand/intensify as new technology and processes are developed.
By way of illustration, consider the Human Operated/Automated dimension. Technology and new services have provided additional opportunities to automate the management of increasingly complex sociotechnical systems:
- In the 2000s, the advent of Cloud providers such as Amazon Web Services (AWS) and Google Cloud Platform (GCP, initially with App Engine) provided significant opportunities to provision hardware via application programming interfaces (APIs) or technical interfaces, making it relatively simple to spin hardware instances up or down based on automated heuristics.
- More recently, big data and machine learning have provided additional opportunities to manage systems using opaque ML algorithms. DeepMind has, for example, deployed a model that uses machine learning to manage the cooling of Google’s data centres, leading to a 40% reduction in energy use.Footnote 3
- Processes such as Netflix’s Chaos Monkey enable the organisation to validate the behaviour of their highly complex systems under different failure modes. By way of example, network connectivity may be deliberately broken between two nodes to confirm the system adapts around the failure, enabling the organisation to operate increasingly complex and heavily automated architectures.
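The first of these opportunities, provisioning via cloud APIs under automated heuristics, can be sketched as follows. The `provision` and `decommission` callables stand in for a real provider’s API calls, and the capacity figures are invented:

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance=100,
                      minimum=2):
    """A simple autoscaling heuristic: enough instances to carry the
    observed load, never below a small redundancy floor."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(needed, minimum)

def reconcile(current, target, provision, decommission):
    """Spin instances up or down until the fleet matches the target.
    `provision`/`decommission` stand in for real cloud API calls."""
    while current < target:
        provision()
        current += 1
    while current > target:
        decommission()
        current -= 1
    return current

log = []
fleet = reconcile(
    current=3,
    target=desired_instances(requests_per_sec=850),
    provision=lambda: log.append("spin up"),
    decommission=lambda: log.append("spin down"),
)
print(fleet, log)
```

No human operator appears in the loop: the heuristic observes load and the reconciliation step issues the provisioning calls, which is the Automated end of the Human Operated/Automated dimension.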
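Likewise, the validation logic behind a Chaos Monkey-style experiment can be modelled in miniature: a client that knows two routes to a service, and an experiment that deliberately severs one to confirm traffic flows around the break. All class and function names here are invented for illustration.

```python
import random

class Link:
    """A network connection between two nodes that chaos tooling
    can deliberately break."""
    def __init__(self):
        self.up = True

    def send(self, payload):
        if not self.up:
            raise ConnectionError("link severed")
        return f"delivered: {payload}"

def request_via(links, payload):
    """A client that adapts around failure by trying each route."""
    for link in links:
        try:
            return link.send(payload)
        except ConnectionError:
            continue
    raise ConnectionError("all routes down")

# Chaos experiment: break a randomly chosen link, then confirm the
# system still serves the request over the surviving route.
routes = [Link(), Link()]
random.choice(routes).up = False
print(request_via(routes, "ping"))  # still delivered
```

The experiment does not prevent failure; it injects failure on purpose to verify that the automated adaptation actually works, which is what makes heavily automated architectures operable.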
The expansion of a dimension’s segments is dominated by an asymmetrical expansion of activities and investments on the right side. The importance of Cloud providers, big data and machine learning in driving this expansion has been mentioned. Other factors include sociotechnical shifts such as agile methodologies, the rise of open source, the development of new statistical and machine learning approaches, and the creation of newer hardware such as GPUs and smartphones.
3.4 What’s the Upshot for System Safety? Obsolescence as a Long-Term Sociotechnical Concern
System safety is typically taken to be on the left side of the developer’s trade-offs, located in and constituted by stability, redundancy, comprehensibility and recourse to human (manual) operations. Since the left side is also expanding (due in part to advances not reported here outside the three topic areas), we can assume the left-side expansion contributes to advances in safety as well.
If the left side is associated with “system safety”, then the right side can be taken as the maximum potential to generate “value” for the developer/company. Clearly, increases in both right-side features and right-side efficiencies can increase the ability of the system to provide value for its operators or users, other things being equal.
Now, look at that left side more closely, this time from the perspective of the designer’s long term versus short term.
Software history is littered with examples of stable and capable systems that were rendered obsolete by systems that better met users’ needs in newer, more effective ways. If the current electrical grid rarely goes down, that is one form of safety. But do we want a system that is stable until it catastrophically fails or is no longer fit for new purposes? Or would we prefer systems that fail through frequent small defects that, while fixable in real time, nonetheless produce a steady stream of negative headlines?
More generally and from a software designer’s perspective, we must acknowledge that even the most reliable system becomes, at least in part and after a point, outdated for its users by virtue of not taking advantage of subsequent improvements, some of which may well have been tested and secured initially on the right side of the trade-offs.
In this way, obsolescence is very much a longer-term system safety issue and should be given as much attention, we believe, as the social, political and economic concerns mentioned at the outset. Cyber-security, for example, is clearly a very pressing right-side issue at the time of writing but would still be pressing over the longer term because even stable defences become obsolete (and for reasons different than current short-term ones).Footnote 4
3.5 A Concluding Speculation on When System Safety is Breached
Since critical infrastructures are increasingly digitised around the three areas, it is a fair question to ask: Can or do these software developer skills in making the four trade-offs assist in immediate response and longer-term recovery after the digital-dependent infrastructure has failed in disaster? This is unanswerable in the absence of specific cases and events and, even then, answering would require close observation over a long period. Even so, the question has major implications for theories of large-system safety.
The crux is the notion of trade-offs. According to high reliability theory, system safety during normal operations is non-fungible after a point, that is, it cannot be traded off against other attributes like cost. Nuclear reactors must not blow up, urban water supplies must not be contaminated with cryptosporidium (or worse), electric grids must not island, jumbo jets must not drop from the sky, irreplaceable dams must not breach or overtop, and autonomous underwater vessels must not damage the very oil rigs they are repairing. That disasters can or do happen reinforces the dread and commitment of the public and system operators to this precluded-event standard.
What happens, though, when even these systems, let alone other digitised ones, fail outright as in, say, a massive earthquake or geomagnetic storm and blackout? Such emergencies are the furthest critical infrastructures get from high reliability management during their normal operations. In disasters, safety still matters but trade-offs surface all over the place, and skills in thinking on the fly, riding uncertainty and improvising are at their premium.
If so, we must speculate further. Do skills developed through making the specific software trade-offs add value to the immediate response and recovery efforts of highly digitised infrastructures? Or from the other direction: Is the capacity to achieve reliable normal operations in digital platforms—not by precluding or avoiding certain events but by adapting to electronic component failure almost anywhere and almost all of the time—a key skill set of software professionals and their wraparound support during emergency management for critical infrastructures? Answers are a pressing matter, as when an experienced emergency manager in the US Pacific Northwest itemised for one of us (Roe) just how many different software systems critical to the emergency management infrastructure depend on one platform provider that is major in the region (and globally, for that matter).
Notes
- 1.
Complexity in terms of these digital systems is indexed by the elements, their functions and the interrelationships between elements and functions in the systems (for this classic definition of sociotechnical complexity, see LaPorte [4]). These features are also captured by (Melvin) Conway’s law: “organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations”.
- 2.
“Error budgets are the tool SRE uses to balance service reliability with the pace of innovation. Changes are a major source of instability, representing roughly 70% of our outages, and development work for features competes with development work for stability. The error budget forms a control mechanism for diverting attention to stability as needed.” (https://sre.google/workbook/error-budget-policy/).
- 3.
- 4.
Cyber-security, however, deserves its own treatment and pushes us beyond the remit of this chapter (see how societal safety and societal security overlap and differ in Almklov et al. [1]).
References
P.G. Almklov, S. Antonsen, K.V. Størkersen, E. Roe, Safer societies. Saf. Sci. 110(Part C), 1–6 (2018)
A. Benanav, Automation and the future of work--Part one. New Left Rev. 119(Sept/Oct), 5–38 (2019a)
A. Benanav, Automation and the future of work--Part two. New Left Rev. 120(Nov/Dec), 117–146 (2019b)
T.R. La Porte, Organized Social Complexity: Challenge to Politics and Policy (Princeton University Press, 1975)
E. McCarraher, Automated vistas (I). Raritan 39(1), 18–42 (2019)
E. McCarraher, Automated vistas (II). Raritan 39(2), 102–126 (2019)
A. Prabhakar, In the realm of the barely feasible—complex challenges facing the nation require a new approach to ramping up innovation: solutions R&D. Issues in Science and Technology XXXVII(1, Fall), 34–40 (2020)
E. Roe, P.R. Schulman, High Reliability Management: Operating on the Edge (Stanford University Press, Stanford, CA, 2008)
E. Roe, P.R. Schulman, Reliability and Risk: The Challenge of Managing Interconnected Infrastructures (Stanford University Press, Stanford, CA, 2016)
I.B. Utne, I. Schjølberg, E. Roe, High reliability management and control operator risks in autonomous marine systems and operations. Ocean Eng. 171(1), 399–416 (2019)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this chapter
Cite this chapter
Roe, E., Fortmann-Roe, S. (2023). Key Dimensions of Algorithmic Management, Machine Learning and Big Data in Differing Large Sociotechnical Systems, with Implications for Systemwide Safety Management. In: Le Coze, JC., Antonsen, S. (eds) Safety in the Digital Age. SpringerBriefs in Applied Sciences and Technology(). Springer, Cham. https://doi.org/10.1007/978-3-031-32633-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32632-5
Online ISBN: 978-3-031-32633-2