1 Introduction

In 2022, the World Health Organization (WHO) reported that as many as 12 billion working days are lost each year around the world due to depression and anxiety, at a cost of up to $1 trillion [1]. The WHO recognized that “decent work” is good for mental health but also that poor working environments with excessive workloads and long and unsocial hours could pose a risk to mental health [1]. Decent work is also listed as goal number 8 of the UN’s sustainability goals [2]. However, working unsustainably long hours also poses a risk to physical health. In 2021, the WHO said that “Working 55 hours or more per week is a serious health hazard” [3]. In 2016, some 745,000 people died of heart attacks and strokes because of working long hours, a rise of 29% from 2000 [3]. While an Agile principle states that “Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely” [4], Healy et al. found evidence that challenges the assumed stability of the Scrum and Kanban frameworks they call “Agile systems” [5]. They demonstrated that in 74% of 1,203 Jira projects, the arrival rates of work into systems were higher than the rates at which teams could complete their work. In this paper, we extend their work by investigating the relationship between stability and the prevalence of Agile team members creating and resolving Product Backlog Items (PBIs) outside of normal working hours. Using this “Unsustainable Hours” measure as an indicator of unsustainable ways of working, we further explore the relationship between Agile system stability and sustainability.

The rest of the paper is organized as follows: In Sect. 2, we discuss and briefly present the background and related concepts. We describe the research approach in Sect. 3 and the results in Sect. 4. We provide further discussion about the results and their implications in Sect. 5. Finally, we describe the limitations of our study in Sect. 6 and offer conclusions in Sect. 7.

2 Background

2.1 Measuring Stable and Sustainable Agile Work

To date, much of the Agile software development literature has focussed on how Agile systems, such as Scrum and Kanban, can ameliorate unsustainable ways of working. For example, Beecham et al. found that Agile should be able to improve the number of hours worked to sustainable levels at a finance company [6] and Rusconi found that Agile complemented sustainable ways of working at ING [7]. However, the literature does not indicate a consensus opinion. Hoda et al. note that non-stop iterations can serve to apply pressure on developers [8] and van Oorschot et al. [9] used a computational model to demonstrate the effect that shorter iterations have on quality and rework.

Healy et al. previously introduced the Stability Metric, ψ, as a dimensionless measure of an Agile team’s ability to get their work done [5]. Using a large dataset of actual projects, they demonstrated that assumptions of system stability from a queueing perspective were invalid in 74% of projects analysed. Unstable systems also tended to have backlogs 10 times larger than stable ones. They also showed that the systems analysed tended to cluster towards marginal stability, where ψ = 1, with the single largest cluster of projects in the range of 0.9 to 1.1. However, they did not identify whether unstable systems, with their large, growing backlogs of PBIs, influenced work practices on Agile teams. We repeated their technique with a focus on how stability relates to a tendency toward excess work in one company, intive.

intive is a global technology professional services company headquartered in Germany [10]. Since 1999 its software development teams have delivered Agile projects for clients initially in Europe and latterly in the Americas. The type of Agile system used varies from project to project with some projects using Scrum or a variant, others using Kanban, and many projects using one of the scaled Agile frameworks. Most projects are focussed on the on-time delivery of new features and, as such, use typical Agile metrics such as velocity, burn-up/burn-down rates, and committed vs delivered. Some teams elect to use the in-house instance of Jira to manage their work either initially or for the duration of the project. In 2023, we received permission from intive to receive limited process data from their inactive, closed Jira projects. Working with their Chief Security Officer, Chief Technical Officer, Legal and IT teams, we received process data from 295 Agile Jira Projects (JPs). These projects contained 181,014 PBIs created between 2008 and 2023. As we will show, this data could be used to assess both stability and sustainability, the latter measured by the amount of excess work.

intive offers flexible working conditions to its employees based on a typical 40-h, Monday-to-Friday working week [11]. Staff are typically expected to observe core working hours between 10 am and 4 pm, but there is significant flexibility. In Europe, where most of the closed projects were originally performed, intive has offices crossing three time zones. intive employees are not usually expected to work between 6 pm and 6 am UTC from Monday to Friday, or at any time over a weekend, as this could lead to an unsustainable work-life balance.

We define “Unsustainable Hours” as work being created or resolved during the period of 6pm to 6am UTC from Monday to Friday or at any time over a weekend. We assume that work performed during this time is in addition to, rather than instead of, normal working hours. We also assume that individuals who worked these hours needed to, rather than chose to, so that the hours in the medium- to long-term would become unsustainable rather than being inconvenient. These assumptions will be discussed further later but it is important to note that without contextual data we cannot state that any project or individual at intive experiences overwork.

2.2 Assessing the Waste of Inventory/Partially Started Work

Poppendieck and Poppendieck [12] extended the work of Shingo to identify 7 wastes of software development. Where Shingo identified (excess) inventory as a waste, Poppendieck and Poppendieck mapped this to “partially done work” [12]. They identify any work that has not been delivered to production as being potentially wasteful, as it contains uncertainty and risks, including the risk that the design may not solve the problems it was intended to address. Another waste identified by both Shingo and Poppendieck and Poppendieck is “waiting”: the waste of having people or equipment capable of performing work but not having any work to do for a prolonged period. Therefore, we can observe that having both too much and too little work is wasteful. Inventory management is a discipline of operations management that seeks to find the optimal inventory. Krajewski et al. [13] describe several models of inventory management and introduce the concept of “inventory position” as a measurement of current inventory levels to satisfy future demand. Healy et al. describe how they measured both the current inventory level, L, and the historic long-term demand, measured by the service rate, µ, which is the rate at which PBIs are marked as Done [5]. We used the ratio of these two, L divided by µ, to calculate the total number of days of inventory on hand should no new PBIs arrive. We term this metric “Inventory Days” as it relates to the Inventory Days on Hand metric used in inventory management. However, unlike that metric, it uses the count of PBIs rather than their accounting value and costs. Schulfer suggests that although the optimal Inventory Days varies from business to business, common levels are between 30 and 60 days [14].

Using the intive Closed Jira Project Dataset (CJPD) we could map how Agile system stability and inventory varied with the tendency of projects to create or resolve PBIs during Unsustainable Hours. The next section explains how we approached this.

3 Research Approach

The primary research question we are addressing in this paper is: RQ: “Is there a relationship between the stability of Agile systems and unsustainable hours worked?” To answer this question, we used the metrics described above and analysed 295 Closed Jira Projects.

3.1 Analysing the Closed Project Jira Dataset

Intive IT identified every Jira Project that had been marked as Closed with more than 30 Product Backlog Items (PBIs). They extracted a comma-separated values (CSV) file for each project with the following fields: Issue key, Issue id, Project key, Project type, Status, Issue Type, Created, and Resolved. The lead author, working with an active directory account, reviewed each file to ensure only required fields were included and to pseudonymize each filename using letter codes. The cleaned files were placed into a single directory, ready for scripted analysis using the steps below.

Fig. 1. Automated processing steps for each Jira Project file in the Closed Jira Project Dataset

Figure 1 shows the analysis steps for each file. The steps were performed by a Python script titled Reporter.py. These steps are similar to the process steps described previously [5]. After importing the data, the script first counted the numbers of each issue type and resolution. Epic and Subtask issue types were filtered from further analysis so that the remaining PBIs were of approximately comparable size, each being a piece of work requiring more than one person that should take less than a few days to complete. This reduces a potential source of skew in the Stability Metric between Jira projects. There were 78 issue types in total. Table 1 shows the top 10 issue types, accounting for 88.1% of the total number of PBIs.

Table 1. Top 10 issue types in our dataset. Filtered items were removed from subsequent analysis.

When a team completes a PBI, it receives a resolution. PBIs with resolutions like “Won’t Do”, “Rejected”, “Not a bug” and so on were removed to leave only PBIs that were likely to have been completed by a team. Table 2 shows the top 5 resolutions which accounted for 99.0% of all resolutions.

Table 2. Top 6 successful resolution types. Filtered items were removed from subsequent analysis.
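The filtering described above can be sketched as follows. This is a minimal illustration rather than the Reporter.py script itself: the function name `filter_pbis`, the exact label strings, and the presence of a `Resolution` column are assumptions made for the example, since the paper names the excluded categories only by example.

```python
import csv
import io

# Assumed label strings; the paper names the categories but not the exact
# values stored in the Jira export.
EXCLUDED_ISSUE_TYPES = {"Epic", "Subtask", "Sub-task"}
EXCLUDED_RESOLUTIONS = {"Won't Do", "Rejected", "Not a bug"}


def filter_pbis(rows):
    """Drop Epics/Subtasks and PBIs whose resolution suggests no work was delivered."""
    kept = []
    for row in rows:
        if row["Issue Type"] in EXCLUDED_ISSUE_TYPES:
            continue
        # "Resolution" is a hypothetical column name for this sketch.
        if row.get("Resolution", "") in EXCLUDED_RESOLUTIONS:
            continue
        kept.append(row)
    return kept


# Tiny made-up export used only to exercise the function.
SAMPLE_CSV = """Issue key,Issue Type,Resolution
AB-1,Story,Done
AB-2,Epic,Done
AB-3,Bug,Rejected
AB-4,Bug,Fixed
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept_keys = [row["Issue key"] for row in filter_pbis(rows)]
print(kept_keys)
```

Keeping the excluded labels in module-level sets makes it straightforward to adapt the sketch to whatever strings a particular Jira instance uses.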

3.2 Calculating the Stability Metric (ψ)

We made some improvements to the calculation of the Stability Metric compared to the method described by Healy et al. [5]. They used a simplified linear model to calculate both the arrival rate, λ, and the service rate, µ [5]. For example, for λ, they took the total cumulative number of PBIs that had been created and divided this value by the total time between the last and the first filtered PBI created. We improved on this simple model by using linear regression of the total dataset to calculate the slope of the line of best fit of all the data points for measuring both the arrival rate, λ, and the service rate, µ. This allows us to test how well our data fits the assumed linearity through the \(R^2\) value. Figure 2 shows the cumulative arrival rates, service rates, calculated system (mostly backlog) size as well as best fit lines for the LO Jira Project, one of the projects in our dataset.
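As an illustration of the regression step, the sketch below fits a least-squares line to a cumulative count of PBIs over time and returns the slope (PBIs per day) together with \(R^2\). It is a plain-Python approximation under stated assumptions, not the authors' implementation: the helper names `cumulative_series` and `fit_rate`, and the choice of days since the first PBI as the time axis, are hypothetical.

```python
from datetime import datetime


def cumulative_series(timestamps):
    """Return (days since first event, cumulative count) for sorted timestamps."""
    ordered = sorted(timestamps)
    start = ordered[0]
    days = [(t - start).total_seconds() / 86400 for t in ordered]
    counts = list(range(1, len(ordered) + 1))
    return days, counts


def fit_rate(days, counts):
    """Ordinary least-squares fit: returns (slope, intercept, r_squared)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    syy = sum((y - mean_y) ** 2 for y in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


# Made-up creation times: one PBI per day for five days.
created = [datetime(2023, 3, day, 9, 0) for day in range(6, 11)]
days, counts = cumulative_series(created)
arrival_rate, _, r2 = fit_rate(days, counts)
print(arrival_rate, r2)
```

Applying the same two helpers to the Resolved timestamps would give the service rate, µ, in the same units.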

Fig. 2. Timeline of arrivals, resolution, backlog as well as best fit lines for arrivals and services

The Stability Metric, ψ, is the ratio between the service rate and the arrival rate, as previously described [5] and shown in Eq. 1. Following queueing theory, each Jira Project was grouped as Unstable (ψ < 1), Stable (ψ > 1), or Marginally Stable (ψ = 1).

$$\psi =\frac{\mu }{\lambda }$$
(1)
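Equation 1 and the three groupings can be expressed in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the small `tolerance` used to detect ψ = 1 with floating-point rates is an assumption, as the paper does not state how exact equality was tested.

```python
def stability_metric(service_rate, arrival_rate):
    """Eq. 1: psi = mu / lambda, with both rates in PBIs per day."""
    if arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return service_rate / arrival_rate


def classify_stability(psi, tolerance=1e-9):
    """Group a system by psi; `tolerance` is an assumed float-comparison margin."""
    if abs(psi - 1.0) <= tolerance:
        return "Marginally Stable"
    return "Stable" if psi > 1.0 else "Unstable"


# Made-up rates: 2 PBIs resolved per day against 4 arriving per day.
psi = stability_metric(service_rate=2.0, arrival_rate=4.0)
print(psi, classify_stability(psi))
```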

3.3 Calculating the Inventory Days (ID)

The inventory days, ID, for each Jira Project was calculated using Eq. 2. It is the ratio of the final product backlog size, \(L\), measured in PBIs, to the average service rate, µ, measured in PBIs per day. Equation 3 shows how we calculated the product backlog size, \(L\), by taking the total PBIs that had arrived, \(A\), and subtracting the total PBIs that had been resolved, \(Z\).

$$ID=\frac{L}{\mu }$$
(2)

where,

$$L=A-Z$$
(3)

Once the Stability Metric, ψ, and Inventory Days, ID, were calculated for a system, they could be plotted in relation to one another in a 2x2 matrix. The matrix was divided horizontally by the marginal stability line, where ψ = 1. Below that the system is unstable and above it is stable. The matrix was divided vertically at an ID value of 30 days. This corresponds to approximately one month’s worth of inventory or a little more than two two-week sprints in a Scrum framework. Below this value, a team probably needs to start planning new work, and above it, there is enough and possibly too much work.
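A short sketch of Eqs. 2 and 3 and of the quadrant assignment is given below. The function names are hypothetical, and the handling of boundary values (ψ exactly 1 placed in the top half, ID exactly 30 placed in the right half) is an assumption made for this example, since the text does not specify it.

```python
def inventory_days(arrived, resolved, service_rate):
    """Eqs. 2 and 3: ID = L / mu, where L = A - Z."""
    if service_rate <= 0:
        raise ValueError("service rate must be positive")
    backlog = arrived - resolved
    return backlog / service_rate


def quadrant(psi, inv_days, id_threshold=30.0):
    """Place a system in the 2x2 matrix (boundary handling is assumed)."""
    vertical = "top" if psi >= 1.0 else "bottom"
    horizontal = "right" if inv_days >= id_threshold else "left"
    return f"{vertical}-{horizontal}"


# Made-up project: 120 PBIs arrived, 90 resolved, 1.5 PBIs resolved per day.
id_value = inventory_days(arrived=120, resolved=90, service_rate=1.5)
print(id_value, quadrant(psi=0.8, inv_days=id_value))
```

With these made-up numbers the backlog of 30 PBIs represents 20 days of work, which places an unstable system in the bottom-left quadrant.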

3.4 Calculating the Unsustainable Hours Percentages

Using a separate script, each project was reviewed to count the hours of the day and day of the week that each PBI was created and resolved. Figure 3 shows the time of the day PBIs were created and resolved for the LO Jira Project. This shows that most work was performed between 8am and 5pm. However, a small spike at 9pm demonstrates that there was work being marked as completed at that time also.

Fig. 3. Hour of all PBI arrivals and resolutions in the LO system, clock hours are on the circumferential axis with PBI count on the radial axis.

Figure 4 shows the day of the week that each PBI was created (in red) or was resolved (in green) for Jira project LO. This showed that this system never had any work arrive or be completed on a weekend day.

Fig. 4. Day of all PBI arrivals and resolutions in the LO system.
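The hour-of-day and day-of-week tallies behind plots such as Figs. 3 and 4 could be produced along the following lines. This is a hedged sketch: the function name `hour_and_weekday_counts` is hypothetical, and the timestamps are assumed to be already parsed into `datetime` objects in a single time zone.

```python
from collections import Counter
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def hour_and_weekday_counts(timestamps):
    """Count events per clock hour (0-23) and per named day of the week."""
    hours = Counter(ts.hour for ts in timestamps)
    weekdays = Counter(WEEKDAY_NAMES[ts.weekday()] for ts in timestamps)
    return hours, weekdays


# Made-up creation times: two on a Monday, one on a Saturday.
sample = [
    datetime(2023, 3, 6, 9, 15),
    datetime(2023, 3, 6, 21, 5),
    datetime(2023, 3, 11, 9, 40),
]
hours, weekdays = hour_and_weekday_counts(sample)
print(hours, weekdays)
```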

The script also calculated the percentage of “Unsustainable Hours” as the percentage of PBIs created or resolved outside 6am to 6pm Monday to Friday or at any time over the weekend, using Eq. 4 for PBI creation and Eq. 5 for PBI resolution.

$$\text{Unsustainable Hours}_{\text{created}}=\frac{\sum \text{PBIs created outside of 6am to 6pm, Mon to Fri}}{\sum \text{PBIs}}\times 100\%$$
(4)
$$\text{Unsustainable Hours}_{\text{resolved}}=\frac{\sum \text{PBIs resolved outside of 6am to 6pm, Mon to Fri}}{\sum \text{PBIs}}\times 100\%$$
(5)
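Equations 4 and 5 amount to flagging each timestamp and taking a percentage. The sketch below assumes timestamps are `datetime` objects already expressed in UTC, and it treats the sustainable window as the half-open interval 06:00 to 18:00, so 18:00 exactly counts as unsustainable. Both the boundary choice and the function names are assumptions for illustration.

```python
from datetime import datetime


def is_unsustainable(ts):
    """True at weekends, or on weekdays outside 06:00-18:00 (ts assumed UTC)."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (6 <= ts.hour < 18)


def unsustainable_percentage(timestamps):
    """Share of timestamps falling in Unsustainable Hours, as a percentage."""
    if not timestamps:
        return 0.0
    flagged = sum(1 for ts in timestamps if is_unsustainable(ts))
    return 100.0 * flagged / len(timestamps)


# Made-up resolution times for one small project.
resolved = [
    datetime(2023, 3, 6, 9, 0),    # Monday morning
    datetime(2023, 3, 6, 21, 0),   # Monday evening
    datetime(2023, 3, 11, 10, 0),  # Saturday
    datetime(2023, 3, 7, 5, 59),   # Tuesday, just before 6am
]
pct_resolved = unsustainable_percentage(resolved)
print(pct_resolved)
```

Running the same function over the Created timestamps gives the value for Eq. 4.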

These percentages were then compared to the Stability Metric and Inventory Days as shown in the next section.

4 Results

In this section, we discuss the results of the analysis outlined above. First, we present the stability metric and inventory day distributions of all 295 relevant Jira Projects in the Closed Jira Project Dataset. Then we present overall distributions of days and hours worked across all projects which shows work performed out of hours. Finally, we show the relationships between work performed out of hours and stability metric and inventory days.

4.1 Stability Metric and Inventory Days

Figure 5 shows the distribution of the Stability Metric for all the systems. 74.2% of all systems were unstable. However, as the figure shows, the systems tended to cluster around marginal stability with 29.5% of all systems having a stability between 0.9 and 1.1. Of the unstable systems, 2.4% had an arrival rate at least ten times higher than the service rate. 0.6% of systems had a stability of 2 or higher, meaning people on these projects had nothing to do.

Fig. 5. Stability Metric distribution.

Figure 6 shows the distribution of the Inventory Days. This shows that even though these projects were closed, a lot of work remained unresolved. 55.9% of all systems had fewer than 30 days of inventory remaining when they were closed. However, 14.2% of projects had 181 days or more worth of PBI inventory outstanding when they closed, with one system having 9.3 years of work still to be completed.

Fig. 6. Inventory Days distribution.

By combining the two datasets into a 2x2 matrix we can see how the systems are distributed, as per Fig. 7. Because of the wide distribution of the Inventory Days, a log scale was used and any systems with Inventory Days of zero were mapped to a value of 0.1 for visibility. The largest group of projects analysed, 37.3%, was in the bottom-left quadrant. This usually appeared to result from the conscientious closing of all open PBIs when the project was closed or transferred to a new project. This conscientious closing was completed by some 19.8% of teams and is a limitation of this dataset. The next highest group was in the bottom-right quadrant with 37% of projects.

The results show that the Agile systems in the bottom-right quadrant need to significantly accelerate service rates to bring backlogs under control. 18.6% of systems were delivering well and were in a position where they had fewer than 30 days’ worth of work outstanding at the point they closed. Just 7% of projects had a substantial backlog of work but were actively reducing it at the point of closure. We can use this data to analyse the relationship between these variables and hours worked.

Fig. 7. 2x2 matrix of closed Agile Jira projects based on stability metric and inventory days

4.2 PBI Creation and Resolution Distributions

Figure 8 shows the cumulative days worked across all 295 Agile systems analysed and Fig. 9 shows the cumulative hours worked. Across these systems, someone somewhere created or resolved a PBI every hour of every day and at some time on every day of the week. 93.6% of all systems analysed had some work performed outside the hours of 6am to 6pm Monday to Friday. The figures show that the most likely time for a PBI to be created was on a Monday at 21.4% and between 11 am and midday, at 12.7%. PBIs were most likely to be resolved on a Tuesday, at 26.5%, and between the hours of 11 am and midday, at 11.4%.

Fig. 8. Cumulative PBI arrival and resolution day percentages.

Fig. 9. Cumulative PBI arrival and resolution hours percentages.

Overall, most work done to create or resolve a PBI is done in relatively reasonable working hours. Given individual working preferences and time flexibility we chose 6am to 6pm UTC Monday to Friday as a reasonable period for a team based in Europe to open or close a PBI in a long-term sustainable, but flexible, way of working. The data shows that 92.8% of PBIs were created in this period and 91.8% were resolved in this period.

Figure 10 shows the distributions of PBIs created and resolved during potentially Unsustainable Hours. This shows that although the average Jira project had 9.3% of its PBIs created overnight or during the weekend and 11.4% of its PBIs resolved during this period, these averages are skewed by high concentrations at either end of the distributions. 19 Jira Projects, 6.4% of the total, had no Unsustainable Hours, and 18 Jira projects, 6.1% of the total, had more than one-fifth of their PBIs created and resolved during Unsustainable Hours. For this latter group, it is possible that the team was working for a client based outside a European time zone, or that the client required out-of-hours deployments. It is also possible that the team developed a meeting culture, forcing some work late at night, or that the sheer volume of work left the team feeling the need to work late. Of course, it is also possible that some individuals simply prefer to work unusual hours. Since we cannot discount any factor, we have included all the data to allow us to analyse the relationship between the Stability Metric, Inventory Days, and these Unsustainable Hours percentages.

Fig. 10. Unsustainable Hours distributions for PBI creation and resolution.

4.3 Stability Metric, Inventory Days, and Unsustainable Hours

Figure 11 shows that there is essentially no correlation between the Stability Metric and Unsustainable Hours worked. The Spearman rank coefficient for created PBIs is 0.02 and for resolved PBIs is 0.04. This suggests that the flow of work in the system is not associated with the tendency for team members to work Unsustainable Hours on these Agile systems.

Fig. 11. Stability Metric versus Unsustainable Hours

Figure 12 shows that there is also essentially no correlation between Inventory Days and Unsustainable Hours worked. The Spearman rank coefficient value for created PBIs is -0.03 and for resolved PBIs is -0.05. This suggests that the volume of outstanding work is not associated with the tendency for team members to work Unsustainable Hours on these Agile systems.

Fig. 12. Non-zero Inventory Days versus Unsustainable Hours
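For readers who wish to reproduce this kind of comparison, the Spearman rank coefficient can be computed as the Pearson correlation of the ranked values, with tied values sharing their average rank. The sketch below is a plain-Python illustration with hypothetical function names and made-up inputs. It does not reproduce the figures reported above, and the paper does not state which implementation was used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-project values: stability metric and Unsustainable Hours (%).
psi_values = [0.4, 0.9, 1.0, 1.3]
unsustainable = [12.0, 3.0, 8.0, 5.0]
print(spearman(psi_values, unsustainable))
```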

5 Discussion

The sustainability of work systems is of increasing importance with high economic costs to companies and health and well-being costs to workers. We analysed the tendency of workers to work late at night and over weekends, outside core working hours and then we compared this to systemic factors, such as the flow of work and the potential pressure of having a large amount of work to complete. Using data from real-world Agile teams in one company, we demonstrated these factors do not appear to be correlated.

This is an interesting and somewhat surprising finding as it suggests that whatever motivated the individuals in these projects to create or resolve a Jira PBI late in the night or on a Saturday, it was not the volume of work or the speed at which work needed to progress. There was as much tendency to work Unsustainable Hours when the team seemed to be running out of work as when there were piles of work to be done. While it might be understandable that a team member resolving a story or bug may not be aware of the size of the backlog or the number of new PBIs being created every day, it is less understandable how a Product Owner would not be somewhat aware of these things. Future work to introduce the Stability Metric and Inventory Days to real teams may be able to show if awareness of system metrics has an impact on behaviours.

In Sect. 2.1 we described two assumptions required to define Unsustainable Hours for these projects. While our data shows that working overnight or at the weekend did not appear to have an impact on system size or stability, we cannot say what impact this behaviour had on the individual, the team, or the quality of the work completed during unsustainable hours. Some people may simply prefer to work late at night. Future research is required to understand people’s motivations to choose when to work and, at times, overwork unnecessarily.

Throughout this study, we treated Jira PBIs as being equal. The teams were likely to have assigned different levels of time-criticality to individual pieces of work. Although it is not presented above, we have measures of lead times for different issue types as these can be calculated from the difference between the time a PBI was resolved and the time it was created. We have ongoing work investigating patterns of lead times for different issue types that may help diagnose time-criticality as an extrinsic motivation for unsustainable hours. However, as Table 1 shows, the primary issue type in this dataset was the Bug with 49,525 PBIs. Since some bugs are likely to be more critical than others, and it is not impossible that a story or other PBI may become critical, other datasets or research will be required to investigate this factor of working late nights. The next section presents some of the limitations of this study in detail.

6 Limitations

Compared to previous larger datasets examined, the Closed Jira Projects Dataset had the advantage of better contextual realism and a higher confidence that these Jira Projects were Agile in nature. However, to protect confidentiality only historic closed projects were shared, and only process data. Some of the projects had teams migrate to other systems, usually client systems, and in other cases, the project came to an end for commercial reasons. This limitation means that the results may be somewhat skewed toward having lower inventory days than active projects. For example, 19.8% of all closed projects had all of their then outstanding PBIs marked as closed on their final day. Future work to measure stability and inventory days on active Agile teams would be useful.

A second potential benefit of repeating a similar study in active Agile teams would be to measure the motivations for working at unexpected times. Because the data came from many projects over a 15-year period, it is likely that most common scenarios were at some time encountered but we cannot assert that for certain. Other studies of either many Agile teams or more detailed longitudinal studies of individual teams may be able to discount extrinsic systematic factors for late nights or weekends and separate those from intrinsic motivations where a person is inspired to sacrifice rest for a solution.

A limitation of this study was our use of the period between 6pm and 6am Monday to Friday, together with weekends, to denote “Unsustainable Hours”. Figure 9 shows that over 90% of work performed was outside of these hours. However, it is plausible that individuals may prefer to work inside of these times. Also, anyone who worked every weekday from 6am to 6pm would regularly be working a 60-h week, which would not appear to be sustainable. Future work with active Agile teams should ask individuals when they would prefer to work before measuring when they do work as an improved method of measuring true excess work.

This study used Jira records as a proxy for work performed. While creating and resolving PBIs could not be classed as a leisure activity, we cannot distinguish a team member who stayed up until 3 am to correct a tricky bug from another person who decided to do some minor Jira administration having woken at 3 am to let the cat out. Both will appear the same in our records, but the first person may feel tired from work while the second is likely well-rested. This study is limited by the accuracy of the data coming from a large number of teams across a range of projects, but all working in a single company.

7 Conclusion

Overwork is a dangerous activity that has increased over the last twenty years and has negative impacts on workers’ physical and mental health. Agile is an approach to work that has become popular over the same period. There is a need to consider backlogs and other forms of partially completed work as a form of inventory and a potential source of waste. This study examined whether the volume or the speed of flows of work was related to the amount of out-of-hours work being performed. We found no evidence to suggest that either the speed or volume of work was related to excess hours. This finding may be useful to other researchers seeking to examine the actual causes of excess work. There were Jira projects that demonstrated some excess work. The variation of the projects in time, type, and duration meant it was unlikely that the causes of what kept individuals working late at night and over weekends were common. This means that while some of the motivation to work late may be extrinsic, some appears to be intrinsic. This research presents an opportunity to repeat these procedures with active Agile teams to survey participants as to their preferred work behaviours as well as to investigate the advantages and disadvantages of having constant access to work systems.