2.1 Introduction

As discussed in the previous chapter, SODALITE focuses on the configuration, deployment and operation of complex applications. These are often developed by specialists in particular application domains and development technologies who, however, are not necessarily experts in the resources on which their applications could best execute. For such teams it is therefore not easy to take care of IT-intensive tasks such as deploying complex applications on multiple heterogeneous infrastructures, making this process repeatable and error-free, and fine-tuning the execution of applications to keep performance and costs under control.

There is ample evidence of the complexity of such tasks. It has led to the introduction of the DevOps lifecycle, which reinforces the importance and advantages of good cooperation between Dev and Ops, and to the emergence of the Infrastructure as Code (IaC) paradigm, in which software defines the way applications should be deployed, configured and executed.
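To make the IaC idea concrete, the following is a minimal sketch of an Ansible playbook, one of the most common IaC notations and the one adopted by SODALITE; all names (the inventory group, the chosen web server) are illustrative and not taken from the SODALITE code base:

```yaml
# Minimal illustrative Ansible playbook: the deployment procedure is
# expressed as code, so it can be versioned, reviewed, and re-executed.
- name: Deploy a simple web server
  hosts: webservers          # hypothetical inventory group
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.apt:   # assumes a Debian-based target
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

Running the same playbook twice leaves the system unchanged: this idempotence is what makes IaC-based deployments repeatable.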

While the literature presents several approaches that support some DevOps and IaC activities in a cloud environment, the main novelty of SODALITE is to offer a complete framework that tackles multiple DevOps aspects and targets multiple types of resources.

2.2 SODALITE Main Features

The envisaged platform is meant to serve different users, each an expert in different aspects of the definition and operation of software-defined computing infrastructures, and each requiring resources to carry out their activities. Figure 2.1 presents the use cases served by the features provided by SODALITE. Before discussing the use cases, we introduce the actors with which they interact. These can be grouped into human actors and resources.

Fig. 2.1 Actors and use cases. Human actors (the Application Ops, Resource, and Quality Experts) interact with resources: application components, execution platforms (GPU clusters, cloud, HPC), and middleware for storage and communication

The envisaged human users are the following. To ease their positioning within a standard life cycle, they can also be mapped onto the roles in charge of some of the processes defined in the ISO/IEC/IEEE 12207 standard Systems and software engineering—Software life cycle processes [1]:

  • Application Ops Expert (AOE) is in charge of operating the application and, as such, of all aspects that concern its deployment, execution, optimization and monitoring. He/she is supposed to know the applications to execute and the requirements on both the deployment/execution environment and the quality of service he/she is interested in. This role can correspond to the ISO/IEC/IEEE roles in charge of the Operation and Maintenance processes, as these focus on day-to-day operation.

  • Resource Expert (RE) is in charge of dealing with the different resources required to deploy and execute the application: application component technologies; cloud, HPC, and GPU-based computing infrastructures; and middleware solutions for both storing data and letting components communicate. This role can correspond to the ISO/IEC/IEEE roles in charge of the Infrastructure Management and Configuration Management processes, given that these roles allocate and manage resources and configurations.

  • Quality Expert (QE) is responsible for the quality of service both provided by the execution infrastructure and required by the executing application. As part of the SODALITE ecosystem, he/she is in charge of offering libraries of patterns that address specific performance and quality problems in SODALITE applications. This role can correspond to the ISO/IEC/IEEE roles in charge of the Quality Management and Quality Assurance processes, because they oversee the overall quality of deployed applications and thus of the project itself.

The resources of interest for SODALITE are those needed to operate applications on the platform. They can be grouped into:

  • Application components are the executables into which the applications of interest are partitioned. These components can be based on diverse technologies and come either as black boxes or as complete packages, that is, executables accompanied by their source code and by any other external artifact needed to compile, deploy, and execute them.

  • Execution platforms provide the means to execute the different application components. They can be cloud-based elements (e.g., virtual machines or containers), HPC infrastructures, or clusters of GPUs.

  • Middleware frameworks provide the underlying glue: they help store the different data and artifacts and make the different elements communicate. Among others, middleware frameworks include communication elements such as VPNs (Virtual Private Networks) and any other element needed to configure the interaction between the other resources or the application components.

The identified use cases reflect the main activities human actors can trigger or participate in as part of the life cycle management of IaC. Space constraints do not allow us to detail each single use case; Figure 2.1 provides a high-level view of the scope of the project, while Table 2.1 lists all the use cases covered by the project. The rationale behind the different use cases is the following:

Table 2.1 SODALITE use cases

  • To make the SODALITE framework usable by AOEs, it must be populated with information concerning the resources that can be exploited at runtime. This requires modelling resources (UC13) and making them available to AOEs as part of the SODALITE Domain Specific Language. This activity is performed by Resource Experts, who are also in charge of mapping the modelled resources onto specific optimization patterns (UC12). These experts can also search for the resources they need by querying the system for already-defined resources (UC17).

  • The Quality Expert defines a bug taxonomy for IaC (UC11) that helps AOEs predict bugs (UC5). Moreover, he/she experiments with application components and prototypes to estimate their quality characteristics (UC14).

  • AOEs start their activity by defining an application deployment model (UC1). This model includes the main components of an application and any constraint or requirement on their deployment, configuration or execution. At this point they can either rely on the resources the SODALITE system would assign by default or select specific resources (UC2). After this step, they are ready to trigger the automatic generation of IaC code (UC3) and its verification (UC4), as well as bug prediction and correction (UC5) and static optimization (UC15) aimed at improving application performance. Of course, these activities may lead to some reiteration of the mentioned use cases until, as part of the IaC code generation, AOEs generate the needed runtime images (UC16). Then AOEs can trigger the execution of provisioning, configuration and deployment (UC6), start the application (UC7) and start monitoring the execution (UC8), with the purpose of checking that everything is working well and, in case of problems, of identifying possible refactoring and deployment improvement options (UC9). As a result of this identification, they can go back to the modelling and IaC generation/verification/optimization phases and, at that point, trigger a partial redeployment of the system (UC10). After completing a deployment, they can also take care of governing it (UC18).

All use cases are mandatory steps for a proper usage of SODALITE, with the exception of the following ones, which can either be triggered by the actors or be skipped:

  • Select Resources (UC2), since default resources can be assigned to an application if none is selected.

  • Predict and Correct Bugs (UC5), as the AOE may want to exclude this automated correction and take care of bugs by himself/herself.

  • Monitor Runtime (UC8): while monitoring is highly beneficial, it may introduce an overhead that users may want to avoid. Of course, excluding monitoring implies that UC9 and UC10 (refactoring and redeployment) cannot be performed.

  • Identify Refactoring Options (UC9), Execute Partial Redeployment (UC10), and Statically Optimize Application and Deployment (UC15), which represent the most advanced features offered by SODALITE; the user can still use the platform without exploiting them.

2.2.1 Workflows

This section presents the main workflows supported by the SODALITE platform. They focus on the three primary users of SODALITE—Application Ops Experts, Resource Experts, and Quality Experts—plus a secondary user, the system administrator in charge of deploying and configuring the SODALITE platform itself. In the following we present the workflows associated with these types of users, highlighting the artifacts each workflow produces and where they are located during a normal execution of the SODALITE platform.

Fig. 2.2 Workflow for the Resource Expert. He/she either selects an existing resource model and associates Ansible playbooks with its operations, or models a new resource and creates the corresponding playbooks; the results are stored in SODALITE

Figure 2.2 presents the workflow typically followed by the Resource Expert. He/she is in charge of creating resource models and the Ansible playbooks that implement the corresponding operations. If a model of the resource under consideration is already available, for instance because the Platform Discovery has automatically defined it, the Resource Expert limits his/her work to selecting the specific resource and to creating or, if already available, selecting the Ansible playbooks that implement the operations to be executed for that resource.

The Resource Expert performs his/her activities by exploiting two SODALITE tools: the IDE for all modeling/editing activities and, indirectly, the Platform Discovery.

The Knowledge Base is the main data store used in this workflow: it includes the resource models and is updated with the URLs of the Ansible scripts associated with them. The Ansible Modules Repository is an off-the-shelf directory offered by the Ansible community that includes all available modules. The Ansible playbooks used or produced within the context of SODALITE can be made available on any datastore that supports their identification through a proper URI, including a Git repository.
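As an illustration of what the Resource Expert produces, the sketch below shows a possible TOSCA node type for a virtual-machine resource; the type name, properties, and playbook paths are hypothetical and serve only to show how a resource model is linked to the Ansible playbooks implementing its operations:

```yaml
# Hypothetical TOSCA node type defined by a Resource Expert. The lifecycle
# operations reference the Ansible playbooks (via URIs recorded in the
# Knowledge Base) that implement them.
tosca_definitions_version: tosca_simple_yaml_1_3

node_types:
  example.nodes.OpenStackVM:            # illustrative type name
    derived_from: tosca.nodes.Compute
    properties:
      image:
        type: string
        description: Name of the VM image to boot
      flavor:
        type: string
        description: Size of the VM instance
    interfaces:
      Standard:
        type: tosca.interfaces.node.lifecycle.Standard
        operations:
          create:
            implementation: playbooks/vm_create.yml
          delete:
            implementation: playbooks/vm_delete.yml
```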

Application Ops Experts are involved in two types of activities within the context of SODALITE: those concerning the design of Abstract Application Deployment Models (AADMs) and those concerning the execution of the corresponding TOSCA and Ansible scripts and the application runtime.

Fig. 2.3 Design-time workflow for the Application Ops Expert. Application component images are created and the abstract application deployment model is defined; component code and containers are optimized, the TOSCA blueprint is generated and stored, and the TOSCA and Ansible code is analyzed, possibly iterating the process

Figure 2.3 shows the design-time activities performed by Application Ops Experts to prepare the deployment of a complex application. First, they prepare the images of the application components by packaging them in proper execution containers; this activity is supported by the Image Builder. In parallel, they define the Abstract Application Deployment Model (AADM) through the SODALITE IDE. This is an iterative task that requires interaction with the SODALITE Knowledge Base and terminates when the user is satisfied with the AADM. When images and the AADM are saved in the Image Repository and Knowledge Base, respectively, the AOE generates the TOSCA blueprint. If needed, the optimization of component code and associated containers is performed as part of this phase. The resulting TOSCA blueprint is stored in any repository, e.g., Git, that offers a URI-based mechanism for identifying its elements. Finally, the TOSCA blueprint, together with the associated Ansible playbooks (defined by the Resource Experts), is analyzed to assess the presence of possible problems and bug smells that, if revealed, bring the AADM back into the modeling phase.
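Assuming the hypothetical node type sketched in the previous workflow, the fragment below illustrates what a small generated TOSCA blueprint could look like for one component of the Snow case study; template names and property values are invented for the example:

```yaml
# Illustrative fragment of a generated TOSCA blueprint: a VM and a
# software component hosted on it, with values taken from the AADM.
tosca_definitions_version: tosca_simple_yaml_1_3

topology_template:
  node_templates:
    snow-vm:
      type: example.nodes.OpenStackVM   # hypothetical type from the resource model
      properties:
        image: ubuntu-20.04
        flavor: m1.medium

    mountain-classifier:                # hypothetical Snow pipeline component
      type: tosca.nodes.SoftwareComponent
      requirements:
        - host: snow-vm
```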

Fig. 2.4 Runtime workflow for the Application Ops Expert. Deployment orchestration is followed by application startup and by the continuous monitoring, autoscaling, and refactoring activities; monitoring data feed autoscaling and refactoring, and refactoring may trigger a new deployment orchestration

Figure 2.4 describes the runtime activities overseen by AOEs. They are all automated, but their results can be inspected through proper dashboards. The process starts with the orchestration of a TOSCA blueprint and of the associated Ansible playbooks. The result of this step, when successful, is a complex application ready to start its execution. After execution starts, the continuous activities of monitoring, auto-scaling, and refactoring are performed. Refactoring can result in changes to the TOSCA blueprint that trigger a new deployment orchestration step. In this process, monitoring data are produced by the monitoring platform and exploited by the auto-scaling mechanism for short-term fine-tuning and by the refactoring mechanism for identifying longer-term potential issues. TOSCA blueprints are retrieved and stored, when changed, in any suitable repository, as already discussed for the design-time activities.
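As an illustration of how such runtime behaviour can be declared rather than hand-coded, the sketch below attaches a hypothetical scaling policy to the blueprint; the policy type and its thresholds are invented for the example and do not correspond to an actual SODALITE API:

```yaml
# Hypothetical TOSCA policy: when the monitored CPU load of the target
# component exceeds the threshold, one more instance is requested.
topology_template:
  policies:
    - scale-on-load:
        type: example.policies.ScaleUp  # illustrative policy type
        targets: [ mountain-classifier ]
        properties:
          cpu_upper_bound: 80           # percent CPU utilization
          adjustment: 1                 # number of instances to add
```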

Fig. 2.5 Workflow for the Quality Expert. Benchmarks are executed, their results are used to define optimization models, and the models are stored in the SODALITE Knowledge Base

The Quality Expert is in charge of developing the optimization models that constitute the inputs to the Application Optimizer (MODAK). He/she is assumed to run, externally to SODALITE, benchmarks that measure the characteristics of available resources, and then defines the optimization models based on the data acquired during this benchmarking phase. The creation of optimization models is supported by the IDE, while the models are stored in the SODALITE Knowledge Base. Figure 2.5 provides an overview of the described workflow.

The last workflow associated with the usage of SODALITE concerns the activities carried out by the system administrator in charge of making the SODALITE platform available to other users. Given that this platform comprises multiple components, it is itself a complex application. As such, its deployment and configuration have been automated through a proper TOSCA blueprint, and this workflow is completely automated.

Fig. 2.6 The SODALITE overall architecture. The Modelling layer contains the IDE with application, optimization, and resource models; the Semantic Reasoner provides semantic suggestions to the IaC layer for verification and to the Runtime layer for monitoring

2.3 SODALITE Overall Architecture

Figure 2.6 shows the SODALITE platform architecture. It is organized in three layers: the Modelling layer, the Infrastructure as Code layer, and the Runtime layer.

The Modelling layer includes a set of SODALITE domain ontologies, which result from the abstract modelling of the related domains (applications, infrastructure, performance optimization and deployment) and constitute the Semantic Knowledge Base. A dedicated middleware, the Semantic Reasoner, is in charge of populating the knowledge base with data and of applying rule-based semantic reasoning. The IDE supports end users in the design of deployment models.

The Infrastructure as Code layer (IaC layer) offers APIs and data to support the optimization, verification and validation of both Resource Models (RMs) and Abstract Application Deployment Models (AADMs). Moreover, it prepares a valid and deployable TOSCA blueprint through the IaC Builder and offers the Platform Discovery Service, which supports the tasks of the Resource Expert by creating a valid TOSCA platform resource model to be stored in the SODALITE Knowledge Base.

The Runtime layer of SODALITE is in charge of the (re)deployment of SODALITE applications onto heterogeneous infrastructures, of their monitoring, and of their dynamic reconfiguration. It is composed of the following main blocks: (i) an Orchestrator that receives the IaC description of the application to be deployed or re-deployed and executes it by deploying the application components on the target infrastructure; (ii) a Monitoring system that enables users to visualize multiple metrics and allows the refactoring mechanism to initiate any needed recovery action; (iii) a Refactoring component that identifies possibilities for increasing the efficiency of the system and proposes possible redeployment options to the end users.

SODALITE provides tools and methods to authenticate and authorize actions on API endpoints using open-source Identity Management and secure secret-handling tools. Authorization is required because a single SODALITE endpoint can manage different infrastructures belonging to different domains. Apart from proper authentication and authorization of user actions, safe secret management across the whole deployment pipeline is also required and is ensured by SODALITE.

As the basis for authorization, the OAuth 2.0 protocol was chosen, being the de-facto industry standard. As the IAM provider, SODALITE uses Keycloak, a popular and widely used open source tool that simplifies the creation of secure services with minimal coding for authentication and authorization, and that allows a degree of customization exceeding the needs of SODALITE. Along with the basic authentication mechanism provided by Keycloak, SODALITE can also support features such as two-factor authentication and seamless integration with third-party identity providers such as Google or GitHub.
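For instance, a Keycloak instance can be run next to the other SODALITE services with a few lines of configuration; the fragment below is a minimal sketch using the public Keycloak image in development mode, with placeholder credentials and ports:

```yaml
# Hypothetical docker-compose fragment: a development-mode Keycloak
# instance acting as the OAuth 2.0 / IAM provider for the platform.
services:
  keycloak:
    image: quay.io/keycloak/keycloak:24.0
    command: start-dev                   # development mode; not for production
    environment:
      KEYCLOAK_ADMIN: admin              # placeholder credentials
      KEYCLOAK_ADMIN_PASSWORD: change-me
    ports:
      - "8080:8080"
```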

Apart from properly authorizing users' actions, the Security Pillar also handles infrastructure secrets such as RSA keys, tokens, and passwords. This involves two points to be addressed: the security of data in use and the security of data at rest.

The first point is addressed by properly handling the secrets across the whole pipeline: unencrypted information is not stored, security-critical parts are not logged, and users are managed on the virtual containers that host the SODALITE components. For the second point, HashiCorp Vault was chosen, probably the most widely used open source tool for secret management.

2.4 A Running Example: Snow

This section provides an overview of the Snow example, which will be used in the following chapters whenever we want to exemplify the usage of SODALITE features.

Snow exploits the operational value of information derived from public web media content to support environmental decision-making in a snow-dominated context. An automatic system crawls geo-located images from heterogeneous sources at scale, checks for the presence of mountains in each photo, identifies individual peaks, and extracts a snow mask from the portion of the image denoting a mountain. Two main image sources are used: touristic webcams crawled in the Alpine area and geo-tagged user-generated mountain photos retrieved from Flickr for the same region.

Fig. 2.7 Snow: foreseen pipeline. User-generated photographs from public sources undergo crawling and mountain relevance classification; webcam images undergo crawling, filtering, and aggregation; mountain peak identification, snow mask extraction, and index computation are performed on both sources

Both image types carry, explicitly or implicitly, information about the location where the image was taken, but they require estimating the orientation of the camera during the shot, identifying the visible mountain peaks, and filtering out images not suitable for snow analysis (e.g., due to fog, rain, etc.).

The two multimedia processing pipelines, shown in Fig. 2.7, share common steps but also have differences.

Pictures from Flickr tagged with a location corresponding to a certain mountainous region do not guarantee the presence of mountains. For this reason, the presence of mountains in every photograph is estimated and the non-relevant photographs are discarded. To classify an image, the process first computes a fixed-dimensional feature vector that summarizes the visual content and then provides it to a Support Vector Machine (SVM) classifier to determine whether the image should be discarded. A dataset of images annotated with mountain/no-mountain labels is needed to train the model.

Outdoor webcams represent a valuable source of visual content. They expose a URL that returns the most recent available image. In this case, the resulting images need to be filtered according to the weather conditions, since these can significantly affect short- and long-range visibility. Additionally, snow cover changes slowly over time, so one measurement per day is sufficient; for this reason, an aggregation of the images obtained during the day is desirable.

The distance between the shooting location and the framed mountains can be very large (tens of kilometers), so the photo geotag alone is not sufficient for the analysis of the mountains. It is necessary to determine which portions of the image represent which mountains and to identify the geographical correspondence of each pixel: whether it is terrain surface or sky, what the corresponding geographical area is, and what its GPS coordinates, altitude and distance from the observer are. Once an image is geo-registered, the portion of the image that represents the mountain area can be analysed and divided into snow and non-snow areas. Mountain Image Geo-registration (MIGR) is done by finding the correct overlap between the photograph and a 360-degree cylinder with a virtual mountain panorama, i.e., a synthetic image of the visible mountain skyline generated by a projection from DEM (Digital Elevation Model) data and from the camera shooting position.

A snow mask is defined as the output of a pixel-level binary classifier that, given as inputs an image and a mask M representing the mountain area, produces a mask S that assigns to each pixel of the mountain area a binary label denoting the presence of snow. Snow masks are computed using a Random Forest supervised-learning classifier with spatio-temporal median smoothing of the output. To perform the supervised learning, a dataset of images with pixel-level annotations indicating whether each pixel corresponds to a snow area is needed.

The pipeline produces a pixel-wise snow cover estimation from images, along with the GPS position, camera orientation, and mountain peak alignment. Thanks to the image geo-registration and orthorectification (using the associated topography data), it is possible to estimate the geographical properties of every pixel, such as its corresponding terrain area and altitude. Consequently, it is possible to compute the snow line altitude (the altitude above which snow and ice cover the ground), expressed in meters.

The virtual snow index for an image is defined as \(\sum_{(x,y)\,:\,S(x,y)=1} vsi(x,y)\), where \(vsi\) is a virtual snow index function that transforms a pixel position into a snow relevance coefficient and the sum ranges over the pixels belonging to the snow mask \(S\) obtained in the previous step. In the simplest case, \(vsi(x,y)=1\) for every pixel and the index reduces to the number of snow pixels.

2.5 Conclusion

In this chapter we have provided an overview of the SODALITE target users, of the workflows SODALITE supports for them and of the SODALITE architecture. We have also briefly described a case study, Snow, that will be used in the following chapters to exemplify specific aspects of our approach.

The individual components of the SODALITE toolset are presented in the following chapters. They are all available as open source software and as containerized Docker images.