
1 Introduction

For the past decades, the problem of combining type theory and meta-programming has been in need of a solution (cf. [7, 15, 18, 36, 47, 50, 57]). Given the solid and elegant foundations for describing proofs as programs provided by type theories, also supporting meta-programming allows us to think of proof generation as code generation. This opens up the possibility of supporting proof macros, domain-specific proof generators, proof transformations, and reasoning about meta-programs within the same language.

While support for meta-programming in existing proof assistants is common (e.g. [15, 18, 57, 61]), it is typically achieved via unverified mechanisms like reflection, requiring significant engineering effort. Moreover, the interplay between these mechanisms and the core type theory is not well understood, often breaks critical type-theoretic properties like confluence, and lacks theoretical guarantees like normalization. As a consequence, it is often not clear how we can reason about meta-programs themselves. Even guaranteeing that the generated code is well-typed and well-scoped is non-trivial. This leads to a gap between implementations of meta-programming in proof assistants and their theoretical foundations.

Fig. 1.
figure 1

Layered style as a middle ground

Theoretical foundations that combine meta-programming with type theory typically fall into two categories: the homogeneous style and the heterogeneous style. Homogeneous meta-programming uses a single language capable of meta-programming itself (depicted in Fig. 1a). To provide a logical, type-safe foundation in this style, Davies and Pfenning [17] give a modal \(\lambda \)-calculus with the \(\square \) modality. They use the modal type \(\square T\) to represent the code of type T. Having modal types allows us to distinguish, at the type level, meta-programs that manipulate code from regular programs in one unified language. Nanevski et al. [39] subsequently extend the modal \(\lambda \)-calculus [17] with contextual types, allowing meta-programming on open code. Nevertheless, the correspondence described by both systems only supports basic primitives like execution and composition of code, but does not suggest a way to support any form of intensional analysis. In fact, supporting intensional analysis in the homogeneous style while retaining properties like confluence and normalization has been fraught with difficulties (cf. [48]). Most recently, Kavvos [33] notes that we can only soundly extend the modal \(\lambda \)-calculus with intensional analysis for closed code if we want to retain confluence. A significant step towards supporting pattern matching on open code in a homogeneous style is taken in Moebius [30]. Moebius is based on System F-style polymorphism. However, its pattern matching does not guarantee coverage. Therefore Moebius does not provide normalization.

In a heterogeneous system, we distinguish between the meta-language and the object language (illustrated by Fig. 1c). Recently, Kovács [36] adapts 2-level type theory (2LTT), originally conceived for homotopy type theory, to dependently typed meta-programming. Here, a dependently typed meta-language sits on top of a less expressive object language. However, this type theory does not support intensional analysis. In contrast, Cocon [47], another 2-level type theory following in the footsteps of previous work [17, 39], supports modeling open code and intensional code analysis. Though these heterogeneous systems are modular, this comes at a price: a definition in one level is not directly accessible or reusable in the other level. Unlike homogeneous systems, neither heterogeneous system supports execution of code. Moreover, the separation into two languages leads to two separate investigations of meta-theoretic properties for the two languages and ultimately two separate normalization arguments. It is also not obvious how to elegantly scale these languages to multiple layers, and doing so is at best very tedious.

In this paper, we propose a novel layered style as a schema to combine meta-programming and type theory (see Fig. 1b) and to combine the advantages of the homogeneous and heterogeneous styles. Specifically, our layered modal type theory achieves three features: a run primitive, which extracts a term of type A given code of type A for all A; a normalizing type theory; and pattern matching on code, which is the most general form of intensional analysis. As a demonstration, we develop a layered modal simple type theory achieving these features. In this type theory, there is a fixed number of layers of languages. The type theory is uniform in the sense that all layers fundamentally share a common syntax for their languages and the same typing judgment, as in the homogeneous style. Therefore, our layered system has a natural run primitive like all homogeneous systems. Furthermore, our layered system follows the matryoshka principle: the language at layer i is contained in its meta-language at layer \(i+1\). What layer \(i+1\) adds to layer i is the ability to inspect and analyze code from the language at layer i. This matryoshka structure of layers of languages not only ensures uniformity in the syntax and the typing judgment of the type theory, but also provides extra flexibility in distinguishing computational behaviors at different layers. As a principle, we only allow \(\beta \) and \(\eta \) equivalence at the highest layer, so all lower layers are treated as static code, identified only by its syntax. Layering allows us to encode different computational behaviors at different layers using the same set of equivalence rules. This is crucial to enable sound intensional analysis and establish normalization.

To introduce layering succinctly, we focus on a 2-layered modal simple type theory in this paper. In this 2-layered system, its core language at layer 0 is a simply typed \(\lambda \)-calculus (STLC). At layer 1, STLC is then extended with one layer of meta-programming with the \(\square \) modality. The meta-language at layer 1 can only manipulate and analyze code from layer 0, but not from its own layer. Following our previous discussion, we only allow computation on layer 1, and terms at layer 0 are treated as pure syntax. This allows us to cleanly define covering pattern matching on code and eventually leads to an elegant normalization proof using a presheaf model.

Summary of Contributions:

  1.

    We develop a 2-layered modal type theory (Sec. 3) which supports running code (feature ). To prove normalization, we extend the classic presheaf model for STLC [5] to our type theory (Sec. 4). From this presheaf model, we extract a normalization algorithm that is complete and sound.

  2.

    We extend the previous 2-layered modal type theory with pattern matching on code (Sec. 5). We adapt our previous presheaf model to support pattern matching on code and prove that the extracted algorithm is both complete and sound. Thus we achieve features and .

  3.

    We outline three different dimensions to extend layered modal type theory in Sec. 6. In particular, we discuss extensions to richer systems like System F and Martin-Löf type theory. We also discuss how to extend the expressive power of the computational layer with additional operations, and how to scale our 2-layered system to n layers.

We believe that layering is versatile enough to be adapted to complex systems like System F and Martin-Löf type theory. As such, it provides a systematic way of supporting intensional analysis while retaining normalization. It is a significant step towards closing the gap between implementations that support meta-programming in practice and their theoretical foundations. Interested readers can find more details in our technical report [27] and our Agda code [28].

2 Example Programs in 2-layered Modal Type Theory

In this section, we show how to write and improve the well-known power function in layered modal type theory by gradually introducing more features. In general, many common meta-programs, including the power function, use only two layers.

2.1 A Layered Power Function

The power function defined by [17, Sec. 3.4] is a classic meta-program, and we can define it in our 2-layered type theory with the help of contextual types:

figure l

In the examples in this section, we use a front-end syntax similar to Haskell and Agda. For clarity, we abbreviate as numbers, e.g. is notation for . The return type of this meta-function is a contextual type . This type denotes code of type with an open variable of type . In general, the number of open variables is arbitrary. In the body, we recurse on the input number. If it is , then the generated code is just . The open variable is not used. In the case, we first perform the recursive call . The eliminator binds a new global variable to an open type . We say that has type with an open variable of type . A global variable is a placeholder for code. It remains visible under a constructor. Regular variables like , on the other hand, cannot directly participate in code construction, so they are hidden inside . When we refer to in , we must instantiate the open variable of . In this case, an identity substitution suffices. Now stands for the ’th power of and we obtain our goal by multiplying it with an extra . Our implementation of the function is almost as expected except for the dangling :

figure at

We would like to remove the ’s because it is the unit element of multiplication. We will make this improvement in the next subsection. Nevertheless, we can already run the current code, which is critical for a meta-programming system:

figure av

generates a regular function computing squares. We can also directly run the code with a specific argument:

figure aw

would substitute for and give , the square of .
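Since the code listings above appear only as figures, the following is a small, self-contained Haskell sketch of what the power generator computes, using a hypothetical deep embedding of layer-0 code; the names Tm, Var, Mul, power and run are ours and do not reflect the paper's front-end syntax or its contextual types.

```haskell
-- A hypothetical deep embedding of layer-0 code with one open variable x,
-- represented by Var. This is an illustration only; the paper's front-end
-- syntax and contextual types are richer than this sketch.
data Tm
  = Var            -- the single open variable x
  | Lit Int        -- numeric literal (sugar for repeated succ)
  | Mul Tm Tm      -- multiplication
  deriving Show

-- power n builds code for x^n with a dangling "* 1", mirroring the example:
-- the base case produces 1, and each step multiplies by an extra x.
power :: Int -> Tm
power 0 = Lit 1
power n = Mul (power (n - 1)) Var

-- Running the generated code amounts to evaluating it with a value for x.
run :: Int -> Tm -> Int
run x Var       = x
run _ (Lit k)   = k
run x (Mul a b) = run x a * run x b

main :: IO ()
main = do
  print (power 2)          -- Mul (Mul (Lit 1) Var) Var, i.e. ((1 * x) * x)
  print (run 3 (power 2))  -- 9, the square of 3
```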

2.2 Pattern Matching for Intensional Analysis

An easy way to improve the previous implementation is to pattern match on the resulting code and remove all occurrences of . However, supporting pattern matching on code in a type-theoretic setting has been notoriously difficult. Previous attempts in the homogeneous style fail to retain the normalization property. To illustrate, consider the intensional function [19, 33]. This function simply looks at the structure of a piece of code and returns if this code is a function application, or otherwise. Note that ’s behavior purely depends on the syntactic structure of its argument. In our 2-layered system, this function can be implemented by pattern matching on code:

figure bg

We use pattern matching to inspect the input code . In our first branch, we return if is some function application. Here, and are both pattern variables. We use question marks to distinguish pattern variables and constants, e.g. and which are the constructors of . This distinction is only necessary in the patterns, and we do not write a question mark when we refer to a pattern variable in the body of the branch. We also omit writing the local context in which the pattern is sensible because it is determined by the type of . The pattern variables and capture the code of the function and the argument respectively if is a function application. As they are not used, we could also have written instead. The other branches are captured by the wildcard and all return . Let us see how this function behaves:

figure bv

Kavvos [33] points out that Gabbay and Nanevski’s [19] evaluation of is not confluent. It is possible to evaluate the same program in different orders and obtain two different values. For some well-typed code and ,

figure bz
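Rendering the two evaluation orders in our notation (a sketch only: we write the eliminator as \(\texttt {letbox}\ s\ (u.\,t)\) and the analysis function as \(\textsf {is-app}\); the exact terms in the figure may differ):

$$\begin{aligned} &\texttt {letbox}\ (\texttt {box}\ (t~s))\ (u.\ \textsf {is-app}\ (\texttt {box}\ u)) \approx \textsf {is-app}\ (\texttt {box}\ (t~s)) \approx \textsf {true} \\ &\texttt {letbox}\ (\texttt {box}\ (t~s))\ (u.\ \textsf {is-app}\ (\texttt {box}\ u)) \approx \texttt {letbox}\ (\texttt {box}\ (t~s))\ (u.\ \textsf {false}) \approx \textsf {false} \end{aligned}$$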

In the second execution, is evaluated first, and then the overall result is . In our system, this confluence issue is avoided by preventing the execution of until it is known what stands for. This treatment ensures that is stable under substitutions. Hence, the program only evaluates to . This is a subtle but critical design decision which ultimately enables sound intensional analysis and normalization. We explain more in Sec. 5.2.

With sound pattern matching on code, we can implement a simple arithmetic simplifier that removes the redundant ’s from the previous subsection:

figure ch

In the first case, we remove from the multiplication. In the second case, we recursively simplify the first factor. We know this is sufficient because only occurs in the leftmost factor. In the last case, we do not optimize. Since pattern matching is covering, we must either specify all cases or give a wildcard case. Finally, we provide a wrapper function , which invokes to simplify the code generated by :

figure cn

The function precisely does what we expect:

figure cp

This example shows that we have full control over code via pattern matching on code, while running still gives the same behaviors as .
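As a companion to the listings above, here is a hypothetical Haskell sketch of the two intensional operations on a deep embedding of layer-0 code; the encoding and the names Tm, isApp and simp are ours, not the paper's surface syntax.

```haskell
-- A hypothetical deep embedding of layer-0 code.
data Tm
  = Var
  | Lit Int
  | Mul Tm Tm
  | App Tm Tm
  deriving Show

-- Purely syntactic analysis: is this code a function application?
isApp :: Tm -> Bool
isApp (App _ _) = True
isApp _         = False

-- Arithmetic simplifier from the text: drop the unit 1, recurse on the
-- left factor (where the dangling 1 lives), leave everything else alone.
simp :: Tm -> Tm
simp (Mul (Lit 1) u) = u
simp (Mul u v)       = Mul (simp u) v
simp t               = t

main :: IO ()
main = do
  let pow2 = Mul (Mul (Lit 1) Var) Var     -- code generated for x^2
  print (simp pow2)                        -- Mul Var Var, i.e. x * x
  print (isApp (App Var Var), isApp pow2)  -- (True, False)
```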

3 A 2-Layered Modal Type Theory

In this section, we introduce a 2-layered modal type theory, which is simple yet powerful enough for many interesting programs like the unoptimized power function in the previous example. This system provides a starting point and a basis for a clear understanding of the impact of layering on syntax and semantics. We build a semantic framework for 2-layered modal type theory which is further extended with pattern matching on code in Sec. 5.

2-layered modal type theory is defined as follows:

figure ct

We assume de Bruijn indices as our name representation for convenience, but our development generalizes. We use natural numbers \(\texttt {Nat}\) as a base type. We can construct \(\textsf {zero}\) and take the successor \(\textsf {succ}\) of another \(\texttt {Nat}\). \(\textsf {rec}_{T}\ s\ (x~y.s')\ t\) is the recursor for \(\texttt {Nat}\), where t is the scrutinee, s is the base case and \(s'\) is the step case, in which x is the predecessor and y is the result of the recursive call. As the recursor for natural numbers is standard, we defer its discussion to the technical report [27]. A function is introduced by \(\lambda \)-abstraction and can be applied to an argument. \({\square (\varGamma \vdash T)}\) is a contextual type. It stands for code open in context \(\varGamma \). The \(\texttt {box} \) constructor introduces terms of type \(\square (\varGamma \vdash T)\) and \(\texttt {letbox}\) is the eliminator for it. We defer our discussion on pattern matching on code to Sec. 5.

For layered systems, we keep track of as many contexts as there are layers. These contexts are contained in a fixed-size context array in the judgments. With two layers, a context array only has two contexts \(\varPhi ; \varGamma \). It hence defines a dual-context type theory. Following Pfenning and Davies [17, 42], \(\varGamma \) is referred to as a local context and its variables are local variables, ranged over by x and y. \(\varPhi \) is a global context and contains global variables, ranged over by u. For a global binding \(u : (\varGamma \vdash T)\), we say that u represents code of type T open in the context \(\varGamma \).

When writing meta-programs, we conceptually distinguish between programs that are dynamic and compute, and code that is static and syntactic. In a homogeneous system, this distinction is captured by types, i.e. program t has type T while code has type \(\square (\varGamma \vdash T)\). However, a term t itself does not provide information about whether it is inside of a \(\texttt {box} \) (hence treated as code), or outside of a \(\texttt {box} \) (hence a program). For example, only knowing that \(\textsf {succ }\textsf {zero}\) has type \(\texttt {Nat}\) does not reveal whether it is a piece of code or a program. The typing judgment for homogeneous systems like \(\varPsi ; \varGamma \vdash t : T\) [17, 42] only provides typing information, and does not a priori determine whether t should be considered as code or as a program. Even though one major advantage of a homogeneous system is to use the same language for code and programs, this lack of information is the critical reason for the challenges that we face when combining type theory and intensional analysis.

Layered modal type theory makes the distinction between code and programs explicit. In the typing judgment \(\varPsi ; \varGamma \vdash _{i} t : T\), we use the subscript \(i \in [0, 1]\) to identify the layer at which t is well-typed. This judgment states that the term t has type T at layer i. When \(i = 0\), t is code and does not compute, and when \(i = 1\), t is a program and therefore has rich reduction behaviors. There are three important implications of layering:

  1.

    we can control what types are valid at each layer,

  2.

    we can control what terms are well-typed at each layer, and

  3.

    we can control what terms are equivalent at each layer.

Fig. 2.
figure 2

Typing and equivalence judgments

First, we control the validity of types using a validity predicate. In the rules below, we rule out the use of \(\square \) at layer 0 and limit layer 1 to at most one layer of \(\square \):

figure cu

This validity predicate only limits the depth of nested \(\square \)s. Therefore, holds although it has two \(\square \)s. does not hold, because it has two nested layers of \(\square \). Moreover, the validity judgment only provides an upper bound, so both \(\texttt {Nat}\ \texttt {wf}^{0}\) and \(\texttt {Nat}\ \texttt {wf}^{1}\) hold. This predicate generalizes to \(\varPsi \ \texttt {wf}^{i}\) and \(\varGamma \ \texttt {wf}^{i}\) by requiring all types in \(\varPsi \) and \(\varGamma \) to comply with the predicate. The validity predicates satisfy the lifting property:

Lemma 1 (Type lifting)

If \(T\ \texttt {wf}^{0}\), then \(T\ \texttt {wf}^{1}\); if \(\varGamma \ \texttt {wf}^{0}\), then \(\varGamma \ \texttt {wf}^{1}\).

The lifting property characterizes the matryoshka principle for types and the diagram in Fig. 1b, and says that types and contexts at a lower layer are included at a higher layer.
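A small sketch of the validity predicate, written as a Haskell function over a hypothetical type Ty of our own (the paper defines the predicate by inference rules):

```haskell
-- The layer index only bounds the nesting depth of Box, and anything
-- valid at layer 0 is also valid at layer 1 (type lifting).
data Ty
  = Nat
  | Fn Ty Ty
  | Box [Ty] Ty    -- contextual type: code of the result type, open in a local context
  deriving Show

wf :: Int -> Ty -> Bool
wf _ Nat         = True
wf i (Fn s t)    = wf i s && wf i t
wf i (Box ctx t) = i >= 1 && all (wf 0) ctx && wf 0 t   -- Box only at layer 1; its body stays at layer 0

main :: IO ()
main = do
  print (wf 1 (Fn (Box [Nat] Nat) (Box [Nat] Nat)))  -- True: two boxes side by side
  print (wf 1 (Box [] (Box [] Nat)))                 -- False: two nested boxes
```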

The fact that the validity predicate only allows \(\square \) at layer 1 suggests that its constructor \(\texttt {box} \) and eliminator \(\texttt {letbox}\) should also only appear at layer 1, while terms of type \(\texttt {Nat}\) and functions should appear at both layers. Having a layer in the typing judgment allows us to cleanly restrict valid terms at each layer:

figure cx

\(\texttt {box}\ t\) is well-typed at layer 1, only if the code t is well-typed at layer 0. Now a clear line is drawn between code and programs: code lives at layer 0 while programs live at layer 1. The rule for \(\texttt {letbox}\) is only available at layer 1. The body is type-checked in an extended global context with a new global variable. This global variable is a placeholder for the code computed by s.

The rules are given in Fig. 2. The rules for terms coming from STLC, i.e. \(\textsf {zero}\), \(\textsf {succ}\), \(\lambda \) and function applications, are standard and valid at both layers. Given a term of type \(\texttt {Nat}\) or a function, we know whether it is code or a program by checking the layer it lives at. Extra validity predicates are added to the premises of the local variable rule and the \(\textsf {zero}\) rule to enforce the coherence between terms and types at layer i. Notice that terms from STLC can extend the local context via \(\lambda \) regardless of layers and they can only refer to but not introduce global variables. When referring to a global variable u, a local substitution \(\delta \) is needed to replace all variables in the local context \(\varDelta \), as specified by the superscript. The coherence between terms and types requires terms at layer i to have types at the same layer. This criterion is formulated by the following lemma:

Lemma 2 (Syntactic validity)

If \(\varPsi ; \varGamma \vdash _{i} t : T\), then \(\varPsi \ \texttt {wf}^{0}\), \(\varGamma \ \texttt {wf}^{i}\) and \(T\ \texttt {wf}^{i}\) for \(i \in [0, 1]\).

\(\varPsi \) is always valid at layer 0 because it is a context for code from layer 0.
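To make the role of the layer index concrete, the following is a heavily simplified Haskell sketch of a checker for the layered judgment; the data types, annotations and the function check are ours and are not the paper's rules or algorithm.

```haskell
-- Box and LetBox only type-check at layer 1, and the body of a Box is
-- checked at layer 0 in its own local context.
data Ty = TNat | Fn Ty Ty | Box [Ty] Ty deriving (Eq, Show)

data Tm
  = LVar Int                -- local variable (de Bruijn index into Gamma)
  | GVar Int [Tm]           -- global variable u with an explicit local substitution
  | Zero
  | Succ Tm
  | Lam Ty Tm
  | App Tm Tm
  | BoxI [Ty] Tm            -- box t, annotated with the local context of the code
  | LetBox Tm ([Ty], Ty) Tm -- letbox, annotated with the contextual type it binds

type GCtx = [([Ty], Ty)]    -- Psi: global bindings u : (Delta |- T)
type LCtx = [Ty]            -- Gamma

check :: Int -> GCtx -> LCtx -> Tm -> Maybe Ty
check _ _   gam (LVar x)       = ix gam x
check i psi gam (GVar u del)   = do
  (dlt, t) <- ix psi u
  ts <- mapM (check i psi gam) del
  if ts == dlt then Just t else Nothing
check _ _   _   Zero           = Just TNat
check i psi gam (Succ t)       = do TNat <- check i psi gam t; Just TNat
check i psi gam (Lam s t)      = Fn s <$> check i psi (s : gam) t
check i psi gam (App t s)      = do
  Fn a b <- check i psi gam t
  a'     <- check i psi gam s
  if a == a' then Just b else Nothing
check 1 psi _   (BoxI dlt t)   = Box dlt <$> check 0 psi dlt t   -- code is checked at layer 0
check 1 psi gam (LetBox s b t) = do
  Box dlt a <- check 1 psi gam s
  if (dlt, a) == b then check 1 (b : psi) gam t else Nothing
check _ _ _ _                  = Nothing   -- no Box or LetBox at layer 0

ix :: [a] -> Int -> Maybe a
ix xs n = if n >= 0 && n < length xs then Just (xs !! n) else Nothing

main :: IO ()
main = do
  print (check 1 [] [] (BoxI [TNat] (Succ (LVar 0))))  -- Just (Box [TNat] TNat)
  print (check 0 [] [] (BoxI [TNat] (Succ (LVar 0))))  -- Nothing: no box at layer 0
```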

The layer i in the typing judgment \(\varPsi ; \varGamma \vdash _{i} t : T\) effectively leads to the encapsulation of two languages in the same system. When \(i = 0\), only terms in STLC are well-typed, so we work in STLC, and hence \(\varPsi \ \texttt {wf}^{0}\) and \(\varGamma \ \texttt {wf}^{0}\) hold. The typing rules ensure that we cannot write any meta-program at this layer and no \(\square \) is involved. When \(i = 1\), one layer of \(\square \) is allowed in addition to STLC. At this layer, we can not only write regular STLC programs, but also write meta-programs that generate STLC programs through \(\square \). Thus, we work in a meta-language that extends STLC. In this case, \(\varPsi \ \texttt {wf}^{0}\) and \(\varGamma \ \texttt {wf}^{1}\) hold. Using layers, we fit both code (layer 0) and programs (layer 1) into a unified set of typing rules and arrive at a middle ground between the homogeneous and heterogeneous styles. Code at layer 0 can be lifted to layer 1 and turned into a program. The resulting program is well-typed, due to the following lemma:

Lemma 3 (Term lifting)

If \(\varPsi ; \varGamma \vdash _{0} t : T\), then \(\varPsi ; \varGamma \vdash _{1} t : T\).

The lifting property of well-typed terms has two implications. First, the language at layer 0 is contained at layer 1. This is the critical intuition of the matryoshka principle and the idea of layering. Though a term at layer 0 is code and static, its computational behaviors are recovered by lifting it to layer 1. Second, this property is what guarantees a universal run primitive for all code, which is crucial for a meta-programming system and achieves the corresponding feature from Sec. 1. The term lifting behavior can be triggered by the \(\beta \) rule for \(\square \). For some well-typed terms t and s at layer 0 and a local substitution \(\delta \) that does not refer to u:

figure db

Due to the \(\beta \) rule, u is replaced by \((\lambda x. t)~s\). The layer-0 term \((\lambda x. t)~s\) is then lifted to layer 1 on the right hand side and computes. Thus its computational behavior is revived and it is further reduced to t[s/x].
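Since the example itself is shown in the figure, the following is only our reading of it, assuming the standard \(\beta \) rule for \(\square \) and writing the eliminator as \(\texttt {letbox}\ s\ (u.\,t)\):

$$\begin{aligned} \texttt {letbox}\ (\texttt {box}\ ((\lambda x.\,t)~s))\ (u.\ u^{\delta }) \approx ((\lambda x.\,t)~s)[\delta ] \approx (t[s/x])[\delta ] \end{aligned}$$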

Finally, due to layering in the typing rules, the equivalence rules are also layered. There are three groups of equivalence rules: the PER rules, which include symmetry and transitivity; the congruence rules, which are naturally derived from the typing rules; and the computation rules, which describe \(\beta \) and \(\eta \) equivalence. The PER and congruence rules apply to all layers, but the computation rules only apply to layer 1. We show all the \(\beta \) and \(\eta \) rules at the bottom of Fig. 2. The PER and congruence rules are standard. [s/x] and [s/u] are local and global substitutions, respectively. They substitute s for x and for u everywhere, as expected. The lack of computation at layer 0 ensures that terms at layer 0 are identified only by their syntactic structures and indeed behave as code:

Lemma 4 (Static code)

If \({\varPsi ; \varGamma \vdash _{0} t \approx s : T}\), then \(t = s\).

This lemma justifies our treatment of terms at layer 0 as code and prepares for the addition of pattern matching on code in Sec. 5.

Finally, we specify global substitutions between global contexts. Global substitutions are defined in the usual way as lists of terms:

figure dc

Due to layering, all terms in a global substitution must live at layer 0:

figure dd

Given a global substitution \(\sigma \), we can apply it to a term:

figure de

where \(\delta [\sigma ]\) applies \(\sigma \) to all terms in \(\delta \). Global substitutions do not handle local variables, so in the case of local variables we just return x, while in the case of global variables we look up \(\sigma \) and apply the globally substituted local substitution \(\delta [\sigma ]\) to the result of the lookup. In most other cases, \(\sigma \) simply propagates recursively. In the case of \(\texttt {letbox}\), we extend the substitution and apply \(\sigma , u/u\) to the body t. Global substitutions compose and have identities. We write \(\varPsi \vdash {\textsf {id}_\varPsi } : {\varPsi }\), and often omit the subscript whenever it can be inferred.
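The following Haskell sketch illustrates this traversal on a hypothetical named-variable encoding (the paper uses de Bruijn indices); all names are ours and capture avoidance is ignored, so it illustrates the shape of the definition rather than the definition itself.

```haskell
import qualified Data.Map as M

data Tm
  = LVar String
  | GVar String (M.Map String Tm)   -- u^delta: a global variable with a local substitution
  | Lam String Tm
  | App Tm Tm
  | BoxI Tm
  | LetBox String Tm Tm             -- letbox u = s in t
  deriving Show

-- Applying a local substitution: it does not go under box, whose body
-- lives in its own local context.
lsub :: M.Map String Tm -> Tm -> Tm
lsub d (LVar x)       = M.findWithDefault (LVar x) x d
lsub d (GVar u del)   = GVar u (M.map (lsub d) del)
lsub d (Lam x t)      = Lam x (lsub (M.delete x d) t)
lsub d (App t s)      = App (lsub d t) (lsub d s)
lsub _ (BoxI t)       = BoxI t
lsub d (LetBox u s t) = LetBox u (lsub d s) (lsub d t)

-- Applying a global substitution sigma: local variables are untouched; a
-- global variable looks up sigma and applies delta[sigma] to the result;
-- sigma goes under box; letbox leaves its own bound u alone in the body.
gsub :: M.Map String Tm -> Tm -> Tm
gsub _  t@(LVar _)     = t
gsub sg (GVar u del)   =
  let del' = M.map (gsub sg) del
  in maybe (GVar u del') (lsub del') (M.lookup u sg)
gsub sg (Lam x t)      = Lam x (gsub sg t)
gsub sg (App t s)      = App (gsub sg t) (gsub sg s)
gsub sg (BoxI t)       = BoxI (gsub sg t)
gsub sg (LetBox u s t) = LetBox u (gsub sg s) (gsub (M.delete u sg) t)

main :: IO ()
main = do
  let sg = M.fromList [("u", App (LVar "x") (LVar "x"))]
  print (gsub sg (BoxI (GVar "u" (M.fromList [("x", LVar "y")]))))
  -- BoxI (App (LVar "y") (LVar "y"))
```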

4 Presheaf Model and Normalization by Evaluation

We now establish normalization by evaluation (NbE) [1, 11, 37] of the 2-layered modal type theory. NbE is a technique to establish the normalization property. An NbE proof usually proceeds in two steps: first, we evaluate terms of a type theory into some chosen domain; second, normal forms are extracted from values in this domain. Our chosen domain is a presheaf category. A presheaf category is a functor category from some base category to the category of sets. A carefully chosen base category leads to an intuitive normalization proof. In this section, we use the category of weakenings as the base category. The presheaf model shown here is a moderate extension of the classic presheaf model of STLC [5].
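To illustrate the two-step structure of NbE on familiar ground, here is a minimal Haskell sketch for plain STLC terms only, with no modality; the encoding is ours and is unrelated to the presheaf construction developed below.

```haskell
-- Minimal normalization by evaluation for untyped lambda terms with de
-- Bruijn indices: evaluate into a semantic domain, then read back
-- ("reify") normal forms.
data Tm = Var Int | Lam Tm | App Tm Tm deriving Show

data Val = VLam (Val -> Val) | VNe Ne
data Ne  = NVar Int | NApp Ne Val

eval :: [Val] -> Tm -> Val
eval env (Var i)   = env !! i
eval env (Lam t)   = VLam (\v -> eval (v : env) t)
eval env (App t s) = case eval env t of
  VLam f -> f (eval env s)
  VNe n  -> VNe (NApp n (eval env s))

-- Read back a value with k free variables (de Bruijn levels).
quote :: Int -> Val -> Tm
quote k (VLam f) = Lam (quote (k + 1) (f (VNe (NVar k))))
quote k (VNe n)  = quoteNe k n

quoteNe :: Int -> Ne -> Tm
quoteNe k (NVar l)   = Var (k - l - 1)   -- convert level back to index
quoteNe k (NApp n v) = App (quoteNe k n) (quote k v)

nbe :: Tm -> Tm
nbe t = quote 0 (eval [] t)

main :: IO ()
main = print (nbe (App (Lam (Var 0)) (Lam (Lam (Var 1)))))
-- Lam (Lam (Var 1)), the normal form of (\x. x) (\x y. x)
```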

Fig. 3.
figure 3

Global and local weakenings

4.1 Category of Weakenings

In the category of weakenings, the objects are dual contexts and the morphisms are weakenings between dual contexts. Weakenings between dual contexts are just tuples of global and local weakenings, each defined in the same way as weakenings in STLC:

figure df

Their typing rules are virtually identical, differing only in the validity predicates (Fig. 3). The q constructor extends a weakening with the same type, while p actually weakens the context. \(\varPsi \Longrightarrow _g \varPhi \) denotes global weakenings and \(\varGamma \Longrightarrow _l \varDelta \) denotes local weakenings. Weakenings of dual contexts \(\gamma ;\tau : \varPsi ; \varGamma \Longrightarrow \varPhi ; \varDelta \) are then tuples of global and local weakenings. Both global and local weakenings have composition and identity. We write \(\textsf {id}_\varPsi \) and \(\textsf {id}_\varGamma \) for the identity global and local weakenings, respectively. We often omit the subscript when it can be inferred from the context. Identity and composition of weakenings \(\varPsi ; \varGamma \Longrightarrow \varPhi ; \varDelta \) are defined pairwise. We verify that dual contexts and weakenings form a category, which is referred to as . This is the base category that we will be working with. We sometimes also need to work with , the category of global contexts and global weakenings.
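As a small illustration, weakenings and their composition can be sketched in Haskell as follows; the encoding is untyped and ours, with the convention that a weakening embeds a smaller context into a larger one (the paper's typed rules are in Fig. 3).

```haskell
-- Id is the identity, Q keeps the topmost binding (the paper's q), and
-- P drops one binding from the larger context (the paper's p).
data Wk = Id | Q Wk | P Wk deriving Show

-- Composition: comp w1 w2 first applies w2, then w1.
comp :: Wk -> Wk -> Wk
comp Id     w      = w
comp w      Id     = w
comp (P w1) w2     = P (comp w1 w2)
comp (Q w1) (Q w2) = Q (comp w1 w2)
comp (Q w1) (P w2) = P (comp w1 w2)

main :: IO ()
main = print (comp (Q (P Id)) (Q Id))   -- Q (P Id)
```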

Fig. 4.
figure 4

Interpretations of types, and global and local contexts

4.2 Presheaf Model and Interpretations

In this section, we define the normalization algorithm with as our base category. The algorithm normalizes terms to their \(\beta \eta \)-normal forms, which are defined as follows:

figure dj

Notice that \(\texttt {box}\ t\) is already normal for any t. This is expected because \(\texttt {box}\ t\) regards t as static code so t cannot be reduced. These definitions induce the sets of well-typed normal and neutral forms:

figure dk

The sets only capture terms at layer 1 due to the lack of reductions at layer 0. \(\textsf {Nf}^T\) and \(\textsf {Ne}^T\) are then the induced presheaves mapping dual contexts to the sets of normal and neutral forms, respectively.

Next we give the interpretation of types. Function types are interpreted as presheaf exponentials, derived from the Yoneda lemma together with naturality:

figure dl

We define the interpretations of types, and global and local contexts in Fig. 4. Both \(\texttt {Nat}\) and \(\square (\varGamma \vdash T)\) are interpreted as their presheaves of normal forms; in particular, \(\llbracket \square (\varGamma \vdash T) \rrbracket := \textsf {Nf}^{\square (\varGamma \vdash T)}\). This case effectively interprets \(\square (\varGamma \vdash T)\) as the code of T open in \(\varGamma \). Based on the definition, the two possible kinds of terms in \(\textsf {Nf}^{\square (\varGamma \vdash T)}\) are either neutral or of the form \(\texttt {box}\ t\). In the latter case, we have gained access to the syntax of t, permitting more complex operations like pattern matching on code.

The interpretation of global contexts is layered. At layer 1, it is the presheaf of global substitutions, containing layer-0 code waiting to be evaluated. At layer 0, it is the Hom set of , i.e. the presheaf of global weakenings. This definition is technically motivated: it ensures the naturality of the evaluation of terms, defined shortly. We let \(\sigma \) range over \(\llbracket \varPhi \rrbracket ^i_{\varPsi }\) when i is unknown. Local contexts are interpreted as iterated products of values as usual, where \(*\) is the unique element of a chosen singleton set. A dual context is interpreted pairwise:

$$ \llbracket \varPhi ;\varDelta \rrbracket ^i_{\;\varPsi ;\varGamma } := \llbracket \varPhi \rrbracket ^i_\varPsi \times \llbracket \varDelta \rrbracket _{\;\varPsi ;\varGamma } $$

All interpretations above are functors:

Lemma 5

\(\llbracket T \rrbracket \), \(\llbracket \varPhi \rrbracket ^i\), \(\llbracket \varDelta \rrbracket \) and \(\llbracket \varPhi ;\varDelta \rrbracket ^i\) are presheaves. \(\llbracket \varPhi \rrbracket ^i\) is from .

If \(a \in \llbracket T \rrbracket _{\;\varPhi ; \varDelta }\) and \(\gamma ; \tau : \varPsi ; \varGamma \Longrightarrow \varPhi ; \varDelta \), we write \(a[\gamma ;\tau ]\) for the functorial action of \(\gamma ;\tau \) on a. We generalize this notation to other functors.

Fig. 5.
figure 5

Definitions of reification, reflection and evaluation

Finally, we define the evaluation functions, interpreting terms as natural transformations between presheaves. This interpretation relies on two other natural transformations, reification and reflection, which map \(\llbracket T \rrbracket \) to \(\textsf {Nf}^T\) and \(\textsf {Ne}^T\) to \(\llbracket T \rrbracket \), respectively. All four natural transformations are defined in Fig. 5. Since \(\texttt {Nat}\) and \(\square (\varGamma \vdash T)\) are interpreted as presheaves of normal forms, their cases in reification and reflection are just identities. The case for functions is defined in the same way as in STLC.

Our evaluation is a moderate extension of the evaluation of STLC [5]. The evaluation function is layered because the type theory itself is layered. The cases overlapping with STLC are identical, so we only discuss the modal cases. The \(\texttt {box}\ t\) case is only available at layer 1. In this case, we directly propagate \(\sigma \) under \(\texttt {box} \). In the \(\texttt {letbox}\) case, we first evaluate s. Given \(\llbracket s \rrbracket ^1_{\;\varPsi ;\varGamma } \in \llbracket \square (\varGamma \vdash T) \rrbracket ^1_{\;\varPsi ;\varGamma } = \textsf {Nf}^{\square (\varGamma \vdash T)}_{\;\varPsi ;\varGamma }\), this evaluation has two possible results: it returns either a \(\texttt {box}\ s'\), or a neutral v. In the first case, we just recurse with \(\sigma \) extended with \(s'\) for u. In the second case of \(\texttt {letbox}\), some neutral v blocks the evaluation, so we can only recurse on the body t with u as is and with \(\sigma \) and \(\rho \) properly weakened. To obtain a \(\llbracket T \rrbracket _{\;\varPsi ;\varGamma }\), we reify the evaluation of t and obtain a normal form, from which we build a neutral \(\texttt {letbox}\) term. A reflection of this neutral gives us a \(\llbracket T \rrbracket _{\;\varPsi ;\varGamma }\).

The interpretation of global variables is the most interesting. When \(u^\delta \) is referred to at layer 1, we are evaluating some code and turning it into a program, i.e. running it. We retrieve the code by looking up u in \(\sigma \), and continue the evaluation at layer 0 with an environment obtained by evaluating \(\delta \). Notice that the layer decreases, so the interpretation is well-founded regardless of the size of \(\sigma (u)\). The evaluation of a local substitution recursively evaluates all terms in the local substitution. If we refer to \(u^\delta \) at layer 0, then u should stay neutral. Moreover, the evaluation function is required to return a natural transformation. Both requirements lead to the interpretation of global contexts as the Hom set of at layer 0, since a weakened global variable is still a global variable and neutral. We first normalize \(\delta \) by evaluating and then reifying it (\(\downarrow ^{\varDelta '}_{\;\varPsi ; \varGamma }\)) and obtain a \(\llbracket T \rrbracket _{\;\varPsi ;\varGamma }\) by reflection. Lastly, reification, reflection and evaluation are all natural transformations:

Lemma 6 (Naturality)

If \(\gamma ;\tau : \varPsi ';\varGamma ' \Longrightarrow \varPsi ;\varGamma \),

  • if \(a \in \llbracket T \rrbracket _{\;\varPsi ;\varGamma }\), then \(\downarrow ^T_{\;\varPsi ;\varGamma }(a)[\gamma ;\tau ] = \downarrow ^T_{\;\varPsi ';\varGamma '}(a[\gamma ;\tau ])\);

  • if \(v \in \textsf {Ne}^T_{\;\varPsi ;\varGamma }\), then \(\uparrow ^T_{\;\varPsi ;\varGamma }(v)[\gamma ;\tau ] = \uparrow ^T_{\;\varPsi ';\varGamma '}(v[\gamma ;\tau ])\);

  • if \(\varPhi : \varDelta \vdash _{i} t : T\) and \(\sigma ;\rho \in \llbracket \varPhi ;\varDelta \rrbracket ^i_{\;\varPsi ;\varGamma }\), then \(\llbracket t \rrbracket ^i_{\;\varPsi ;\varGamma }(\sigma ;\rho )[\gamma ;\tau ] = \llbracket t \rrbracket ^i_{\;\varPsi ';\varGamma '}((\sigma ;\rho )[\gamma ;\tau ])\).

The NbE algorithm is given by composing the interpretations:

Definition 1

The normalization by evaluation algorithm, given \(\varPsi ; \varGamma \vdash _{1} t : T\), is

$$\begin{aligned} {} & {} \textsf {nbe}^T_{\;\varPsi ;\varGamma }(t) : \textsf {Nf}^T_{\;\varPsi ;\varGamma } \\ {} & {} \textsf {nbe}^T_{\;\varPsi ;\varGamma }(t) := \downarrow ^T_{\;\varPsi ;\varGamma } (\llbracket t \rrbracket ^1_{\;\varPsi ;\varGamma }(\uparrow ^{\varPsi ;\varGamma })) \end{aligned}$$

where \(\uparrow ^{\varPsi ;\varGamma } \in \llbracket \varPsi ;\varGamma \rrbracket ^1_{\;\varPsi ;\varGamma }\) is a tuple of the identity global substitution and the identity environment.

This algorithm is correct due to the following two theorems:

Theorem 1 (Completeness)

If \(\varPsi ; \varGamma \vdash _{1} t \approx t' : T \), then \(\textsf {nbe}^T_{\;\varPsi ;\varGamma }(t) = \textsf {nbe}^T_{\;\varPsi ;\varGamma }(t')\).

Theorem 2 (Soundness)

If \(\varPsi ; \varGamma \vdash _{1} t : T\), then \(\varPsi ; \varGamma \vdash _{1} t \approx {\textsf {nbe}^T_{\;\varPsi ;\varGamma }(t)} : T\).

The completeness theorem states that equivalent terms have equal normal forms, so we can compare normal forms syntactically to decide whether two terms are equivalent. The soundness theorem states that a well-typed term has a normal form and is equivalent to it. Notice that the theorems are about layer 1 because only terms at this layer compute. In the remainder of this section, we outline only the soundness proof. For complete details, please refer to our technical report [27].

4.3 Soundness

The soundness theorem is established via gluing models, which relate syntactic terms with semantic values. In our 2-layered system, we need two layers of gluing models, which reflect the fact that we are actually operating in two languages. For a gluing relation R, we write \(a \sim b \in R\) to denote \((a, b) \in R\).

Layer-0 Gluing Model We begin with the gluing model for natural numbers. It recursively relates a term t and a normal form of type \(\texttt {Nat}\). This gluing relation applies at both layers:

figure dp

At layer 0, for all \(T\ \texttt {wf}^{0}\), its gluing model is:

$$\begin{aligned} & \qquad \quad \llparenthesis T \rrparenthesis ^0_{\;\varPsi ;\varGamma } \subseteq \textsf {Exp}\times \llbracket T \rrbracket _{\;\varPsi ;\varGamma } \\ & \qquad \llparenthesis \texttt {Nat} \rrparenthesis ^0_{\;\varPsi ;\varGamma } := \texttt {Nat}_{\;\varPsi ;\varGamma } \\ & \llparenthesis S \longrightarrow T \rrparenthesis ^0_{\;\varPsi ;\varGamma } := \{(t, a) \;|\;\forall \ \gamma ; \tau : \varPhi ; \varDelta \Longrightarrow \varPsi ;\varGamma , s \sim b \in \llparenthesis S \rrparenthesis ^0_{\;\varPhi ;\varDelta } \,.\\ & \qquad \qquad \qquad \qquad \qquad \qquad \qquad \,\,\,\, t[\gamma ; \tau ]\ s \sim a(\gamma ; \tau , b) \in \llparenthesis T \rrparenthesis ^0_{\;\varPhi ;\varDelta } \} \end{aligned}$$

The layer-0 model does not have a case for \(\square \) due to \(T\ \texttt {wf}^{0}\). The function case requires that the results of function applications remain related for all weakenings and all related arguments. The gluing between local substitutions and evaluation environments relates terms and values pairwise.

Definition 2

We define the semantic judgment at layer 0:

figure dt

The semantic judgment at layer 0 only universally quantifies over global weakenings.

Layer-1 Gluing Model The reason why we must define the layer-0 gluing model first is that we refer to \(\varPsi ; \varGamma \vDash _{0} t : T\) in our layer-1 model. The semantics of \(\square (\varDelta \vdash T)\) is given in terms of the semantic judgment at layer 0:

figure du

In the first rule, t is related to \(\texttt {box}\ s\) and s is a semantically well-typed term at layer 0. The premise \({\varPsi ; \varDelta \vDash _{0} s : T}\) is necessary when we prove the semantic typing rule for \(\texttt {letbox}\). Without it, we will not be able to maintain the semantic well-formedness of global substitutions during evaluation and in the semantic judgment at layer 1. The details are in our technical report. The gluing model at layer 1 for \(T\ \texttt {wf}^{1}\) is now defined as:

figure dv

Compared to the layer-0 model, the layer-1 model only has an extra case for \(\square \). The other two cases are just the same:

Lemma 7

If \(T\ \texttt {wf}^{0}\), then .

This lemma semantically describes the matryoshka principle, witnessing the subsumption of layer 0 into layer 1. The semantic judgment at layer 1 is universally quantified over a semantic global substitution defined below:

figure dy

We define the semantic judgment at layer 1:

figure dz

The fundamental theorem is established by proving all semantic typing rules:

Theorem 3 (Fundamental)

If \(\varPsi ; \varGamma \vdash _{i} t : T\), then \({\varPsi ; \varGamma \vDash _{i} t : T}\).

If \(\varPsi ; \varGamma \vdash _{i} \delta : \varDelta \), then \(\varPsi ; \varGamma \vDash _{i} \delta : \varDelta \).

5 Supporting Pattern Matching on Code

In the previous section, we have achieved feature and partly feature . In this section, we extend the previous system with pattern matching on code, completing all three features. We adapt the presheaf model and show that the normalization algorithm remains complete and sound. We introduce a novel semantics in the soundness proof in order to justify pattern matching on code.

5.1 Extension of Pattern Matching

In this section, we extend our previous 2-layered modal type theory with pattern matching on code as follows.

figure ec

We extend the system with another elimination form for \(\square (\varGamma \vdash T)\), pattern matching \((\textsf {match}\ t\ \textsf {with}\ \overrightarrow{b})\), where \(\overrightarrow{b}\) is a list of all possible branches for t. The branches only need to match terms in STLC because pattern matching is only available at layer 1 and the scrutinee is code from layer 0. We do not directly support nested patterns like \((\lambda y. ?u)~?u'\) to keep the system simple, but they can be encoded as nested pattern matchings. Supporting any useful general recursor (e.g. [29, 47]) would require context variables, which abstract over local contexts, and type polymorphism. We regard these extensions as orthogonal to layering and leave them to future work.

Fig. 6.
figure 6

Adjusted rules with contextual types

Further modifications to the typing and equivalence rules are in Fig. 6. We omit the case for \(\textsf {rec}\) for conciseness. The typing rule for \(\textsf {match} \) uses the judgment \(\varPsi ; \varGamma \vdash _{1} \overrightarrow{b} : (\varDelta \vdash T) \Rightarrow T'\). This judgment enumerates all possible branches based on the type of the scrutinee. This guarantees coverage of pattern matching, i.e. that \(\overrightarrow{b}\) is indeed a list of all possible branches for a given scrutinee of type \(\square (\varDelta \vdash T)\).

All typing rules for individual branches are similar. For example, if the pattern is \(\lambda x.?u\), then u captures the body of some \(\lambda \). The branch body t is checked with u bound to \((\varDelta , x : S \vdash T)\), which has a larger local context than the \(\varDelta \) we began with. If the branch matches a function application, our premise requires that t be well-typed for all \(S\ \texttt {wf}^{0}\). This universal quantification should be read as a higher-order derivation that applies for all \(S\ \texttt {wf}^{0}\) (see also [60]) and where we keep S abstract as a parameter.

The bottom of Fig. 6 shows the \(\beta \) rules for pattern matching. Based on the structure of the scrutinee, we dispatch to the right branch and propagate instantiations for pattern variables via global substitutions to the bodies. Notations like \(\overrightarrow{b}(\textsf {succ }s)\) denote the lookup of \(\overrightarrow{b}\) based on a given shape. For example, \(\overrightarrow{b}(\textsf {succ }s) = \textsf {succ }?u \Rightarrow t\) means that we look up \(\textsf {succ }s\) in \(\overrightarrow{b}\), and find the branch \(\textsf {succ }?u \Rightarrow t\). Then s is meant to substitute u in t. This lookup is guaranteed to succeed because \(\varPsi ; \varGamma \vdash _{1} \overrightarrow{b} : (\varDelta \vdash T) \Rightarrow T'\) is covering.
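The following Haskell sketch illustrates how such a covering branch list can be represented and looked up by the head constructor of the scrutinee; the encoding (Tm, Branches, dispatch) is ours, and, anticipating Sec. 5.2, a scrutinee headed by a global variable is not dispatched.

```haskell
-- Layer-0 code over which we pattern match.
data Tm = Var Int | Zero | Succ Tm | Lam Tm | App Tm Tm | GVar Int deriving Show

-- One branch per head constructor, so the record itself guarantees
-- coverage; each field holds the corresponding branch body.
data Branches b = Branches
  { onVar  :: Int -> b   -- pattern: a local variable x
  , onZero :: b          -- pattern: zero
  , onSucc :: b          -- pattern: succ ?u
  , onLam  :: b          -- pattern: \x. ?u
  , onApp  :: b          -- pattern: ?u ?u'
  }

-- Dispatch returns the chosen body together with the code captured by the
-- pattern variables; a scrutinee headed by a global variable is blocked.
dispatch :: Branches b -> Tm -> Maybe (b, [Tm])
dispatch br (Var x)   = Just (onVar br x, [])
dispatch br Zero      = Just (onZero br, [])
dispatch br (Succ t)  = Just (onSucc br, [t])
dispatch br (Lam t)   = Just (onLam br, [t])
dispatch br (App t s) = Just (onApp br, [t, s])
dispatch _  (GVar _)  = Nothing   -- match (box u^delta) does not reduce

main :: IO ()
main = print (fmap snd (dispatch (Branches (const "var") "zero" "succ" "lam" "app")
                                 (App Zero (Succ Zero))))
-- Just [Zero,Succ Zero]
```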

5.2 Neutral Forms

Careful readers might have noticed that in our grammar of branches, we do not have a case for global variables, nor do we have a \(\beta \) rule for pattern matching on \(\texttt {box}\ u^\delta \). So what should \(\textsf {match}\ \texttt {box}\ u^\delta \ \textsf {with}\ \overrightarrow{b}\) be reduced to? The answer might be surprising: this term is in fact blocked. Previously, we mentioned a concern raised by Kavvos [33]. His subsequent analysis concludes that sound intensional operations can only act on globally and locally closed code. This restriction is clearly too strong. After looking into the analysis, we see that this conclusion is based on the assumption that intensional operations reduce on \(\texttt {box}\ u^\delta \), which leads to the strong restriction. \(\textsf {match}\ \texttt {box}\ u^\delta \ \textsf {with}\ \overrightarrow{b}\) should not reduce, for the same reason that \(\textsf {match}\ x\ \textsf {with}\ \overrightarrow{b}\) should not. They are both waiting for substitutions to supply actual code to unblock the evaluation. Their only difference is that they act on different substitutions. This observation leads to a renewed definition of neutral forms:

figure ee

The definition of normal forms, described by w, remains the same. To obtain \(\beta \eta \) normal forms, all branches should be normalized. If u is the scrutinee of a \(\textsf {match} \), its local substitution stays as is because it is considered code. This adjustment is subtle but critical to giving a sound presheaf model.

Moreover, notice that \(\texttt {letbox}\) does not get blocked. This difference in computational behaviors is due to the different purposes of the two elimination forms. \(\texttt {letbox}\) is primarily for code composition and the execution of code, while pattern matching focuses on intensional analysis of code. For this reason, we include both \(\texttt {letbox}\) and pattern matching as elimination forms. They coexist perfectly at layer 1 because our core theory at layer 0 is unaffected. Without layering, both \(\texttt {letbox}\) and pattern matching are available everywhere, including inside of \(\texttt {box} \), which causes all sorts of complex interactions and makes the computational behavior of the whole type theory difficult to control. This is why former systems based on [17] are so difficult to extend with intensional analysis in a controlled manner. We have now introduced pattern matching on code and achieved the feature outlined in Sec. 1. In the rest of this section, we adapt the normalization proof and justify that this system is a proper type theory.

5.3 Adjusting the Presheaf Model

Since we only add an elimination form, we simply extend the model in Sec. 4. The adjustments are shown in Fig. 7. Two additional functions are defined: first, the \(\textsf {match} \) function dispatches evaluation to the proper branch based on the input code and evaluates the body with a global substitution and an evaluation environment; second, \(\textsf {nfbranch} \) normalizes the body of a branch and obtains a normal branch. Applying \(\textsf {nfbranch} \) to \(\overrightarrow{b}\) normalizes all branches in \(\overrightarrow{b}\). \(\textsf {nfbranch} \) is invoked when we normalize a pattern matching on some neutral code.

Let us consider the \(\textsf {match} \) case. We first evaluate the scrutinee. If the result is a neutral term, then we simply invoke \(\textsf {nfbranch} \) to normalize all branch bodies, and then use reflection to obtain a \(\llbracket T \rrbracket _{\;\varPsi ;\varGamma }\). Each case in \(\textsf {nfbranch} \) proceeds similarly. The evaluation of the body continues with the global substitution extended with pattern variables. The evaluated body is then reified to a normal form and thus the resulting branch is also normal. If the result is \(\texttt {box}\ s\), then we match the code s accordingly with a branch and evaluate the body. This is done by calling the \(\textsf {match} \) function. Based on the shape of s, the \(\textsf {match} \) function looks up \(\overrightarrow{b}\) and extends \(\sigma \) accordingly before evaluating the body. For example, if s is a \(\lambda \), then the branch \(\lambda x. ?u \Rightarrow t\) is picked, and t is evaluated. The lookup of \(\overrightarrow{b}\) must succeed because our pattern matching is covering. However, if s is just a global variable, based on the previous discussion on neutral forms, we must block the evaluation and only normalize the branches. This case forms a neutral form and is essentially the same as when the scrutinee evaluates to a neutral.

Fig. 7.
figure 7

Adjustments to the presheaf model

The presheaf interpretation gives a semantic explanation of how layering enables sound pattern matching on code and why it is difficult in purely homogeneous systems. A term of type \(\square (\varGamma \vdash T)\) has two different uses: it either stands for code that will be run (due to \(\texttt {letbox}\)) or it stands for code that will be analyzed (due to pattern matching). In the former case, it is evaluated to a natural transformation in the semantics, while in the latter, only its syntactic information is needed. Moreover, these two uses are not mutually exclusive. A program might use both the semantic and the syntactic information of the same code. To support pattern matching on code, we must maintain both semantic and syntactic information and therefore the evaluation of code must be postponed. In our setting, this evaluation only happens when we evaluate a global variable at layer 1. The evaluation function evaluates the code represented by the global variable and maintains its well-foundedness by decreasing the layer from 1 to 0. Meanwhile, in a homogeneous system without layers, it is no longer clear how to give a well-founded evaluation of a global variable, due to the lack of a proper measure when intensional analysis is supported.

As we will see very shortly, the intuition above based on two different uses of code must be formalized in the gluing model in order to establish a soundness proof, giving a formal account for the importance of layering.

5.4 Soundness

Recall that in Sec. 4.3, we need a 2-layered model, where the layer-1 model refers to the semantic judgment at layer 0 to support \(\texttt {letbox}\). The semantic judgment at layer 0 is defined as a universal quantification over global weakenings and the gluing between local substitutions and evaluation environments:

figure eh

This definition, taken from Sec. 4.3, unfortunately cannot support the semantic rule for pattern matching. Consider some . To prove that pattern matching on code is semantically sound, we perform an analysis on the structure of t s. But we are stuck, because we cannot derive or for some S. In general, the semantic information of subterms is lost. To support pattern matching on code, our semantic judgment must maintain both the syntactic structure of the code and the semantic information of all subterms. Therefore, our semantic judgments at layer 0 become inductively defined (Fig. 8). These rules are essentially just the typing rules with some extra premises. The inductive definition makes sure that the semantic information for all subterms is maintained.

Fig. 8.
figure 8

Layer-0 semantic judgment

Finally, we refer to \(\varPsi ; \varGamma \vDash _{0} t : T\) when we define the gluing relation for \(\square (\varGamma \vdash T)\) at layer 1. This allows us to inspect the syntactic structure of t during an evaluation of pattern matching and access its semantic information at the same time. We refer readers to our technical report [27] for the proofs of the semantic rules for pattern matching. At layer 1, we use the new semantic judgment \(\varPsi ; \varGamma \vDash _{0} t : T\) at layer 0 to define the semantic judgment for global substitutions \(\varPhi \vDash \sigma : \varPsi \), and then define the semantic judgment for terms and local substitutions:

figure em

By proving and then instantiating the fundamental theorems, we obtain the soundness proof.

6 Future Extensions to Layered Systems

We have shown that 2-layered modal type theory supports intensional analysis and retains normalization. In this section, we build on our previous development and describe three possible extensions of layered systems as future work.

6.1 Extending to Complex Type Systems

In this paper, we have so far focused only on simple types. Layering, however, is a powerful idea that, we believe, scales naturally. A natural extension is to consider System F, the foundation for many practical programming languages like Haskell and the ML family, as the core language. The Haskell and ML communities have expressed strong interest in meta-programming [34, 35, 49, 53, 58, 59, etc.]. Layering provides a simple solution to this problem. In 2-layered System F, we replace validity of types with well-kindedness of types: \(\varPsi ; \varGamma \vdash _{1} t : T : *\). Following the matryoshka principle, at layer 0, we operate in System F, while at layer 1, we work in a meta-language extending System F with one layer of \(\square \). We hope that 2-layered System F not only guarantees the well-scopedness and well-typedness of code, but is also normalizing, following our development here.

Besides System F, we are also interested in using Martin-Löf type theory (MLTT) as the base language. 2-layered MLTT would provide a foundation for tactic languages and meta-programming in proof assistants like Coq, Agda and Lean. Following our previous development, we expect that 2-layered MLTT enables the reuse of all definitions from layer 0 at layer 1, and the guarantee of well-scopedness and well-typedness of all code. Since in MLTT types are also terms, we simply reuse contextual types \(\square (\varGamma \vdash {\texttt {Type}}) \) for code of types.

One challenge we expect from extending MLTT with layering is the semantics of code. For example, when we pattern match on code \(t : (\lambda x. x)~\texttt {Nat}\), we expect that the type is reduced to \(\texttt {Nat}\). That is, \((\lambda x. x)~\texttt {Nat}\) and \(\texttt {Nat}\) as types are considered the same even for code. We effectively take a quotient of code over types. This behavior aligns well with quotient inductive-inductive types (QIIT) [6, 31] and we expect QIIT to appear in the semantics in some form, but we leave the detailed investigation as future work.

6.2 Extending Power of Layer 1

Though pattern matching allows inspection of code, not all operations can be defined easily in this way. For example, weak head normal form (whnf) reduction on a term is not defined by a simple structural recursion on the syntax of the term. In 2-layered modal type theory, we can add a whnf operation at layer 1 and still maintain normalization. The following are the rewrite rules for whnf, and we can extend our previous normalization algorithm in Sec. 5 with a rewrite process [9, 10]:

figure ep

whnf does not go under \(\textsf {succ}\) or \(\lambda \) and is only available at layer 1. Both local and global substitutions simply propagate under whnf. The rewrite process repeats these rules until no rule matches. This process terminates due to the strong normalization of STLC, and therefore the whole system remains terminating. However, with global variables, we must take extra care to maintain confluence and eventually normalization. In Sec. 5.2, we discuss the impact of global substitutions and the necessity of their stability. When we extend layer 1 with another operation, we must also make sure that this extended operation is stable under global substitutions. When whnf encounters a global variable in the head position, such as \(\textsf {whnf} (\texttt {box}\ u^\delta )\) or \(\textsf {whnf} (\texttt {box}\ (u^\delta ~s))\), there is no matching rule and the rewrite process stops, for the same reason that pattern matching stops on \(\texttt {box}\ u^\delta \). The lack of a reduction rule when a global variable is in the head position is particularly important. With whnf, we can now simplify a term before matching, which is a very useful and typical tactic in proof assistants:

$$\begin{aligned} {} & {} \textsf {match}\ \texttt {box}\ ((\lambda x. x)~\textsf {zero})\ \textsf {with} \qquad \,\,\,\,\,\,\,\, \;|\;\textsf {zero}\Rightarrow \textsf {true} \;|\;\_ \Rightarrow \textsf {false} \approx \textsf {false} \\ {} & {} \textsf {match}\ \textsf {whnf} ~(\texttt {box}\ ((\lambda x. x)~\textsf {zero}))\ \textsf {with} \;|\;\textsf {zero}\Rightarrow \textsf {true} \;|\;\_ \Rightarrow \textsf {false} \approx \textsf {true} \end{aligned}$$

Due to layering, whnf only needs to consider terms in STLC at layer 0. In a homogeneous system, whnf must apply to all possible code, and thus becomes troublesome to define. This extensibility of layer 1 is an important and useful feature for a foundation for meta-programming in proof assistants.
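A hypothetical Haskell sketch of such a whnf rewrite on a deep embedding of layer-0 code, with our own names and a capture-naive substitution, illustrating that a global variable in head position blocks the rewrite:

```haskell
data Tm = Var Int | GVar Int | Zero | Succ Tm | Lam Tm | App Tm Tm deriving Show

-- Substitute the argument for de Bruijn index 0; shifting is omitted, which
-- is good enough for the closed examples below.
subst0 :: Tm -> Tm -> Tm
subst0 a = go 0
  where
    go k (Var i) | i == k    = a
                 | otherwise = Var i
    go _ t@(GVar _) = t
    go _ Zero       = Zero
    go k (Succ t)   = Succ (go k t)
    go k (Lam t)    = Lam (go (k + 1) t)
    go k (App t s)  = App (go k t) (go k s)

whnf :: Tm -> Tm
whnf (App t s) = case whnf t of
  Lam b -> whnf (subst0 s b)   -- beta step, then continue
  t'    -> App t' s            -- a global variable (or variable) in head position blocks
whnf t = t                     -- no rule: stop (does not go under succ or lambda)

main :: IO ()
main = do
  print (whnf (App (Lam (Var 0)) Zero))   -- Zero
  print (whnf (App (GVar 0) Zero))        -- App (GVar 0) Zero: blocked
```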

6.3 Extending to More Layers

Another potential of layering is to generalize the 2-layered system to n layers for a fixed \(n > 2\). Scaling to n layers is technically involved, but conceptually simple. We sketch the process briefly here and leave the details as future work.

In a layered system, terms are type-checked in a context array. For an n-layered system, this context array has length n:

$$ \varGamma _{n-1}; \cdots ; \varGamma _1; \varGamma _0 \vdash _{i} t : T \qquad \text {or} \qquad \overrightarrow{\varGamma }\vdash _{i} t : T \qquad \text {where}~ i \in [0, n-1]. $$

We now use x and y to range over variables in all contexts. Each context in the context array contains bindings of a fixed shape. Bindings in \(\varGamma _0\) are x : T. Bindings in \(\varGamma _i\) for \(i \in [1, n-1]\) are of the shape \(x : (\varDelta _{i-1}; \cdots ; \varDelta _0 \vdash T)\). Bindings in each \(\varDelta _j\) also have the specified shape. Contextual types are generalized to context arrays: \(\square (\varDelta _{i-1}; \cdots ; \varDelta _0 \vdash T)\). The design of an n-layered system is guided by two principles: the matryoshka principle, which says that types and terms at lower layers are subsumed by higher layers, and the static code principle, by which only terms at layer \(n-1\) compute. In particular, the latter principle means that terms from layer 0 to \(n-2\) are static code. Following both principles, we will be able to fill in the details and design an n-layered system.

We expect the n-layered generalization to be compatible with the extension with operations described in Sec. 6.2. Instead of extending layer 1, we extend layer \(n-1\) so that all lower layers are unaffected.

7 Related Work and Conclusion

7.1 Normalization of Modal Type Theories

The core of our paper is the normalization of 2-layered modal type theory. Recently, there have been a number of approaches that explore modal type theories. One of the earliest is from Nanevski et al. [39]. They prove the normalization for contextual modal type theory (CMTT) indirectly by a translation to the system by de Groote [23]. This translation does not give a direct normalization algorithm for CMTT. Our system in Sec. 3 is strictly weaker than CMTT by disallowing nested \(\square \)s. Even if we scale our system to n layers as outlined in Sec. 6.3, the resulting system will only have a hierarchy of contextual types, so we still cannot recover the same expressive power as CMTT to do arbitrary nesting of \(\square \)s. Nevertheless, in Sec. 5, we have shown that this temporary loss in expressive power enables an orthogonal avenue of intensional analysis that is difficult to obtain in CMTT. Kavvos [32] proposes formulations for a few different modalities in the dual-context style and proves their normalization using a translation to de Groote’s system as well. The normalization of System GL, however, is proved directly by reducibility candidates. Lately, Gratzer [20] proves the normalization of multimodal type theory, a generalization of the dual-context systems, using Sterling’s synthetic Tait computability [51].

The Kripke-style systems are another formulation of modalities and differ from ours in context management. The normalization problem for this style has been investigated more intensively. Borghuis [13] proves the strong normalization of modal pure type systems in his PhD thesis. More recently, Clouston [16] proves the normalization of System K using reducibility candidates. Gratzer et al. [22] prove the normalization of a dependent type theory with idempotent S4 by parameterizing Abel's [1] untyped domain method with a poset. Valliappan et al. [55] use the same method to prove normalization for Systems K, T, K4 and S4. Hu and Pientka [26] establish the same result but instead introduce a "truncoid" algebraic structure into the untyped domain model, so that one normalization proof can be instantiated for all four systems. This method using truncoids has been scaled to dependent types [25]. It is worth emphasizing that none of these modal type theories supports pattern matching on code as we do in our 2-layered modal type theory.

7.2 Homogeneous Meta-programming and Its Foundations

Early ideas of meta-programming in the quasi-quoting style can be traced back to Lisp/Scheme [3]. In Lisp's untyped setting, all programs are represented as lists, so intensional analysis reduces to inspection of lists and is relatively simple. It is the demand for type-safe meta-programming that leads to the complications. MetaML [54] is an early example of type-safe meta-programming. MetaML employs a quasi-quoting style similar to Lisp. However, MetaML does not support any form of intensional analysis. In fact, MetaML's meta-theory even allows reduction of code under quote [52], so intensional analysis is deliberately avoided. The correspondence between meta-programming and modal logic S4 is described by Davies and Pfenning [17]. The correspondence explains how the modal logic S4 models multi-staging and code composition, but it does not explain how intensional analysis should be supported. Two formulations of S4 are presented: the dual-context style and the Kripke style. While the Kripke-style formulation provides a type-theoretic formulation for quasi-quotation, it is more challenging to extend with intensional analysis. On the other hand, the dual-context style forces programmers to write meta-programs in a comonadic style, but it provides a better setup for intensional analysis, as we have demonstrated. This is also the approach taken in Beluga [43, 45] and Moebius [30].

The semantics of the dual-context style has also been studied previously. In the context of contextual types, Gabbay and Nanevski [19] attempt to give a set-theoretic semantics to contextual types. As pointed out by Kavvos [33], their exact formulation of contextual types seems to break the confluence property.

Boespflug and Pientka [12] extend the dual-context style to the multi-context style. Though the multi-context style and the Kripke style both use multiple contexts for typing, the number of contexts in the former is more or less fixed (hence context arrays), while in the latter, contexts are often pushed and popped during typing (hence context stacks). Davies and Pfenning [17] show that the Kripke-style system is equivalent to the dual-context style. Moebius [30] combines the multi-context style and contextual types, and supports pattern matching on code in System F. Moebius enjoys subject reduction. However, to adapt Moebius to a type theory, normalization must be proved, and it is not obvious how to support coverage there. Whether layering provides a solution requires future investigation.

\(\varOmega \)mega [56] is another example of a sound meta-programming system with pattern matching on code. \(\varOmega \)mega implements the quasi-quoting style. The open context of a piece of code is annotated in its type, similar to contextual types. However, the type of the code itself is not remembered, so the type system is less complex due to the reduced type information.

7.3 Intensionality in Type Theories

Interest in intensionality is often associated with modalities. Pfenning [41] describes a type theory in which terms may be treated intensionally, extensionally, or irrelevantly depending on the modalities employed. This is similar to our setting, where intensionality of code is marked by the \(\square \) modality. In the same setting, Kavvos [33] describes a special kind of intensional recursion using Löb induction (\((\square A \rightarrow A) \rightarrow A\)), which allows meta-programs to access their own code. Löb induction says that if we can prove A from the code of a proof of A, then A holds. Löb induction is incompatible with Axiom T (\(\square A \rightarrow A\)), but it still has interesting computational behaviors, including an example modeling computer viruses. Chen and Ko [14] resolve the incompatibility between Löb induction and Axiom T by supporting them with two separate modalities.
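For intuition about the computational content of \((\square A \rightarrow A) \rightarrow A\), the folklore Haskell rendering of Löb's rule is shown below, where the modality is loosely approximated by a Functor. This loeb function is a standard analogy, not the modal calculus of Kavvos [33]; the spreadsheet example is purely illustrative.

```haskell
-- A folklore analogue of Löb induction (□A → A) → A, reading the modality
-- loosely as a Functor f: from f-many recipes that each build an 'a' out of
-- the finished whole, we tie a recursive knot and produce f a.
loeb :: Functor f => f (f a -> a) -> f a
loeb fs = result
  where
    result = fmap ($ result) fs

-- Example: a tiny "spreadsheet" whose cells may refer to the other cells.
cells :: [[Int] -> Int]
cells =
  [ const 1             -- cell 0: the constant 1
  , \xs -> xs !! 0 + 1  -- cell 1: cell 0 plus one
  , \xs -> xs !! 1 * 2  -- cell 2: twice cell 1
  ]

-- loeb cells == [1, 2, 4]
main :: IO ()
main = print (loeb cells)
```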

7.4 Layering in Type Theories

Layering can also be found in other type theories. The logical framework LF [24] is essentially a layered system. LF is a dependently typed framework for defining object languages. These object languages live at one layer; LF, as their meta-language, lives at a higher layer. Isabelle [40] is one example of a modern proof assistant based on LF. There are two layers in Cocon [47]. At the lower layer is LF, which defines an object language. At the higher layer is a Martin-Löf type theory (MLTT) for computation. The two layers are connected by contextual types. Cocon supports induction in MLTT on the syntax of an object language in LF. Cocon's structure leads to a semantics similar to that of our 2-layered modal type theory. The main difference is that in 2-layered modal type theory, the core language at layer 0 is a sub-language of the computational language at layer 1. Consequently, all terms at layer 0 can be lifted to layer 1 (Lemma 3) for free and run as programs. The conversion from code to programs is done implicitly in the semantics. In contrast, in Cocon, since the object language is defined freely, an embedding into MLTT must be given explicitly and is only possible if the object language is strictly less expressive. A categorical semantics for Cocon is given by Hu et al. [29, 46]. Kovács [36] and Allais [4] demonstrate applications of 2-level type theory, which focus more on code composition and do not support intensional analysis.

Our system uses layers to account for the number of nested \(\square \)'s, which shares some similarity with graded and quantitative systems [2, 8, 38]. The latter systems use grades to keep track of uses of variables. We believe that it would be interesting to have a universal framework encompassing all these different uses of modalities, though this requires further investigation.

Our approach is also similar to GuTT [21], a guarded type theory supporting Löb induction. GuTT has two layers. The first layer excludes the dynamics of Löb induction (but not of other terms) and enjoys normalization. The lost dynamics are recovered at the second layer, at the cost of normalization. The two approaches are similar in that both take advantage of differences between layers, and in both, one layer is an extension of the other.

7.5 Conclusion and Future Work

In this paper, we introduce the layered style to support intensional analysis in type theory. In the layered view, meta-programming is done in an extension of a chosen core language. Pattern matching on code at a higher layer only needs to handle code at lower layers, hence circumventing the complications in previous work. We investigate the layered style in 2-layered modal type theory, which supports pattern matching on code with coverage guaranteed by construction. We provide a constructive proof of normalization by evaluation using a presheaf model. The normalization algorithm extracted from the model is proven sound and complete and is implemented in Agda.

Layering provides a controlled and modular way to introduce meta-programming with intensional analysis to type theory. As a first step, we plan to add context abstraction, following the approach taken for example in Beluga, to support more general recursion principles [43, 44]. We see abstracting over contexts as an orthogonal issue. As a next step, we will adapt layering to Martin-Löf type theory (MLTT). Both extensions will yield the first dependent type theory that supports intensional analysis of code within MLTT. In the long term, we hope that this type theory will also provide a foundation for extending the core language of Coq, Agda, or other proof assistants with a meta-language of tactics that can reuse all definitions from the core language while retaining normalization of the overall system.