1 Introduction

The TPTP World [38, 41] (once gently referred to as the “TPTP Jungle” [3]) is a well established infrastructure that supports research, development, and deployment of Automated Theorem Proving (ATP) systems. Salient components of the TPTP World are the TPTP languages [47], the TPTP problem library [37], the TSTP solution library [38], the SZS ontologies [36], the Specialist Problem Classes (SPCs) and problem difficulty ratings [51], SystemOnTPTP [33] and StarExec [32], and the CADE ATP System Competition (CASC) [40]. There are dependencies between these parts of the TPTP World, as shown in Fig. 1, forming a series of “stepping stones” from TPTP standards to happy users, described in the following sections of this paper ...

  • The TPTP language (Sect. 2) is used for writing the problems in the TPTP problem library and the solutions in the TSTP solution library.

  • The TPTP problem library (Sect. 3) is the central collection of test problems.

  • The TSTP solution library (Sect. 4) is the central collection of solutions to the TPTP library problems.

  • The SZS ontologies (Sect. 5) are used to specify properties of problems and solutions.

  • The SPCs (Sect. 6) that classify problems are based on the language form and SZS ontology values.

  • The TPTP problems’ difficulty ratings (Sect. 7) are computed wrt each SPC, using data from the TSTP solution library.

  • SystemOnTPTP (Sect. 8) provides online access to ATP systems and tools.

  • The StarExec computers (Sect. 8) are used to build the TSTP, and to run CASC.

  • The CADE ATP System Competition (Sect. 9) is the annual evaluation of ATP systems - the world championship for such systems.

  • Users of the TPTP World (Sect. 10) provide problems for the TPTP problem library, and ATP systems/tools for SystemOnTPTP and StarExec.

There is a cycle of dependencies from the TPTP problem library, to the TSTP solution library, to the problem difficulty ratings, and back to the TPTP problem library. This cycle means that building these components, particularly building releases of the TPTP problem library, requires iteration until stability is reached.

Fig. 1. Dependencies between the Stepping Stones

Various parts of the TPTP World have been deployed in a range of applications, in both academia and industry. Since the first release of the TPTP problem library in 1993, many researchers have used the TPTP World as an appropriate and convenient basis for ATP system research and development. The web page www.tptp.org provides access to all components.

2 The TPTP Language

The TPTP language [42] is one of the keys to the success of the TPTP World. The TPTP language is used for writing both problems and solutions, which enables convenient communication between ATP systems and tools. Originally the TPTP World supported only first-order clause normal form (CNF) [50]. Over the years full first-order form (FOF) [37], typed first-order form (TFF) [2, 46], typed extended first-order form (TXF) [44], typed higher-order form (THF) [12, 43], and non-classical forms (NTF) [30] have been added. The standardisation received a significant technical boost in 2006 when the BNF definition was revised so that all the language forms are quite precisely specified. As a result, a parser can be generated using lex/yacc [54]. A general principle of the TPTP language is: “we provide the syntax, you provide the semantics”. As such, there is no a priori commitment to any semantics for each of the language forms, although in almost all cases the intended logic and semantics are well known.

Problems and solutions are built from annotated formulae of the form ...

language(name, role, formula, source, useful_info)

The languages supported are cnf (clause normal form), fof (first-order form), tff (typed first-order form), and thf (typed higher-order form). The role, e.g., axiom, lemma, conjecture, defines the use of the formula. In a formula, terms and atoms follow Prolog conventions – functions and predicates start with a lowercase letter or are single quoted, and variables start with an uppercase letter. The language also supports interpreted symbols that either start with a $, e.g., the truth constants $true and $false, or are composed of non-alphabetic characters, e.g., integer/rational/real numbers such as 27, 43/92, -99.66. The logical connectives in the TPTP language are !, ?, ~, |, &, =>, <=, <=>, and <~>, for the mathematical connectives \(\forall \), \(\exists \), \(\lnot \), \(\vee \), \(\wedge \), \(\Rightarrow \), \(\Leftarrow \), \(\Leftrightarrow \), and \(\oplus \) respectively. Equality and inequality are expressed as the infix operators = and !=. The source and useful_info are optional. Figure 2 shows an example with typed higher-order annotated formulae.
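For illustration, a first-order annotated formula with all five fields might be written as follows (the formula, names, source, and useful_info values here are hypothetical):

    fof(add_commutes, axiom,
        ! [X,Y] : ( add(X,Y) = add(Y,X) ),
        file('Axioms/EXA001+0.ax', add_commutes),
        [description('Commutativity of add')] ).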

Fig. 2. THF annotated formulae

3 The TPTP Problem Library

The first inklings of the TPTP World emerged at CADE-10 in Kaiserslautern, Germany, in 1990, as a collaboration between Geoff Sutcliffe from the University of Western Australia (James Cook University from 1993), and Christian Suttner from the Technische Universität München. At that time a large number of test problems had accumulated in the ATP community, in both hardcopy [4, 14, 17, 22, 25, 56] and electronic form [13, 29]. We observed that ATP system developers were testing their systems, and publishing results, based on small numbers of selected test problems. At the same time, some good ideas were abandoned because they had been tested on inappropriate selections of test problems. These observations motivated us to start collecting ATP test problems into what became the TPTP problem library. A comprehensive library of test problems is necessary for meaningful system evaluations, meaningful system comparisons, repeatability of testing, and the production of statistically significant results. The goal was to support testing and evaluation of ATP systems, to help ensure that performance results accurately reflect the capabilities of the ATP systems being considered.

Releases of the TPTP problem library are identified by numbers in the form vVersion.Edition.Patch. The Version enumerates major new releases of the TPTP in which important new features have been added. The Edition is incremented each time new problems are added to the current version. The Patch level is incremented each time errors found in the current edition are corrected. The first release of the TPTP problem library, v1.0.0, was made on Friday 12th November 1993.

The problems in the library are classified into domains that reflect a natural hierarchy, based mainly on the Dewey Decimal Classification and the Mathematics Subject Classification. Seven main fields are defined: logic, mathematics, computer science, science & engineering, social sciences, arts & humanities, and other. Each field is subdivided into domains, each identified by a three-letter mnemonic, e.g., the social science field has three domains: Social Choice Theory (SCT), Management (MGT), and Geography (GEG).

Table 1 lists the versions of the TPTP up to v9.0.0, with the new feature added, the number of problem domains, and the number of problems. The number of domains indicates the diversity of the problems, while the number of problems indicates the size of the library. The attentive reader might note that many releases have been made in July-September. This is because the CADE ATP System Competition (CASC - see Sect. 9) has an influence on the release cycle of the TPTP.

Table 1. Overview of TPTP releases

Each TPTP problem file has three parts: a header, optional includes, and annotated formulae. The header section contains information for users, formatted as comments in four parts, as shown in Fig. 3. The first part identifies and describes the problem, the second part provides information about occurrences of the problem in the literature and elsewhere, the third part provides semantic and syntactic characteristics of the problem, and the last part contains comments and bugfix information. The header fields are self-explanatory, but of particular interest are the Status field (explained in Sect. 5), the Rating field (explained in Sect. 7), and the SPC field (explained in Sect. 6). The include section is optional, and if used contains include directives for axiom files, which in turn have the same three-part format as problem files; see Fig. 3 for an example. Inclusion avoids the need to duplicate the formulae of commonly used axiomatizations. The annotated formulae are described in Sect. 2, and Fig. 2 provides an example.
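For example, an include directive, optionally with a list of selected formulae, looks like this (the axiom file name and formula names here are hypothetical):

    include('Axioms/DAT001_0.ax').
    include('Axioms/DAT001_0.ax', [array_type, select_store]).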

Fig. 3. Header of problem DAT016_1.

4 The TSTP Solution Library

The complement to the TPTP problem library is the TSTP solution library [35, 38]. The TSTP is built by running all the ATP systems that are available in the TPTP World on all the problems in the TPTP problem library. The TSTP started being built around 2005, using solutions provided by individual system developers. From 2010 to 2013 the TSTP was generated on a small NSF-funded cluster at the University of Miami. Since 2014 the ATP systems have been run on StarExec (see Sect. 8), initially on the StarExec Iowa cluster, and since 2018 on the StarExec Miami cluster. StarExec has provided stable platforms that produce reliably consistent and comparable data in the TSTP. At the time of writing this paper the TSTP contained the results of running 87 ATP systems and system variants on all the problems that each system could attempt (therefore, e.g., systems that do model finding for FOF are not run on THF problems). This produced 1,091,026 runs, of which 432,718 (39.6%) solved the problem. One use of the TSTP is for computing the TPTP problem difficulty ratings (see Sect. 7).

TSTP solution files have a header section and annotated formulae. The header section has four parts, as shown in Fig. 4: the first part identifies the ATP system, the problem, and the system’s runtime parameters; the second part provides information about the hardware, operating system, and resource limits; the third part provides the SZS result and output values (see Sect. 5), and syntactic characteristics of the solution; the last part contains comments. The solution follows in annotated formulae.

For derivations, where formulae are derived from parent formulae, e.g., in proofs, refutations, etc., the source fields of the annotated formulae capture the parent-child relationships of the derivation DAG. The source of each formula is either the problem file or an inference. Inference data includes the name of the inference rule used, the semantic relationship between the parents and the derived formula as an SZS ontology value (see Sect. 5), and a list of the names of the parent annotated formulae. Figure 5 shows an example refutation [slightly modified, type declarations omitted] from the E system [27] for the problem formulae in Fig. 2, and Fig. 4 shows the corresponding header fields.
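For example, a derived formula's source field records the inference rule, the SZS status of the inference, and the parents, along these lines (the formula and names here are hypothetical):

    cnf(c_0_7, plain,
        ( p(X) | ~ q(X) ),
        inference(resolution, [status(thm)], [c_0_3, c_0_5]) ).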

Fig. 4. Example derivation header

Fig. 5. Example derivation formulae

For interpretations (typically models) the annotated formulae are used to describe the domains and symbol mappings of Tarskian interpretations, or the formulae that induce Herbrand models. A TPTP format for interpretations with finite domains was previously defined [47], and has served the ATP community adequately for almost 20 years. Recently the need to represent interpretations with infinite domains, and Kripke interpretations, has led to the development of a new TPTP format that supports these interpretations [49]. Figure 6 shows the problem formulae and a model that uses integers as the domain. Please read [49] for lots more details!
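As a sketch of the earlier finite-domain format [47], an interpretation is written as annotated formulae with the roles fi_domain, fi_functors, and fi_predicates, e.g., for a hypothetical two-element model:

    fof(domain, fi_domain, ! [X] : ( X = "d1" | X = "d2" ) ).
    fof(functors, fi_functors, ( f("d1") = "d2" & f("d2") = "d1" ) ).
    fof(predicates, fi_predicates, ( p("d1") & ~ p("d2") ) ).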

Fig. 6. Example infinite model

Resource Limits: A common question, often based on mistaken beliefs, is whether the resource limits used should be increased to find more solutions. Analysis shows that increasing resource limits does not significantly affect which problems are solved by an ATP system. Figure 7 illustrates this point; it plots the CPU times taken by several contemporary ATP systems to solve the TPTP problems for the FOF_THM_RFO_* SPCs, in increasing order of time taken. The data was taken from the TSTP, i.e., using the StarExec Miami computers. The relevant feature of these plots is that each system has a point at which the time taken to find solutions starts to increase dramatically. This point is called the system’s Peter Principle [23] Point (PPP) – it is the point at which the system has reached its level of incompetence. Evidently a linear increase in the computational resources beyond the PPP would not lead to the solution of significantly more problems. The PPP thus defines a realistic computational resource limit for the system, and if enough CPU time is allowed for an ATP system to reach its PPP, a usefully accurate measure of what problems it can solve is achieved. The performance data in the TSTP is produced with adequate resource limits.
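A minimal sketch of this analysis, assuming hypothetical per-system lists of solve times extracted from the TSTP (the doubling heuristic used to locate the PPP is illustrative, not a TPTP-defined algorithm):

    import matplotlib.pyplot as plt

    def ppp_index(times, factor=2.0):
        # Crude stand-in for "time starts to increase dramatically":
        # the first point where the solve time jumps by more than a
        # given factor over its predecessor (clamped at 1 second).
        times = sorted(times)
        for i in range(1, len(times)):
            if times[i] > factor * max(times[i - 1], 1.0):
                return i
        return len(times)

    # Hypothetical data: system name -> CPU times (s) for solved problems.
    solve_times = {"SystemA": [0.1, 0.2, 0.4, 1.0, 2.0, 5.0, 60.0, 250.0]}

    for system, times in solve_times.items():
        times = sorted(times)
        plt.plot(range(len(times)), times, label=system)
        plt.axvline(ppp_index(times), linestyle="--")
    plt.xlabel("Problems solved, ordered by CPU time")
    plt.ylabel("CPU time (s)")
    plt.yscale("log")
    plt.legend()
    plt.show()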

Fig. 7. CPU times for FOF_THM_RFO_*

5 The SZS Ontologies

The SZS ontologies [36] (named “SZS” after the authors of the first presentation of the ontologies [52]) provide values to specify the logical status of problems and solutions, and to describe logical data. The Success ontology provides values for the logical status of a conjecture with respect to a set of axioms, e.g., a TPTP problem whose conjecture is a logical consequence of the axioms is tagged as a Theorem (as in Fig. 3), and a model finder that establishes that a set of axioms (with no conjecture) is consistent should report Satisfiable. The Success ontology is also used to specify the semantic relationship between the parent “axioms” and inferred “conjecture” of an inference, as done in the TPTP format for derivations (see Sect. 4). The NoSuccess ontology provides reasons why an ATP system/tool has failed, e.g., an ATP system might report Timeout. The Dataform ontology provides values for describing logical data, as might be output from an ATP system/tool, e.g., a model finder might output a FiniteModel. Figure 8 shows some of the salient nodes of the ontologies. Their expanded names and their (abbreviated) definitions are listed in Fig. 9.

Fig. 8. Extract of the SZS ontologies

Fig. 9. SZS ontology values

The SZS standard also specifies the precise way in which the ontology values should be presented in ATP system output, in order to facilitate easy processing. For example, if an ATP system has established that a conjecture is not a theorem of the axioms, by finding a finite countermodel of the axioms and negated conjecture, the SZS format output would be along the following lines (see Sect. 4 for the format of the annotated formulae).

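A reconstructed sketch (the problem name and model formulae here are hypothetical; the status and output delimiter lines follow the SZS standard):

    % SZS status CounterSatisfiable for SomeProblem
    % SZS output start FiniteModel for SomeProblem
    fof(domain, fi_domain, ! [X] : ( X = "d1" | X = "d2" ) ).
    ...
    % SZS output end FiniteModel for SomeProblem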

6 Specialist Problem Classes

The TPTP problem library is divided into Specialist Problem Classes (SPCs) – classes of problems with the same recognizable logical, language, and syntactic characteristics that make the problems in each SPC homogeneous wrt ATP systems. Evaluation of ATP systems within SPCs makes it possible to say which systems work well for what types of problems. The appropriate level of subdivision for SPCs is that at which less subdivision would merge SPCs for which ATP systems have distinguishable behaviour, and at which further subdivision would unnecessarily split SPCs for which ATP systems have reasonably homogeneous behaviour. Empirically, homogeneity is ensured by examining the patterns of system performance across the problems in each SPC. The SPCs for essentially propositional problems were motivated by the observation that SPASS [55] performed differently on the ALC problems in the SYN domain of the TPTP. A data-driven test for homogeneity is also possible [6].

The characteristics currently used to define the SPCs in the TPTP are shown in Fig. 10. Using these characteristics, 223 SPCs are defined in TPTP v8.2.0. For example, the SPC TF0_THM_NEQ_ARI contains typed monomorphic first-order theorems that have no equality but include arithmetic. The header section of each problem in the TPTP problem library (see Sect. 3) includes its SPC. The SPCs are used when computing the TPTP problems' difficulty ratings (see Sect. 7).
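As a small illustration, an SPC name can be decomposed into its characteristic fields like this (the field labels are ours; Fig. 10 gives the authoritative characteristics):

    def parse_spc(spc: str) -> dict:
        # e.g., "TF0_THM_NEQ_ARI" -> language form, SZS status, and
        # further characteristics such as equality and arithmetic content.
        form, status, *rest = spc.split("_")
        return {"form": form,      # TF0 = typed monomorphic first-order
                "status": status,  # THM = theorem
                "extras": rest}    # ["NEQ", "ARI"] = no equality, arithmetic

    print(parse_spc("TF0_THM_NEQ_ARI"))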

Fig. 10. SPC characteristics

7 Problem Difficulty Ratings

Each TPTP problem has a difficulty rating that provides a well-defined measure of how difficult the problem is for current ATP systems [51]. The ratings are based on performance data in the TSTP solution library (see Sect. 4), and are updated in each TPTP edition.

The TPTP tags as biased those problems that are designed specifically to be suited or ill-suited to some ATP system, calculus, or control strategy (this includes the SYN000 problems, which are designed for testing ATP systems' parsers). The biased problems are excluded from the rating calculations. Rating is then done separately for each SPC (see Sect. 6).

First, a partial order between systems is determined according to whether or not a system solves a strict superset of the problems solved by another system. If a strict superset is solved, the first system is said to subsume the second. The union of the problems solved by the non-subsumed systems defines the state-of-the-art – all the problems that are solved by any system. The fraction of non-subsumed systems that fail on a problem is the difficulty rating for the problem: problems that are solved by all of the non-subsumed systems get a rating of 0.00 (easy); problems that are solved by some of the non-subsumed systems get a rating between 0.00 and 1.00 (difficult); problems that are solved by none of the non-subsumed systems get a rating of 1.00 (unsolved). As additional output, the systems are given a rating for each SPC – the fraction of difficult problems they can solve.
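A minimal sketch of this calculation for one SPC, assuming a hypothetical mapping from each system to the set of problems it solves:

    # solved: system name -> set of problems it solves; problems: all
    # (unbiased) problems in the SPC being rated.
    def difficulty_ratings(solved: dict, problems: set):
        # A system is subsumed if another system solves a strict
        # superset of the problems it solves.
        subsumed = {s for s in solved
                      for t in solved if s != t and solved[s] < solved[t]}
        leaders = [s for s in solved if s not in subsumed]
        # The state of the art: every problem solved by some system.
        state_of_the_art = set().union(*(solved[s] for s in leaders))
        # Rating = fraction of non-subsumed systems that fail on the
        # problem: 0.00 = easy, 1.00 = unsolved.
        ratings = {p: sum(p not in solved[s] for s in leaders) / len(leaders)
                   for p in problems}
        return state_of_the_art, ratings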

The TPTP difficulty ratings provide a way to assess progress in the field – as problems that are unchanged are not actually getting easier, decreases in their difficulty ratings are evidence of progress in ATP systems. Figures 11 and 12 plot the average difficulty ratings overall and for each of the four language forms in the TPTP World (after some sensible data cleaning). Figure 11 is taken from [41], published in 2017. It plots the average ratings for the 14527 problems that had been unchanged and whose ratings had not been stuck at 0.00 or 1.00, from TPTP v5.0.0 that was released in 2010 to v6.4.0 that was released in 2016. Figure 12 is taken from [45], published in 2024. It plots the average ratings for the 16236 problems that had been unchanged and whose ratings had not been stuck at 0.00 or 1.00, from TPTP v6.3.0 that was released in 2016 to v8.2.0 that was released in 2023. The two figures’ plots dovetail quite well, which gives confidence that they really are comparable (there are some minor differences caused by slightly different data cleaning and by recent refinements to the difficulty rating calculations). The older plots show a quite clear downward trend both overall and for the four language forms, while the new plots do not. Possible reasons are discussed in [45].

Fig. 11. Ratings from v5.0.0 to v6.4.0

Fig. 12. Ratings from v6.3.0 to v8.2.0

8 SystemOnTPTP and StarExec

Some of the early users of the TPTP problem library (see Sect. 3) were working in disciplines other than computer science, e.g. (with a few exemplar references), mathematics [16, 26], logic [8, 11], management [20, 21], planning [28]. These users often selected the Otter system [15] for their experiments, as it was readily available and easy enough to install. As the TPTP World evolved it was clear that more powerful ATP systems were available, as was especially evident in the CADE ATP System Competition (see Sect. 9). However, these more powerful systems were often not as easy to obtain and install. In order to make the use of ATP easier for non-geek users, an NSF grant was obtained to build the SystemOnTPTP online service, which provides hardware and a web interface for users to submit their problems to the most recent versions of a wide range of ATP systems. SystemOnTPTP can provide recommendations for the ATP systems that might be most likely to solve a problem, based on the SPC of the user's problem and the system ratings for the SPC (see Sects. 6 and 7). SystemOnTPTP can also run ATP systems (e.g., the recommended systems) in competition parallel [48]. The core SystemOnTPTP service is supported by (i) the SystemB4TPTP service that provides tools to prepare problems for submission to ATP systems, e.g., axiom selection [9], type checking [12], a scripting language [39], and (ii) the SystemOnTSTP service that provides tools for analysing solutions output from SystemOnTPTP, e.g., interactive viewing of derivations [53], proof checking [34], identification of interesting lemmas in a proof [24]. While many users enjoy interactive use of the services in a web browser, it is also easy to use the services programmatically – in fact this is where most of the use comes from. In 2023 there were 887,287 requests serviced (an average of one every 36 s), from 286 countries. One heavy programmatic user is the Sledgehammer component of the interactive theorem prover Isabelle [19].
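The exact web interface is documented on the SystemOnTPTP page; a minimal sketch of programmatic use might look as follows, where the endpoint URL and all form field names are illustrative assumptions, not the documented API:

    import requests

    # Hypothetical submission of a problem to SystemOnTPTP. The URL and
    # the form field names below are assumptions for illustration only.
    problem = """
    fof(ax, axiom, p(a)).
    fof(conj, conjecture, ? [X] : p(X)).
    """
    reply = requests.post(
        "https://tptp.org/cgi-bin/SystemOnTPTPFormReply",  # assumed endpoint
        data={"ProblemSource": "FORMULAE",  # assumed field name
              "FORMULAEProblem": problem,   # assumed field name
              "System": "E---3.0"},         # assumed field name and value
        timeout=120)
    print(reply.text)  # expect an SZS status line in the output (see Sect. 5)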

The TPTP problem library was motivated (see Sect. 3) by the need to provide support for meaningful ATP system evaluation. This need was also (or became) evident in other logic solver communities, e.g., SAT [10] and SMT [5]. For many years testing of logic solvers was done on individual developers' computers. In 2010 a proposal for centralised hardware and software support was developed, and in 2011 a $2.11 million NSF grant was obtained. This grant led to the development and availability of StarExec Iowa [32] in 2012, and a subsequent $1.00 million grant in 2017 expanded StarExec to Miami. StarExec has been central to much progress in logic solvers over the last 10 years, supporting 16 logic solver communities, used for running many annual competitions [1], and supporting many, many users.

It was recently announced that StarExec Iowa will be decommissioned. The maintainer of StarExec Iowa explained that “the plan is to operate StarExec as usual for competitions Summer 2024 and Summer 2025, and then put the system into a read-only mode for one year (Summer 2025 to Summer 2026)”. StarExec Miami will continue to operate as long as funding is available, but it will not be able to support the large number of logic solver communities that use the larger StarExec Iowa cluster. In the long run it will be necessary for StarExec users to transition to new environments, and several plans are (at the time of writing) being discussed. One effort, funded by an Amazon Research Award, is to containerise StarExec and ATP systems so that they run in a Kubernetes framework on Amazon Web Services [7]. This will allow communities and individual users to run their own StarExec.

9 The CADE ATP System Competition

The CADE ATP System Competition (CASC) [40] is the annual evaluation of fully automatic, classical logic, ATP systems - the world championship for such systems. CASC is held at each CADE (the International Conference on Automated Deduction) and IJCAR (the International Joint Conference on Automated Reasoning) conference – the major forums for the presentation of new research in all aspects of automated deduction. One purpose of CASC is to provide a public evaluation of the relative capabilities of ATP systems. Additionally, CASC aims to stimulate ATP research, motivate development and implementation of robust ATP systems that can be easily and usefully deployed in applications, provide an inspiring environment for personal interaction between ATP researchers, and expose ATP systems within and beyond the ATP community. Over the years CASC has been a catalyst for impressive improvements in ATP, stimulating both theoretical and implementation advances [18]. It has provided a forum at which empirically successful implementation efforts are acknowledged and applauded, and at the same time a focused meeting at which novice and experienced developers exchange ideas and techniques. The first CASC was held at CADE-13 in 1996, at DIMACS at Rutgers University, USA. The CASC web site provides access to all the details: www.tptp.org/CASC. Of particular interest for this IJCAR is that CASC was conceived 30 years ago in 1994, after CADE-12 in Nancy, when Christian Suttner and I were sitting on a bench under a tree in Parc de la Pépinière, burning time before our train departures.

CASC is run in divisions according to problem and system characteristics, in a coarse version of the SPCs (see Sect. 6). Problems for CASC are taken from the TPTP problem library (see Sect. 3), and some other sources. The TPTP problem ratings (see Sect. 7) are used to select appropriately difficult problems from the TPTP problem library. The systems are ranked according to the number of problems solved with an acceptable proof/model output. Ties are broken according to the average time over problems solved.
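As a sketch, the ranking rule amounts to sorting on two keys (the data format here is hypothetical):

    # results: system name -> (problems solved with acceptable output,
    #                          average CPU time over problems solved)
    def casc_ranking(results: dict) -> list:
        # More problems solved ranks higher; ties are broken by
        # lower average time.
        return sorted(results, key=lambda s: (-results[s][0], results[s][1]))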

The design and organization of CASC have evolved over the years to a sophisticated state. From CASC-26 in 2017 to CASC-29 in 2023 the competition stayed quite stable, with the various divisions, evaluations, etc., optimized each year (prior years also saw larger changes in the competition design). The changes in the divisions reflect the changing demands and interest in different types of ATP problems, and decisions made for CASC (in the context of the TPTP World) have had an influence on the directions of development in ATP. Over the years 11 divisions have been run ...

  • The Clause Normal Form (CNF) division from CASC-13 in 1996 to CASC-23 in 2011.

  • The Satisfiability (SAT) division from CASC-14 in 1997 to CASC-22 in 2009.

  • The First-order Form (FOF) division from CASC-15 in 1998, ongoing.

  • The Effectively Propositional (EPR) division from CASC-JC in 2001 to CASC-27 in 2019.

  • The First-order Non-theorem (FNT) division from CASC-21 in 2007 to CASC-29 in 2023.

  • The Large Theory Batch (LTB) division from CASC-J4 in 2008 to CASC-J11 in 2022.

  • The Typed Higher-order (THF) division from CASC-J5 in 2010, ongoing.

  • The Typed First-order with Arithmetic (TFA) division from CASC-23 in 2011, ongoing.

  • The Typed First-order Non-theorem (TFN) division from CASC-25 in 2015 to CASC-J8 in 2016, revived in CASC-29 in 2023, ongoing.

  • The Sledgehammer (SLH) division from CASC-26 in 2017, ongoing.

  • The I Challenge You (ICU) division introduced for CASC-J12 in 2024.

Across the CASCs so far, 111 distinct ATP systems have been entered. For each CASC the division winners of the previous CASC are automatically entered, to provide benchmarks against which progress can be judged. Some systems have emerged as dominant in some of the divisions, with Vampire being a well-known example. The strengths of these systems stem from four main areas: solid theoretical foundations, significant implementation efforts (in terms of coding and data structures), extensive testing and tuning, and an understanding of how to optimize for CASC.

10 TPTP World Users

Over the years the TPTP World has provided a platform upon which ATP users have presented their needs to ATP system developers, who have then adapted their ATP systems to the users’ needs. The interplay between the TPTP problem library (see Sect. 3) and the CADE ATP System Competition (see Sect. 9) has been particularly effective as a conduit for ATP users to provide samples of their problems to ATP system developers. Users’ problems that are contributed to the TPTP are eligible for use in CASC. The problems are then exposed to ATP system developers, who improve their systems’ performances on the problems, in order to perform well in CASC. This completes a cycle that provides the users with more effective tools for solving their problems.

Many people have contributed to the TPTP World, with problems, software, advice, expertise, and enthusiasm. I am grateful to them all, and here are just a few who have made salient contributions (ordered roughly by date of the contributions mentioned):

  • Christian Suttner - The TPTP problem library and CASC.

  • Jeff Pelletier - Early support and thoughtful advice, problem contributions.

  • Stephan Schulz - Technical expertise.

  • Andrei Voronkov & Vampire team - Enthusiasm and constructive feedback.

  • Allen van Gelder - The TPTP language BNF and parser.

  • Adam Pease and Josef Urban - Large theory problems.

  • Christoph Benzmüller and Chad Brown - The TH0 language.

  • Koen Claessen, Peter Baumgartner, and Stephan Schulz - The TF0 language.

  • Jasmin Blanchette - Linking Sledgehammer to SystemOnTPTP, the TF1 language.

  • Andrei Paskevich - The TF1 language.

  • Cezary Kaliszyk and Florian Rabe - The TH1 language.

  • Evgeny Kotelnikov - The TXF language.

  • Christoph Benzmüller and Alexander Steen - The NTF language.

Thank you!

11 Conclusion

This paper has described key components of the TPTP World that help make it a success, linking them together as “stepping stones” that lead from one component to another. The large number of citations to work of others, and explicitly Sect. 10, illustrate how the TPTP World has benefited, and benefited from, users in the ATP community. I am also grateful to the many people who have donated hard cash to the project, helping to keep it alive!

This paper has naturally focused on the successful parts of the TPTP World. There have also been some failed developments and (in retrospect) suboptimal decisions. For example, in 2015 there was an attempt to develop a description logic form for the TPTP language. While some initial progress was made, it ground to a halt without support from the description logic community. A suboptimal design decision, rooted in the early days of the TPTP, is the naming scheme used for problem files. The naming scheme uses three digits to number the problems in each domain, thus setting a limit of 1000 problems per domain, which failed to anticipate the numbers of problems that would be contributed to some of the problem domains. This has been overcome by creating multiple domain directories where necessary, but if it were to be done again, six or eight digit problem numbers shared across all domains would be an improvement.

The maintenance and development of the TPTP World is ongoing work. The most recent development is the language extensions and support for non-classical logics, initially modal logic [30, 31]. The new format for representing interpretations (see Sect. 4) will be promulgated in the near future. As always, the ongoing success and utility of the TPTP problem library depends on ongoing contributions of problems – the automated reasoning community is encouraged to continue making contributions of all types of problems.

The TPTP World would not exist without the early strategic insights of Christian Suttner, and his willingness to let me do the organization without interference. Maybe his most wonderful contribution (which took him over two hours to produce when he was visiting me at James Cook University – I think he took a nap) is his wonderfully simple plain-language definition of automated theorem proving: “the derivation of conclusions that follow inevitably from known facts”.