Abstract
The establishment of a robust standard components database is essential in various industries to streamline product development and ensure quality. This paper presents a system for querying standard components data based on logical filtering and semantic retrieval. The system comprises a well-defined database structure, logical filtering capabilities at different data levels, and semantic retrieval techniques. The outputs of the system demonstrate its effectiveness in handling user queries, analysing unstructured data, and providing meaningful feedback based on logical filtering outcomes. This research contributes to the efficient utilization of standard components data through a digital query system that combines both approaches.
1 Introduction
In modern industries, effectively managing and utilizing standard components is crucial for achieving high-quality products, ensuring cost-effectiveness, and meeting project deadlines. The foundation for this is a comprehensive standard components database. Because collecting comprehensive component data is very difficult, this paper focuses instead on developing a query system for such a database that is extensible, so that new data can be integrated with little effort. Logical filtering and semantic searching are integrated to enhance the system's functionality. Data on SKF [1] hydraulic seals are used as an illustrative example throughout the development of the system.
A robust standard components database acts as a centralized repository, promoting consistency across projects, reducing duplication, and facilitating team collaboration. Domain databases play a comparable role in other fields: MEGARes 2.0 [2], for example, aids the identification of antimicrobial resistance genes in metagenomic data for epidemiological investigations. A standard components database speeds up the design process by eliminating the need for manual handbook searches, which makes it indispensable for the application of AI in industrial settings.
Creating a database for standard components, akin to PubChem's [3] inter-linked Substance, Compound, and BioAssay databases, requires a well-organized data structure, robust search functionality (including logical and semantic filtering), and programmatic access through APIs.
As shown in Fig. 1, the query system for the database consists of the following parts. The user interface is the entry point through which users interact with the query system. It comprises two main components:
- User Input: Takes two forms: structured queries and content-based queries. Structured queries allow users to specify attributes such as dimensions, materials, and performance metrics in a structured format. Content-based queries accept natural language input, enabling users to describe their needs in more intuitive terms.
- Result Presentation: Displays the retrieved standard components. Users can explore the results, compare components, and select the most suitable ones for their projects.
The query processor handles user inputs and transforms them into requests that the database can process. It comprises two key modules:
- Logical Filtering Module: Allows users to filter components based on attributes such as size, material, or other technical specifications. It supports cross-table logical filtering, combining criteria from different data tables to identify components that meet complex requirements.
- Semantic Retrieval Module: Leverages advanced natural language processing, as demonstrated in vitrivr [4] and SOSRepair [5]. Instead of relying on exact keyword matches, it interprets query descriptions based on their content, delivering results that better match the user's intended context. It is used to explore standard component descriptions and related textual materials.
The standard components database is the core of the architecture, housing a collection of standardized parts, specifications, and related data. It is structured to accommodate different data types:
- Structured Data: Includes basic data tables, which store the structured information about standard components, such as technical specifications, part numbers, and dimensions.
- Unstructured Data: Contains additional information about the components in various formats, such as documentation, images, CAD drawings, and other multimedia elements.
2 Standard Components Database
2.1 Structured Data
The standard components database contains three types of structured data tables: basic data tables, associated data tables, and multilevel data tables. Figure 2 provides a visual representation of how these three types of data tables are interconnected, covering all the structured data related to the components in this comprehensive framework.
Basic Data Tables: The fundamental building blocks of the database, containing essential structured information about individual standard components.
Associated Data Tables: Provide additional context for components stored in different basic data tables, enabling components to be cross-referenced and filtered based on associated data.
Multilevel Data Tables: Capture the hierarchical relationships within standard components data. This makes it possible to represent assemblies, sub-components, and other intricate structures, and to retrieve information at various levels of detail.
2.1.1 Basic Data Tables
A basic data table is designed as a simple two-dimensional table. Its columns represent distinct features or attributes, and its rows correspond to individual standard components. For instance, Table 1 is a basic data table listing the types and basic features of seal parts.
2.1.2 Associated Data Tables
An associated data table is a set of basic data tables that are linked to one another for component filtering. Each table in the set covers a specific group of attributes.
For example, Table 1 records seal types and materials, while Table 2 records hydraulic fluids and seal material compatibility. The synergy between these tables is crucial in choosing an appropriate seal for a specific application, as the compatibility of hydraulic fluids with seal materials (from Table 2) and the corresponding seal types (from Table 1) collectively determine the optimal choice. When a particular hydraulic fluid is specified, the system utilizes associated data from both Tables 1 and 2 to deliver logical filtering outcomes. This ensures that the chosen seal not only aligns with the hydraulic fluid but also correlates with other attributes detailed in the associated data tables.
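Below is a minimal sketch of how two associated tables could be combined. It assumes SQLite through Python's standard `sqlite3` module and uses hypothetical table and column names (`seal_types`, `compatibility`). The rows are placeholders rather than SKF data, and the fluids are deliberately named `fluid X` and `fluid Y`. The shared `material` column plays the role of the link between Table 1 and Table 2.

```python
import sqlite3

# Hypothetical miniature versions of Table 1 (seal types and materials) and
# Table 2 (fluid / seal material compatibility). All rows are placeholders,
# not data taken from the SKF catalogue.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE seal_types (seal_type TEXT, material TEXT);
    CREATE TABLE compatibility (material TEXT, fluid TEXT, compatible INTEGER);

    INSERT INTO seal_types VALUES
        ('rod seal A', 'NBR'), ('rod seal B', 'FKM'), ('piston seal C', 'NBR');
    INSERT INTO compatibility VALUES
        ('NBR', 'fluid X', 1), ('FKM', 'fluid X', 1),
        ('NBR', 'fluid Y', 0), ('FKM', 'fluid Y', 1);
    """
)


def seals_for_fluid(fluid: str) -> list[str]:
    """Return the seal types whose material is marked compatible with `fluid`."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.seal_type
        FROM seal_types AS s
        JOIN compatibility AS c ON c.material = s.material  -- shared attribute links the tables
        WHERE c.fluid = ? AND c.compatible = 1
        ORDER BY s.seal_type
        """,
        (fluid,),
    ).fetchall()
    return [r[0] for r in rows]


print(seals_for_fluid("fluid Y"))  # ['rod seal B']
```

A JOIN on the shared attribute is one straightforward way to realise this link in a relational database. The paper does not prescribe a particular SQL formulation.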
2.1.3 Multilevel Data Tables
A multilevel data table consists of a basic data table and its sub-tables, capturing how different attributes of a component interact with one another.
For instance, consider a scenario where Table 3 represents basic seal installation dimensions, while Table 4 serves as a sub-table, capturing installation dimensions that are specifically related to pressure considerations. While Table 3 provides fundamental installation dimensions, it is Table 4 that refines these dimensions in the context of pressure requirements. Tables 3 and 4 collaboratively define the complete installation dimensions of a seal under varying pressure conditions.
2.2 Unstructured Data
The standard components database contains two main types of unstructured data: component descriptions and multimedia assets. Component descriptions are textual narratives that offer detailed information about the characteristics, features, and possible uses of standard components. Multimedia assets encompass visual resources such as pictures, videos, and CAD drawings, which help users visualize the physical attributes of standard components.
2.2.1 Component Descriptions
JSON is used to store textual descriptions that cannot fit into SQL data tables. JSON files can be manipulated with Python, which supports programmatic interaction and the implementation of advanced semantic search. Here is an example describing the temperature conditions of the hydraulic seals.
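The following is a hedged illustration of handling such descriptions in Python with the standard `json` module. It assumes a hypothetical record layout with `component_id`, `topic`, and `text` keys. The sentences and temperature figures are placeholders, not the example referred to above.

```python
import json

# Hypothetical JSON store of free-text component descriptions. The keys and
# the sample sentences are assumptions for illustration only.
DESCRIPTIONS_JSON = """
[
  {"component_id": "seal-001", "topic": "temperature",
   "text": "Operating temperature range: -30 to 100 °C (placeholder values)."},
  {"component_id": "seal-001", "topic": "installation",
   "text": "Install with a chamfered groove edge (placeholder text)."},
  {"component_id": "seal-002", "topic": "temperature",
   "text": "Operating temperature range: -20 to 200 °C (placeholder values)."}
]
"""

descriptions = json.loads(DESCRIPTIONS_JSON)


def texts_by_topic(records: list[dict], topic: str) -> dict[str, str]:
    """Map component_id -> description text for a single topic."""
    return {r["component_id"]: r["text"] for r in records if r["topic"] == topic}


temps = texts_by_topic(descriptions, "temperature")
print(sorted(temps))  # ['seal-001', 'seal-002']

# Records can be serialized back to JSON text for storage.
serialized = json.dumps(descriptions, ensure_ascii=False, indent=2)
```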
2.2.2 Multimedia Assets (Images, CAD Drawings, Etc.)
Document-oriented NoSQL databases are well-suited for handling multimedia assets. They excel in managing and storing unstructured data in flexible, JSON-like documents. Storing multimedia files alongside associated metadata and unique identifiers ensures easy retrieval, categorization, and access control.
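The idea can be sketched without a database server, using an in-memory stand-in for a document collection. Everything here is an assumption for illustration: the hypothetical `AssetStore` class, its field names, and the choice of a SHA-256 content hash as the unique identifier. A production system would use an actual document database instead.

```python
import hashlib


class AssetStore:
    """In-memory stand-in for a document collection of multimedia assets.

    Each asset is a JSON-like dict holding a unique identifier plus metadata
    used for retrieval, categorization, and access control (hypothetical fields).
    """

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def add(self, data: bytes, **metadata) -> str:
        # A content hash serves as the unique ID, so identical files map to one document.
        asset_id = hashlib.sha256(data).hexdigest()
        self._docs[asset_id] = {"_id": asset_id, "size": len(data), **metadata}
        return asset_id

    def find(self, **criteria) -> list[dict]:
        # Return the documents whose metadata match every given key/value pair.
        return [
            d for d in self._docs.values()
            if all(d.get(k) == v for k, v in criteria.items())
        ]


store = AssetStore()
store.add(b"<cad bytes>", component_id="seal-001", kind="cad", access="internal")
store.add(b"<png bytes>", component_id="seal-001", kind="image", access="public")
store.add(b"<png bytes 2>", component_id="seal-002", kind="image", access="public")

print(len(store.find(component_id="seal-001")))  # 2
```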
3 Query Processor
3.1 Logical Filtering Module
In the query system design, all structured information is stored in basic data tables. These tables use ‘Attributes’ as column names, ‘Entries’ as unique row identifiers, and ‘Values’ as the cell contents. In essence, structured data can be represented as triples denoted (A, E, V), where A stands for Attributes, E for Entries, and V for Values. Logical filtering is the process of completing triples from these tables based on user-query conditions, such as (A, E, ?), (A, ?, V), and (?, E, V).
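The triple view can be made concrete with a short sketch. It assumes a basic data table held as a Python dict of `{entry: {attribute: value}}` with hypothetical placeholder rows. In the hypothetical helper `complete`, `None` stands for the '?' in a query pattern.

```python
from typing import Optional

# An (A, E, V) triple: (attribute, entry, value).
Triple = tuple[str, str, str]


def table_to_triples(table: dict[str, dict[str, str]]) -> list[Triple]:
    """Flatten a basic data table {entry: {attribute: value}} into triples."""
    return [(a, e, v) for e, row in table.items() for a, v in row.items()]


def complete(triples: list[Triple], a: Optional[str] = None,
             e: Optional[str] = None, v: Optional[str] = None) -> list[Triple]:
    """Return every triple that matches the pattern; None plays the role of '?'."""
    return [
        t for t in triples
        if (a is None or t[0] == a)
        and (e is None or t[1] == e)
        and (v is None or t[2] == v)
    ]


# Hypothetical basic data table: entries are seal types (placeholder data).
table1 = {
    "rod seal A": {"material": "NBR", "application": "rod"},
    "piston seal C": {"material": "NBR", "application": "piston"},
}
triples = table_to_triples(table1)

print(complete(triples, a="material", e="rod seal A"))   # (A, E, ?)
print(complete(triples, a="material", v="NBR"))          # (A, ?, V)
print(complete(triples, e="piston seal C", v="piston"))  # (?, E, V)
```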
For logical filtering of basic data tables, the missing element of an (A, E, V) triple can be obtained from a single table using SQL queries.
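The following is a minimal SQL sketch of the three patterns. It assumes SQLite and a hypothetical `seal_types` table in which the `seal_type` column holds the entries. SQL placeholders can bind values but not column names, so the attribute is validated against a whitelist before it is inserted into the query text.

```python
import sqlite3
from typing import Optional

# Hypothetical basic data table; "seal_type" is the entry (row key) column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seal_types (seal_type TEXT PRIMARY KEY, material TEXT, application TEXT)"
)
conn.executemany(
    "INSERT INTO seal_types VALUES (?, ?, ?)",
    [("rod seal A", "NBR", "rod"), ("piston seal C", "NBR", "piston")],
)

# Attributes are column names, which cannot be bound as SQL parameters,
# so only whitelisted names are ever formatted into a query.
ATTRIBUTES = ("material", "application")


def _check(attr: str) -> None:
    if attr not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attr!r}")


def value_of(attr: str, entry: str) -> Optional[str]:
    """(A, E, ?): read a single cell."""
    _check(attr)
    row = conn.execute(
        f"SELECT {attr} FROM seal_types WHERE seal_type = ?", (entry,)
    ).fetchone()
    return row[0] if row else None


def entries_with(attr: str, value: str) -> list[str]:
    """(A, ?, V): list the entries whose attribute equals the value."""
    _check(attr)
    rows = conn.execute(
        f"SELECT seal_type FROM seal_types WHERE {attr} = ? ORDER BY seal_type",
        (value,),
    ).fetchall()
    return [r[0] for r in rows]


def attributes_of(entry: str, value: str) -> list[str]:
    """(?, E, V): list the attributes of an entry that hold the value."""
    return [a for a in ATTRIBUTES if value_of(a, entry) == value]


print(value_of("material", "rod seal A"))        # NBR
print(entries_with("material", "NBR"))           # ['piston seal C', 'rod seal A']
print(attributes_of("piston seal C", "piston"))  # ['application']
```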
For an associated data table, logical filtering requires completing a series of (A, E, V) triples across the different basic data tables in the set. Consider, for example, selecting seal types whose materials are compatible with a given hydraulic fluid according to Tables 1 and 2, as described in Sect. 2.1.2. As shown in Fig. 3, each basic table corresponds to one triple. The user-query condition is (A2, ?, V2), the user-query target is (A1, E1, V1), and the associated table's matching condition is E2 \(\iff \) (A1, V1). The logical filtering is then described mathematically by Eq. (1).
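Below is one possible reading of this two-step filtering, written as a plain Python sketch over triples. The table contents are hypothetical placeholders. The code illustrates the matching condition E2 \(\iff \) (A1, V1) and does not reproduce the authors' Eq. (1).

```python
from typing import Optional

Triple = tuple[str, str, str]  # (attribute, entry, value)


def to_triples(table: dict[str, dict[str, str]]) -> list[Triple]:
    return [(a, e, v) for e, row in table.items() for a, v in row.items()]


def complete(triples: list[Triple], a: Optional[str] = None,
             e: Optional[str] = None, v: Optional[str] = None) -> list[Triple]:
    """Triples matching the pattern; None stands for '?'."""
    return [t for t in triples
            if (a is None or t[0] == a)
            and (e is None or t[1] == e)
            and (v is None or t[2] == v)]


# Hypothetical Table 2: entries E2 are seal materials, attributes A2 are fluids.
table2 = to_triples({
    "NBR": {"fluid X": "yes", "fluid Y": "no"},
    "FKM": {"fluid X": "yes", "fluid Y": "yes"},
})
# Hypothetical Table 1: entries E1 are seal types, attribute A1 is "material".
table1 = to_triples({
    "rod seal A": {"material": "NBR"},
    "rod seal B": {"material": "FKM"},
    "piston seal C": {"material": "NBR"},
})


def compatible_seals(fluid: str) -> list[str]:
    # Condition (A2, ?, V2): A2 = the fluid, V2 = "yes"; completing it yields E2.
    materials = {e for _, e, _ in complete(table2, a=fluid, v="yes")}
    # Matching E2 <=> (A1, V1): A1 = "material", V1 = E2.
    # Target (A1, E1, V1): complete (A1, ?, V1) in Table 1 to obtain E1.
    seals = {e for m in materials
             for _, e, _ in complete(table1, a="material", v=m)}
    return sorted(seals)


print(compatible_seals("fluid Y"))  # ['rod seal B']
```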
For a multilevel data table, logical filtering also involves completing a series of (A, E, V) triples, but the matching condition between the basic tables is different. Take parent Table 3 and its sub-table, Table 4, described in Sect. 2.1.3. As Fig. 4 shows, the matching condition is (A3′, V3′) \(\Rightarrow \) E4 and V4 \(\Rightarrow \) (A3, V3). The goal is to obtain the correct value for Table 3 from Table 4. Therefore, the user-query conditions are (A3, E3, ?) and (A4, ?, ?), and the user-query target is (A3, E3, V3). The logical filtering is described mathematically by Eq. (2).
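The multilevel case can be sketched in a similar way. In this hypothetical version, a parent table (standing in for Table 3) stores a link attribute A3′. Its value V3′ names a row E4 of the sub-table (standing in for Table 4). The sub-table's attributes A4 are pressure classes, and the selected V4 is returned as V3. All names and numbers are placeholders, and the code is one interpretation of Fig. 4, not the authors' Eq. (2).

```python
# Hypothetical parent table (Table 3): entries E3 are seal sizes.
# "groove key" is the link attribute A3' whose value V3' names a sub-table row.
table3 = {
    "size 40": {"bore diameter": 40.0, "groove key": "K1"},
    "size 50": {"bore diameter": 50.0, "groove key": "K2"},
}
# Hypothetical sub-table (Table 4): entries E4 are keys, attributes A4 are
# pressure classes, and values V4 are pressure-dependent groove widths.
# All numbers are placeholders.
table4 = {
    "K1": {"<= 16 MPa": 3.2, "<= 40 MPa": 4.0},
    "K2": {"<= 16 MPa": 3.6, "<= 40 MPa": 4.5},
}


def resolve(entry: str, attr: str, pressure_class: str,
            link_attr: str = "groove key"):
    """Complete (A3, E3, ?) given the pressure class A4 from (A4, ?, ?)."""
    row = table3[entry]
    if attr in row:                   # value stored directly in the parent table
        return row[attr]
    sub_entry = row[link_attr]        # (A3', V3') => E4
    return table4[sub_entry][pressure_class]  # V4 => (A3, V3)


print(resolve("size 50", "groove width", "<= 40 MPa"))  # 4.5
```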
In the query system, all basic data tables should be connected through the relationships described above, forming a complete ontology. This means that, starting from any point within the ontology, the user should be able to obtain an unambiguous query result given sufficient conditions. Figure 5 illustrates an example ontology built from Tables 1–4.
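Whether the ontology is connected can be checked by treating basic data tables as nodes and the associated and multilevel relations as edges. The sketch below assumes a hypothetical set of edges among Tables 1–4 (the exact links of Fig. 5 are not reproduced). It uses breadth-first search to find the chain of tables a query would traverse.

```python
from collections import deque
from typing import Optional

# Hypothetical ontology: nodes are basic data tables, edges are the associated
# and multilevel relations linking them. The edges are assumptions.
ONTOLOGY = {
    "table1_seal_types": ["table2_fluid_compat", "table3_install_dims"],
    "table2_fluid_compat": ["table1_seal_types"],
    "table3_install_dims": ["table1_seal_types", "table4_pressure_dims"],
    "table4_pressure_dims": ["table3_install_dims"],
}


def query_path(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for the chain of tables a query must traverse."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back along predecessors
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in ONTOLOGY.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable: this part of the ontology is disconnected


print(query_path("table2_fluid_compat", "table4_pressure_dims"))
```

If `query_path` returns `None` for some pair of tables, one table cannot be reached from the other. A query starting there could then not be resolved.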
3.2 Semantic Retrieval Module
In the query system design, the semantic retrieval module is used for two tasks: analysing content-based user queries and analysing textual descriptions.
Content-based query analysis aims to convert unstructured textual queries into a structured format for the query system’s comprehension. This involves initial text parsing to identify relevant elements like keywords, phrases, and entities, using techniques like tokenization, part-of-speech tagging, and named entity recognition. The next steps include structuring the query by extracting subject, predicate, and object information, which often results in the creation of (A, E, V) triples. Entity resolution is then performed to link entities to specific database tables or records, ensuring the system knows where to retrieve data. Finally, the structured query conditions and targets are used to generate a formal query, typically in SQL or a similar query language, for execution against the database. Logical filtering can be applied to further refine the results based on user-defined criteria or constraints.
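The pipeline can be illustrated with a deliberately simplified sketch. Tokenization, part-of-speech tagging, and named entity recognition are replaced by regular-expression tokenization and lookup in a hypothetical phrase vocabulary. The sketch then generates a parameterized SQL query over the hypothetical `seal_types` and `compatibility` tables. It shows the data flow from text to (A, ?, V) conditions to SQL, not a realistic NLP model.

```python
import re

# Hypothetical vocabulary used for entity resolution by lookup:
# phrase -> (table, attribute, canonical value).
VOCABULARY = {
    "fluid x": ("compatibility", "fluid", "fluid X"),
    "fluid y": ("compatibility", "fluid", "fluid Y"),
    "nbr": ("seal_types", "material", "NBR"),
    "fkm": ("seal_types", "material", "FKM"),
}
ALIASES = {"compatibility": "c", "seal_types": "s"}


def parse_query(text: str) -> list[tuple[str, str, str]]:
    """Turn free text into (table, attribute, value) conditions, i.e. (A, ?, V)."""
    # Crude tokenization: lowercase alphanumeric tokens joined by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    return [
        (table, attr, value)
        for phrase, (table, attr, value) in VOCABULARY.items()
        if re.search(rf"\b{re.escape(phrase)}\b", normalized)
    ]


def to_sql(conditions: list[tuple[str, str, str]]) -> tuple[str, list[str]]:
    """Build a parameterized query whose target is the seal type."""
    sql = (
        "SELECT DISTINCT s.seal_type FROM seal_types AS s "
        "JOIN compatibility AS c ON c.material = s.material "
        "WHERE c.compatible = 1"
    )
    params = []
    for table, attr, value in conditions:
        sql += f" AND {ALIASES[table]}.{attr} = ?"
        params.append(value)
    return sql, params


conditions = parse_query("Which seals can be used with Fluid Y?")
print(conditions)  # [('compatibility', 'fluid', 'fluid Y')]
print(to_sql(conditions))
```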
Textual description analysis is used to analyse the component descriptions introduced in Sect. 2.2.1. It extracts valuable information from text and generates meaningful natural language responses. This involves identifying key information, such as facts, entities, and relationships, through techniques like sentiment analysis and topic modelling. It enables the query system to handle unstructured data and to generate contextually appropriate responses by assembling the extracted information into human-like answers.
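The following sketch is a much simpler stand-in for the NLP techniques named above. It extracts a temperature range from a placeholder description with a regular expression, then answers a question with a fixed sentence template. The description text, the regex, and the function names are assumptions for illustration.

```python
import re
from typing import Optional

# Placeholder description text (not taken from the paper or from SKF data).
DESCRIPTION = "This seal is suitable for operating temperatures from -30 °C to +110 °C."

# Matches ranges such as "-30 °C to +110 °C".
RANGE = re.compile(r"([+-]?\d+(?:\.\d+)?)\s*°C\s*to\s*([+-]?\d+(?:\.\d+)?)\s*°C")


def extract_temperature_range(text: str) -> Optional[tuple[float, float]]:
    """Return (low, high) in °C if the text states a range, else None."""
    m = RANGE.search(text)
    if m is None:
        return None
    low, high = (float(g) for g in m.groups())
    return low, high


def answer(component: str, text: str, query_temp: float) -> str:
    """Template-based natural language response to a temperature question."""
    rng = extract_temperature_range(text)
    if rng is None:
        return f"No temperature range found in the description of {component}."
    low, high = rng
    verdict = "within" if low <= query_temp <= high else "outside"
    return (f"{query_temp:g} °C is {verdict} the stated range of "
            f"{component} ({low:g} to {high:g} °C).")


print(answer("seal-001", DESCRIPTION, 120))
# 120 °C is outside the stated range of seal-001 (-30 to 110 °C).
```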
4 User Interface
An example of the user interface for the query system is shown in Fig. 6, featuring logical filtering and semantic search as inputs, along with a preview of structured data and unstructured data (models and figures) from the standard components database.
References
[1] Hydraulic seals | SKF. https://www.skf.com/us/products/industrial-seals/hydraulic-seals. Accessed 13 Aug 2023
[2] Doster E et al (2020) MEGARes 2.0: a database for classification of antimicrobial drug, biocide and metal resistance determinants in metagenomic sequence data. Nucleic Acids Res 48(D1):D561–D569
[3] Kim S et al (2016) PubChem substance and compound databases. Nucleic Acids Res 44(D1):D1202–D1213
[4] Rossetto L, Gasser R, Heller S, Amiri Parian M, Schuldt H (2019) Retrieval of structured and unstructured data with vitrivr. In: Proceedings of the ACM workshop on lifelog search challenge, pp 27–31, June 2019
[5] Afzal A, Motwani M, Stolee KT, Brun Y, Le Goues C (2021) SOSRepair: expressive semantic search for real-world program repair. IEEE Trans Software Eng 47(10):2162–2181
Acknowledgements
This research was supported by the National Natural Science Foundation of China (52205279), the Open Foundation of the National Engineering Technology Research Center for Prefabrication Construction in Civil Engineering (2021CPCCE-K02), and the Top Discipline Plan of Shanghai Universities-Class I.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this paper
Cite this paper
Huang, Z., Bian, Y., Yang, M. (2024). Standard Components Query System Based on Logical Filtering and Semantic Retrieval. In: Halgamuge, S.K., Zhang, H., Zhao, D., Bian, Y. (eds) The 8th International Conference on Advances in Construction Machinery and Vehicle Engineering. ICACMVE 2023. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-97-1876-4_91
DOI: https://doi.org/10.1007/978-981-97-1876-4_91
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1875-7
Online ISBN: 978-981-97-1876-4
eBook Packages: Engineering, Engineering (R0)