LINGUISTIC ONTOLOGIES: DESIGNING AND USING IN THE EDUCATIONAL INTELLECTUAL SYSTEMS

The purpose of the article is to examine the general trends, problems and prospects of designing and using linguistic ontologies in educational intellectual systems. The research methodology consists of methods of semantic analysis of the basic concepts of the subject area under consideration (linguistic ontologies in educational intellectual systems). The article discusses approaches to the use of linguistic models in modern educational intellectual systems.

Electronic Resources and Information and Communication Technologies


Introduction.
Modern educational intellectual systems work with textual information and domain knowledge covering thousands of different classes of entities that are linked by a huge number of different types of relationships (Liu, 2017; Greger and Porshnev, 2013).
The processing of information and knowledge in such systems often relies on statistical characteristics of this information and knowledge:
- the frequency of occurrence of words in educational materials, tests, reference information, glossaries, etc.;
- the frequency of joint occurrence of words.
Users of educational intellectual systems (lecturers, methodologists, students), when processing textual educational information and knowledge, primarily identify:
- the main content of the educational documents and the meaning of their key concepts;
- the main topic, subtopics and key concepts of the educational documents (training materials, tests, reference information, glossaries, etc.).
For this, the user of an educational intellectual system usually draws on a large amount of knowledge:
- linguistic knowledge: the language in which training materials, tests and reference information are presented;
- ontological knowledge: the domain;
- relations between units of linguistic knowledge: the organization of a coherent text.
A lack of linguistic and ontological knowledge leads to a variety of problems, for example when:
- the formulation of a query differs from the templates for describing relevant educational information and knowledge that are supported by the educational intellectual system;
- long requests are processed (for example, when referring to help information);
- the context of the language is not fully taken into account.
Thus, modern educational intellectual systems that process text information face the following problems (Scherer, 2016):
- processing the text information of online courses in the domain under consideration;
- taking into account the linguistic features of the language and the structure of the corresponding educational or test text.
These problems are inherent in educational intellectual systems. Intellectual text analysis is one of the key tasks in the field of artificial intelligence; it is associated with the problems of automatic analysis and synthesis of natural language that arise when users interact with educational intellectual systems.
The solution to these problems is closely related to the use of various approaches of artificial intelligence and computational linguistics.
Ontological modelling and machine learning methods have become suitable for practical use in natural language processing tasks in educational intellectual systems.
The use of linguistic and ontological knowledge in the automatic processing of texts in educational intellectual systems is a difficult problem. This is because such knowledge should be described in specially created thesauri (ISO 25964-1:2011) and linguistic ontologies, which should contain descriptions of a large number of words and phrases and support the logical derivation of new knowledge.
The paper considers the extraction of information from text that can be used to create formal models of specific areas of linguistic knowledge. In this work, the area is online courses in the disciplines "Computer networks" and "Modelling systems".
A model based on distributional semantics determines the semantic similarity between two linguistic elements (words or phrases) from their distributional properties in large fragments of educational material or tests, without specific knowledge of the lexical or grammatical meanings of the elements.
A word set (bag of words) represents a collection of documents as a matrix whose rows correspond to documents and whose columns correspond to terms. The value at each intersection is the number of occurrences of the term in the particular educational document.
These models often include a weight for each term-document pair. The weight is the frequency of occurrence of the term in the educational document or the probability of finding the word in the educational document.
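The document-term matrix and its weights can be illustrated with a minimal sketch. The three short "documents", the helper names and the choice of relative frequency as the weight are assumptions made for illustration only; they are not the data or the implementation used in the described system.

```python
from collections import Counter
import math

# Hypothetical mini-corpus; the texts are illustrative, not taken from the courses.
documents = [
    "network protocol network router",
    "model simulation model queue",
    "network model",
]

# Vocabulary: one matrix column per distinct term, in sorted order.
vocabulary = sorted({word for doc in documents for word in doc.split()})

def count_row(text):
    """Return the row of raw term counts for one document."""
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]

# Document-term matrix: rows are documents, columns are terms.
count_matrix = [count_row(doc) for doc in documents]

def weight_row(row):
    """Convert raw counts to relative frequencies within the document."""
    total = sum(row)
    return [count / total for count in row]

weight_matrix = [weight_row(row) for row in count_matrix]

def column(matrix, term):
    """Return the distribution of a term over all documents."""
    index = vocabulary.index(term)
    return [row[index] for row in matrix]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Comparing term columns, for example `cosine(column(count_matrix, "network"), column(count_matrix, "router"))`, gives a purely distributional similarity: terms that occur in the same documents score higher than terms that never co-occur, without any knowledge of their meanings.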
Such weighting rates the more general words as more important, although this is not always the case.
One of the paradigms of computer resources for educational intellectual systems is formal ontologies (for example, the Semantic Web) (O. Tkachenko, A. Tkachenko and K. Tkachenko, 2020; Munira and Anjumb, 2018; Sowa, 2009; Web Ontology Language). However, the automatic processing of unstructured natural language texts is difficult to carry out using formal ontologies (Nirenburg and Wilks, 2011; List, 2018; Kudashev, 2013).
Linguistic ontologies cover most of the words of a language or domain and at the same time have an ontological structure that manifests itself in the relationship between concepts.
Therefore, linguistic ontologies can be considered as a special type of lexical database (or knowledgebase) and a special type of ontology.
The paper describes a linguistic ontology designed for automatic text processing for the considered domain and the resources that are developed on the basis of this ontology.
Main research material. The following can serve as a formal definition of linguistic ontologies:

$O = \langle C, E, At, R, A \rangle$,
where:
- $C$ is the set of concepts (classes) of the linguistic ontology;
- $E$ is the set of instances of the linguistic ontology;
- $At$ is the set of attributes of concepts and instances of the linguistic ontology;
- $R$ is the set of relations between concepts of the linguistic ontology;
- $A$ is the set of axioms of the linguistic ontology.
Various computer resources, in particular rubricators and thesauri, can be considered formalized linguistic ontologies.
Typically, rubricators do not include instances and attributes, i.e. the formal model of a rubricator has the form:

$O = \langle C, R, A \rangle$.
Linguistic ontologies (thesauri, rubricators), the concepts of which are not fully defined in terms of formal properties and axioms, are called lightweight ontologies.
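As a minimal sketch, the components $C$, $E$, $At$, $R$ and $A$ of the definition above can be held in a simple container. The class name, the field layout and the reduction of axioms to a set of relation names declared transitive are assumptions made for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Container mirroring the components C, E, At, R, A of the formal definition."""
    concepts: set = field(default_factory=set)       # C: concept names
    instances: dict = field(default_factory=dict)    # E: instance name -> concept name
    attributes: dict = field(default_factory=dict)   # At: concept or instance -> {attribute: value}
    relations: set = field(default_factory=set)      # R: (relation name, source, target) triples
    axioms: set = field(default_factory=set)         # A: simplified to names of transitive relations

    def is_rubricator_like(self):
        """True when no instances and no attributes are defined."""
        return not self.instances and not self.attributes

# Hypothetical examples; the concept and instance names are illustrative.
rubricator = Ontology(
    concepts={"NETWORK", "COMPUTER NETWORK"},
    relations={("class-subclass", "COMPUTER NETWORK", "NETWORK")},
    axioms={"class-subclass"},
)
full_ontology = Ontology(
    concepts={"COMPUTER NETWORK"},
    instances={"campus LAN": "COMPUTER NETWORK"},
)
```

In this sketch a rubricator is simply an ontology whose instance and attribute components stay empty, which matches the lightweight character of such resources.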
There are different interpretations of the relationship between a linguistic ontology and the natural language of educational documents in the educational intellectual system:
- the linguistic ontology is a structure independent of natural language;
- the linguistic ontology is a structure that is independent of a specific natural language;
- elements of the language lexicon are included in the formal definition of the linguistic ontology;
- the formal definition of the linguistic ontology includes the entire lexicon of the domain.
Based on the foregoing, the formal model of a linguistic ontology can be described in terms of the following components:
- $R_{VR}$ is the set of links between lexical units $\{v_j\} \subseteq V$ and the corresponding relations $\{r_i\} \subseteq R$ of the given linguistic ontology;
- $R$ is the set of relationships between concepts of the linguistic ontology;
- $A$ is the set of axioms of the linguistic ontology.
In the considered formal approaches, words of a natural language are one of the components of the linguistic ontological model; lexical expressions are presented only as auxiliary elements that name the concepts and relations of the linguistic ontology.
Establishing relationships between linguistic concepts and the words and expressions of a natural language raises many problems; in particular, a new concept introduced into a linguistic ontology must be associated with existing linguistic elements, and the "concept - linguistic element" relations must be defined.
Therefore, many widely known educational ontological resources are thesauri that do not have a highly formalized structure.
Thesauri are linguistic ontologies, i.e. ontologies based on the meanings of real natural language expressions. The thesaurus of an educational intellectual system is a normative vocabulary of terms in natural language that explicitly indicates the relationships between terms and is intended to describe the content of documents and search queries.
The basic units of a thesaurus are terms, which are divided into descriptors (authorized terms) and non-descriptors (ascriptors).
At their core, descriptors unambiguously correspond to the concepts of the domain. Relationships between descriptors are divided into hierarchical and associative.
Hierarchical relationships are usually viewed as asymmetric and transitive. The following hierarchical relationships are used in thesauri of educational intellectual systems:
- class-subclass (predecessor-successor, above-below): established between two descriptors if the concept of the lower-level descriptor (successor, subclass) is included in the concept of the higher-level descriptor (predecessor, class);
- whole-part.
The purpose of developing thesauri for educational intellectual systems is to use their linguistic units (descriptors) to describe the main topics of educational documents in the process of manual indexing.
Therefore, it is important that the set of thesaurus descriptors allows the topics of educational, methodological, test and reference documents of the domain to be described.
In this case, the indexing process for such a thesaurus is based on linguistic and grammatical knowledge, as well as knowledge of the domain.
To determine the semantics of educational documents and tests, a component of the educational intellectual system, the program "Indexer", must first read the text, understand it and then state its content using the descriptors specified in the thesaurus.
The program "Indexer" should have a good understanding of all the terminology used in the text, although to describe the main topic of the educational text it will need a much smaller number of terms.
The presence of the program "Indexer" testifies to the intellectualization of the educational intellectual system.
Thus, the formal model of the thesaurus ($T$) of the educational intellectual system can be represented as follows:

$T = \langle D, C, R, A \rangle$,
where:
- $D$ is the set of domain descriptors corresponding to the linguistic concepts of the given domain that are necessary to express the main topics of documents in this domain;
- $C$ is the set of terms (linguistic concepts) of the domain, $D \subseteq C$;
- $R$ is the set of relations of the thesaurus, $R = R_I \cup R_A$ ($R_I$ are the hierarchical and $R_A$ the associative relations of the thesaurus);
- $A$ is the set of axioms of transitivity of hierarchical relations.
The described model of the thesaurus of the educational intellectual system is intended for use in the expert analysis of educational, methodological, test and reference documents.
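The thesaurus components described above (descriptors, terms, hierarchical and associative relations, and the transitivity axiom) can be sketched as follows. All terms and links in the fragment are hypothetical, and the representation of relations as Python sets of pairs is an assumption chosen for brevity.

```python
# Hypothetical thesaurus fragment; the terms and links are illustrative.
descriptors = {"NETWORK", "COMPUTER NETWORK", "LOCAL AREA NETWORK", "ROUTER"}  # D
terms = descriptors | {"LAN"}                 # C, with D as a subset of C
use_instead = {"LAN": "LOCAL AREA NETWORK"}   # non-descriptor -> descriptor

# R_I: hierarchical links stored as (narrower, broader) pairs.
hierarchical = {
    ("COMPUTER NETWORK", "NETWORK"),
    ("LOCAL AREA NETWORK", "COMPUTER NETWORK"),
}
# R_A: associative links, treated here as symmetric (unordered) pairs.
associative = {frozenset({"ROUTER", "COMPUTER NETWORK"})}

def index_term(term):
    """Map any term to its descriptor; descriptors map to themselves."""
    return use_instead.get(term, term)

def broader_terms(descriptor):
    """Apply the transitivity axiom: collect every direct and indirect broader descriptor."""
    found = set()
    frontier = [descriptor]
    while frontier:
        current = frontier.pop()
        for narrower, broader in hierarchical:
            if narrower == current and broader not in found:
                found.add(broader)
                frontier.append(broader)
    return found
```

For instance, `broader_terms(index_term("LAN"))` first replaces the non-descriptor with its descriptor and then follows the hierarchical links upwards, so a document indexed with a narrow descriptor can also be found under its broader ones.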
A thesaurus intended for automatic text processing should contain much more information about the structure and language of the domain.
The relationships between the terms specified in the thesaurus should be formalized for their use in the educational intellectual system.
Formal linguistic ontologies (with their independence from a particular language) are difficult to use in automatic text processing for information (knowledge) retrieval applications because:
- units of a formal linguistic ontology must be associated with units of a specific natural language;
- the desire for a clear formalization of relations between linguistic concepts in a formal linguistic ontology is difficult to satisfy when creating very large resources;
- problems arise in establishing "concept - linguistic expression" relations.
An educational intellectual system deals not only with general vocabulary but also with specific domains and their terminologies.
The description of the terminology of the domains of educational intellectual systems should use:
- the information (knowledge) retrieval context;
- resource units that are created based on the meanings of terms;
- a description of multiword expressions and the principles of inclusion (non-inclusion) of multiword linguistic units;
- a small set of relationships between conceptual linguistic units.
The use of a linguistic resource in automatic text processing in an educational intellectual system should take into account the following provisions:
- conceptual units are created based on the meanings of real linguistic expressions;
- multi-step hierarchical construction of the lexical and terminological system of concepts;
- principles of describing the meanings of polysemous words and expressions;
- development of the linguistic ontology as a hierarchical system and the use of formally defined relations with formal properties;
- the use of transitivity and inheritance of relations between concepts of a domain as axioms (inference rules).
The model of the linguistic ontology uses the following components:
- $W_m$ are text inputs that refer to more than one concept of the linguistic ontology, and $W_a$ are ambiguous text inputs that are represented in the ontology by only one meaning;
- $L$ is the set of lemmatic representations of linguistic expressions (for example, the phrase information system is presented in a lemmatic form as a computer network);
- $T_W$ is a mapping of the terminological composition of the given domain to text inputs and linguistic ontology concepts.
The proposed linguistic ontology of the domain is a knowledge base of the ontological type about the conceptual system, the lexical and terminological composition of the domain (disciplines "Computer networks" and "Modelling systems"), supported by the corresponding educational intellectual system.
The unit of linguistic ontology is a concept, as a linguistic unit in a system of concepts, which has its own specific properties that distinguish this unit from other linguistic units in the system of linguistic concepts.
Each entered concept must have a unique name. The name can be an unambiguous word or phrase, the meaning of which corresponds to this linguistic concept.
Each concept is supplied with a set of text inputs, i.e. language expressions whose meanings correspond to the given linguistic concept. Such linguistic expressions are ontological synonyms of one another.
The texts may contain many variants of the text inputs of a particular linguistic concept.
The developer of an educational intellectual system or of a specific online course must record these variants immediately when entering a linguistic concept, or add them when they are found in a specific text.
In the texts of the domain, a significant part is made up of words that belong not only to a specific domain but also to the general vocabulary of many domains, for example, create, participate, accept, evaluate, etc. Therefore, the polysemous words described in the linguistic model are divided into:
- the set $W_m$, which includes expressions related to two or more linguistic concepts;
- the set $W_a$, which includes expressions related to one linguistic concept, although these words may have a different meaning in the general lexicon; this is indicated by a special mark of ambiguity.
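The division of polysemous text inputs into the two sets can be sketched as follows. The mapping of text inputs to concepts and the set of ambiguity marks are hypothetical examples; how the marks are stored in the real resource is not specified in the source.

```python
# Hypothetical mapping of text inputs to ontology concepts; entries are illustrative.
text_input_concepts = {
    "queue": {"QUEUE (DATA STRUCTURE)", "QUEUEING SYSTEM"},
    "packet": {"NETWORK PACKET"},
    "router": {"ROUTER"},
}
# Text inputs carrying the ambiguity mark: one concept in the ontology,
# but another meaning exists in the general lexicon (assumed for illustration).
ambiguity_marks = {"packet"}

def split_polysemous(mapping, marked):
    """Return (W_m, W_a): inputs with two or more concepts, and marked single-concept inputs."""
    w_m = {text for text, concepts in mapping.items() if len(concepts) >= 2}
    w_a = {text for text, concepts in mapping.items()
           if len(concepts) == 1 and text in marked}
    return w_m, w_a

w_m, w_a = split_polysemous(text_input_concepts, ambiguity_marks)
```

Text inputs that fall into neither set (here, "router") can be treated as unambiguous during automatic processing, while members of either set call for disambiguation.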
Relationships between concepts in an ontological resource should perform the following functions:
- the relations should be used in the classic functions of linguistic information and knowledge retrieval thesauri: to expand a search query or to derive the heading of an educational document;
- the relations should be used to resolve the ambiguity of linguistic units included in the resource;
- the relations can be used to identify lexical connectivity in texts and to use the revealed text structure to improve the quality of text processing.
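The first of these functions, query expansion, can be illustrated with a minimal sketch that follows class-subclass links downwards and collects the text inputs of every concept reached. The concepts, links and text inputs are hypothetical, and expansion over subclasses only is an assumed simplification.

```python
# Hypothetical ontology fragment; concept names and text inputs are illustrative.
subclasses = {
    "COMPUTER NETWORK": ["LOCAL AREA NETWORK", "WIDE AREA NETWORK"],
    "LOCAL AREA NETWORK": ["WIRELESS LAN"],
}
text_inputs = {
    "COMPUTER NETWORK": ["computer network"],
    "LOCAL AREA NETWORK": ["local area network", "LAN"],
    "WIDE AREA NETWORK": ["wide area network", "WAN"],
    "WIRELESS LAN": ["wireless LAN", "WLAN"],
}

def expand_query(concept):
    """Return the text inputs of a concept and of all its direct and indirect subclasses."""
    expanded = []
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the relation data
        seen.add(current)
        expanded.extend(text_inputs.get(current, []))
        stack.extend(subclasses.get(current, []))
    return expanded
```

A query for COMPUTER NETWORK expanded this way also matches documents that mention only "WLAN", which is possible only because the class-subclass relation is treated as transitive.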
When creating a large linguistic ontology for processing texts that are not limited in style, genre or size, the most stable approach is to rely on relationships that do not disappear or change during the entire lifetime of all or the vast majority of instances of the concept: for example, software always consists of programs.
Therefore, in the linguistic ontology a relation is described between concepts $c_i$ and $c_j$ only if it is inherent in at least one of these concepts by definition. The properties of transitivity and inheritance are used as axioms.
For logical inference when processing texts in the domain, it is necessary to describe relationships between linguistic concepts that retain their significance and reliability in the various contexts in which the concepts are mentioned.
The main relations in the proposed linguistic ontology are:
- class-subclass;
- whole-part;
- the relation of ontological dependence (asymmetric association);
- symmetric association.
Let class-subclass($c_i$, $c_j$) denote the relationship between the concepts $c_i$ and $c_j$ in which $c_i$ is a subclass of $c_j$, and let $r(c_i, c_j)$ be an arbitrary relationship between the concepts $c_i$ and $c_j$. Class-subclass relationships have the properties of transitivity and inheritance. However, the same natural language expressions can correspond to different relationships between entities of the domain, including those with completely different properties (Magnini and Speranza, 2002). Therefore, an established class-subclass relationship should be checked, for example by verifying that the instances of the lower-level concept $c_i$ belong to the set of instances of the higher-level concept $c_j$. This amounts to answering the question: if an object is an instance of the concept $c_i$, will it necessarily be an instance of the concept $c_j$?
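The two properties of the class-subclass relation can be sketched as inference rules. The source does not give the axioms in explicit form, so the formulation used here is an assumption: transitivity closes the subclass pairs, and inheritance gives a subclass every relation whose source is one of its superclasses. The concepts are hypothetical.

```python
# Hypothetical concepts; pairs are (subclass, superclass).
subclass_of = {
    ("ETHERNET NETWORK", "LOCAL AREA NETWORK"),
    ("LOCAL AREA NETWORK", "COMPUTER NETWORK"),
}
# Other relations as (relation name, source concept, target concept);
# here the whole COMPUTER NETWORK has the part NETWORK NODE.
other_relations = {("whole-part", "COMPUTER NETWORK", "NETWORK NODE")}

def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def inherit(relations, subclass_closure):
    """Assumed inheritance rule: a subclass receives the relations of its superclasses."""
    inherited = set(relations)
    for sub, sup in subclass_closure:
        for name, source, target in relations:
            if source == sup:
                inherited.add((name, sub, target))
    return inherited

closure = transitive_closure(subclass_of)
all_relations = inherit(other_relations, closure)
```

Because inheritance is applied over the closed set of subclass pairs, a single pass is enough for ETHERNET NETWORK to receive the whole-part relation declared two levels above it.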
The whole-part relationship is one of the best known and most useful relationships in various domains. A feature of the whole-part relationship is the variety of its manifestations. The most typical objects to which this relation applies are physical objects, entities that last in time, groups of entities, processes, etc.
When modelling this relationship in computer resources, it is important to ensure its transitivity. When describing the whole-part relationship in the proposed model of linguistic ontology, efforts were made to ensure the transitivity of this relationship. That is, it is necessary to describe the whole-part relationship as follows: if the text (a fragment of the text) is devoted to the discussion of a part, then it can be assumed that the text (a fragment of the text) will be relevant to the discussion of the whole.
The condition for ensuring such inheritance is the ontological dependence of the existence of a part on the existence of the whole.
The dependence of a part can be:
- dependence in existence, when an instance of the part cannot be separated from the instance of the whole;
- generic dependence, in which the existence of an instance of the part requires the existence of at least one instance of the whole.
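The rule that a text fragment about a part is also relevant to the whole can be sketched as a walk along transitive part-to-whole links. The PROGRAM and SOFTWARE pair follows the example given in the text; SUBROUTINE and the single-whole-per-part storage are assumptions made to keep the sketch short.

```python
# Part -> whole links. PROGRAM/SOFTWARE follows the example in the text;
# SUBROUTINE is a hypothetical addition. Each part has one whole in this sketch.
part_of = {
    "SUBROUTINE": "PROGRAM",
    "PROGRAM": "SOFTWARE",
}

def relevant_wholes(concept):
    """Follow part -> whole links transitively and return every whole reached, in order."""
    wholes = []
    current = concept
    while current in part_of and part_of[current] not in wholes:
        current = part_of[current]
        wholes.append(current)
    return wholes

def topics_of_fragment(mentioned_concepts):
    """Return the mentioned concepts together with all wholes they are parts of."""
    topics = set(mentioned_concepts)
    for concept in mentioned_concepts:
        topics.update(relevant_wholes(concept))
    return topics
```

Such propagation is only safe for part-whole links in which the part depends on the whole in one of the two senses listed above; otherwise a fragment about the part need not be relevant to the whole.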
The description of hierarchical relationships should be independent of the context in which they are mentioned. This is important in automatic text processing since in automatic mode it is often impossible to use the context to confirm the existence of a particular relationship.
In linguistic ontologies, the following properties of the whole-part relationship are used:
- part(c …
The linguistic concept $c_i$ is externally dependent on the linguistic concept $c_j$ if for every instance of $c_i$ there exists an instance of $c_j$ that is neither a part nor the material of the instance of $c_i$. For example, the linguistic concept "son" is externally dependent on the concept "parent", since it exists only within the "family" in relation to its "parents". The linguistic concept "tree", by contrast, is not externally dependent on any entity, since it requires the existence of a "tree root", which is itself part of the "tree".
The asymmetric association relation Ass represents an external ontological dependence between concepts. This relationship is established between the linguistic concepts $c_1$ and $c_2$ if the following conditions are satisfied:
- neither the class-subclass nor the whole-part relation can be established between the linguistic concepts $c_1$ and $c_2$;
- the following statement is true: the existence of $c_2$ implies the existence of $c_1$.
These conditions mean that the linguistic concept $c_2$ is externally dependent on $c_1$: $Ass_1(c_2, c_1) = Ass_2(c_1, c_2)$.
Ontological dependency relationships are applicable to different areas, so they are most often used in top-level linguistic ontologies.
For various applications of automatic text processing, some groupings of linguistic concepts and relations in the linguistic ontology are used.
Linguistic ontologies based on the described model. The above principles were the basis for the development of an ontology for the disciplines "Computer networks" and "Modelling systems".
The created ontological resources have the same structure. They are ontologies because they describe the concepts of the domain and the relationships between them.
These resources belong to linguistic ontologies since the introduction of concepts is largely motivated by the meanings of linguistic units related to the domain of the resource.
At the same time, they are thesauri, since each linguistic concept is associated with a set of linguistic expressions (words, terms, phrases) with which this linguistic concept can be expressed in a text; such a set of text inputs of a concept is necessary to use linguistic ontologies for automatic text processing.
Each term is provided with a description (dictionary entry), has hierarchical links with other terms and synonyms. Fig. 1 shows a list of hyperlinks to dictionary entries of the "main root" key terms (linguistic concepts) of the domains "Computer networks" and "Modelling systems".
Having opened the dictionary entry of a term, we get a description of the term, a list of other related terms and lists of publications and persons related to this term.
The performed layout allows you to view the thesaurus in alphabetical order of its text inputs.
Choosing a specific text input, for example NETWORK, shows all the concepts to which this word is attributed, namely:
- COMPUTER NETWORK;
- NEURAL NETWORK;
- SEMANTIC WEB;
- PETRI NET.
For each concept, complete lists of text inputs are indicated, including words of different parts of speech, as well as phrases. Thus, for the concept INFORMATION TECHNOLOGY, the text inputs are the words and expressions: technology, information, software, information resources, information system (fig. 2).
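The alphabetical view and the lookup of concepts by text input can be sketched as an inverted index built from per-concept lists of text inputs. The text inputs of INFORMATION TECHNOLOGY and the four concepts reached from "network" follow the examples in the text; the remaining text inputs and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Concept -> text inputs. INFORMATION TECHNOLOGY and the concepts linked to
# "network" follow the text; the other text inputs are assumed.
concept_inputs = {
    "INFORMATION TECHNOLOGY": ["technology", "information", "software",
                               "information resources", "information system"],
    "COMPUTER NETWORK": ["network", "computer network"],
    "NEURAL NETWORK": ["network", "neural network"],
    "SEMANTIC WEB": ["network", "semantic web"],
    "PETRI NET": ["network", "petri net"],
}

def build_index(mapping):
    """Invert concept -> text inputs into text input -> set of concepts."""
    index = defaultdict(set)
    for concept, inputs in mapping.items():
        for text_input in inputs:
            index[text_input].add(concept)
    return dict(index)

index = build_index(concept_inputs)
alphabetical = sorted(index)  # text inputs in alphabetical order
```

Looking up `index["network"]` returns all four concepts, which is exactly the situation in which the relations of the ontology are needed to choose the intended meaning.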
For each linguistic concept, relationships with other linguistic concepts are indicated (fig. 2).
The information base (database, knowledge base) supporting the proposed linguistic ontology includes:
- a set of concepts for the domain under consideration (disciplines "Computer networks" and "Modelling systems", which are supported by the corresponding educational intellectual system):
  • linguistic concepts of general vocabulary;
- a set of relationships between the concepts of the considered domain;
- a set of text inputs of the thesaurus;
- a description of text inputs:
  • lemmatic representation of the text input;
  • syntactic type;
  • the main word of the noun phrase;
- a set of correspondences between text inputs and the linguistic concepts of the thesaurus of the educational intellectual system.
Conclusion. The article presents a model of linguistic ontology for the domains (disciplines "Computer networks" and "Modelling systems").
This model is used in the development of an educational intellectual system that supports online learning in these disciplines. In the proposed model, a set of relations of a linguistic ontology is described, which is specially selected to describe the domain under consideration.
The relations of the linguistic ontology can perform their functions in information and knowledge retrieval only if multi-step logical inference is provided, based on the transitivity and inheritance of the relations and their independence from the context in which a linguistic concept is mentioned. To provide these properties, it was proposed to use a small set of relations.
Ontological definitions of the relations used were introduced. Such a system of relations reflects the most essential relationships between entities and can be used to describe relationships between linguistic concepts in a variety of disciplines, supported by educational intellectual systems.
The proposed linguistic ontological model was used in the implementation of an educational intellectual system that supports the disciplines "Computer networks" and "Modelling systems".