Knowledge extraction from unstructured data

Sakor, Ahmad

Download statistics - Document (COUNTER):

Sakor, Ahmad: Knowledge extraction from unstructured data. Hannover : Gottfried Wilhelm Leibniz Universität, Diss., 2023, XVIII, 184 S., DOI: https://doi.org/10.15488/13721

Sum total of downloads: 804

File:	Sakor_Ahmad_2023.pdf
Size:	7.4 MB
Format:	Adobe PDF

Abstract:
Data availability is becoming increasingly important given the current growth of web-based data. Data on the web are represented in unstructured, semi-structured, or structured form. To make web-based data available for Natural Language Processing or Data Mining tasks, the data need to be presented in a structured, machine-readable format. Thus, techniques for capturing knowledge from unstructured data sources are needed. Research communities address this problem with knowledge extraction methods, which capture knowledge in natural language text and map the extracted knowledge to existing knowledge represented in knowledge graphs (KGs). These methods include Named Entity Recognition, Named Entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The approach effectively maps entities and relations within a text to their resources in a target KG. It overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing catalogs of linguistic and domain-specific rules, which state the criteria for recognizing entities in a sentence of a particular language, and a deductive database that encodes knowledge from community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific settings; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data while maintaining the accuracy of recognizing and linking entities.
Additionally, we present a context-aware, knowledge-driven framework for unveiling semantically related posts in a corpus; it retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus as the Vertex Coloring Problem. We evaluate our techniques on several benchmarks from various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques effectively extract knowledge encoded in unstructured data and discover patterns in the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.
License of this version:	CC BY 3.0 DE
Document Type:	Doctoral thesis
Publishing status:	Published version
Issue Date:	2023
Appears in Collections:	Fakultät für Elektrotechnik und Informatik Dissertationen

downloads by country:

pos.	country	total	perc.
1	Germany	245	30.47%
2	United States	147	18.28%
3	India	44	5.47%
4	No geo information available	39	4.85%
5	France	26	3.23%
6	Russian Federation	25	3.11%
7	China	19	2.36%
8	United Kingdom	17	2.11%
9	Korea, Republic of	12	1.49%
10	Greece	11	1.37%
	other countries	219	27.24%

Note

Download statistics are collected according to the internationally recognized rules and standards of the "COUNTER Code of Practice for e-Resources". COUNTER is an international non-profit organization in which library associations, database providers, and publishers jointly develop standards for the collection, storage, and processing of usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the corresponding full texts are evaluated, not visits to the website itself.
