Feature-based detection of automated language models: Tackling GPT-2, GPT-3 and Grover

Fröhling, Leon; Zubiaga, Arkaitz


Download statistics - Document (COUNTER):

Fröhling, L.; Zubiaga, A.: Feature-based detection of automated language models: Tackling GPT-2, GPT-3 and Grover. In: PeerJ Computer Science 7 (2021), e443. DOI: https://doi.org/10.7717/peerj-cs.443

Repository version

To cite the version in the repository, please use this identifier: https://doi.org/10.15488/15750

Sum total of downloads: 21

File:	Feature-based_det ...
Size:	1.33 MB
Format:	Adobe PDF

Abstract:
The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost and effort undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection problem, using carefully crafted features that attempt to model intrinsic differences between human and machine text. Our research contributes to the field in producing a detection method that achieves performance competitive with far more expensive methods, offering an accessible “first line-of-defense” against the abuse of language models. Furthermore, our experiments show that different sampling methods lead to different types of flaws in generated text.
License of this version:	CC BY 4.0 Unported
Document Type:	Article
Publishing status:	publishedVersion
Issue Date:	2021
Appears in Collections:	Wirtschaftswissenschaftliche Fakultät

Downloads by country:

Pos.	Country	Downloads	Share
1	Germany	10	47.62%
2	United States	5	23.81%
3	Japan	1	4.76%
4	Indonesia	1	4.76%
5	Europe	1	4.76%
6	Estonia	1	4.76%
7	Switzerland	1	4.76%
8	Austria	1	4.76%

Note

Download statistics are collected in accordance with the internationally recognized rules and standards of the "COUNTER Code of Practice for e-Resources". COUNTER is an international non-profit organization in which library associations, database providers, and publishers work together on standards for collecting, storing, and processing usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the full texts themselves are counted, not visits to the website as such.
