Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction?

Collecting relations between chemicals and drugs is crucial in biomedical research. Pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT) have been shown to have limitations on biomedical texts; in particular, the lack of annotated data makes relation extraction (RE) from biomedical texts very challenging. In this paper, we hypothesize that enriching a pre-trained transformer model with syntactic information may help improve its performance on chemical–drug RE tasks. For this purpose, we propose three syntax-enhanced models based on the domain-specific BioBERT model: Chunking-Enhanced-BioBERT and Constituency-Tree-BioBERT, in which constituency information is integrated, and a multi-task learning framework, Multi-Task-Syntactic (MTS)-BioBERT, in which syntactic information is injected implicitly by adding syntax-related tasks as training objectives. In addition, we test an existing model, Late-Fusion, which is enhanced by syntactic dependency information, and build ensemble systems combining syntax-enhanced and non-syntax-enhanced models. Experiments are conducted on the BioCreative VII DrugProt corpus, a manually annotated corpus for the development and evaluation of RE systems. Our results reveal that syntax-enhanced models in general degrade the performance of BioBERT in the scenario of biomedical RE but improve the performance when the subject–object distance of the candidate semantic relation is long. We also explore the impact of the quality of dependency parses.

Domains

Document and Text Processing; Computation and Language [cs.CL]

Main file

Tang_DATABASE2022.pdf (3.67 MB)

Origin	Publisher files allowed on an open archive

Contributor: Pierre Zweigenbaum

https://hal.science/hal-03799268

Submitted on: Tuesday, November 22, 2022, 16:39:45

Last modified on: Wednesday, January 22, 2025, 03:41:41

Long-term archiving on: Thursday, February 23, 2023, 19:16:45

Dates and versions

hal-03799268, version 1 (22-11-2022)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-03799268, version 1
DOI: 10.1093/database/baac070
PUBMED: 36006843
PUBMEDCENTRAL: PMC9408061
WOS: 000844279700001

Cite

Anfu Tang, Louise Deleger, Robert Bossy, Pierre Zweigenbaum, Claire Nédellec. Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction?. Database - The journal of Biological Databases and Curation, 2022, 2022, pp.baac070. ⟨10.1093/database/baac070⟩. ⟨hal-03799268⟩

Collections

CNRS INRIA CENTRALESUPELEC UNIV-PARIS-SACLAY INRAE ANR LISN GS-MATHEMATIQUES GS-COMPUTER-SCIENCE GS-BIOSPHERA GS-LIFE-SCIENCES-HEALTH LISN-ILES MAIAGE MICA-UNITES MATHNUM RESEAU-EAU
