API Documentation - Request

Requests can be sent as GET or POST requests to the URL https://api.txtwerk.de/rest/txt/analyzer. The document and the requested services are passed as parameters.
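Here is a minimal Python sketch of how a GET request URL could be assembled. Only the endpoint and the parameter names `text`, `services` and `language` come from this documentation. The helper name `build_get_url` is hypothetical. The API key is not part of the URL. This sketch assumes it must still be sent in the `X-Api-Key` header, as in the curl examples.

```python
from urllib.parse import urlencode

# Endpoint taken from this documentation.
ENDPOINT = "https://api.txtwerk.de/rest/txt/analyzer"


def build_get_url(text: str, services: list[str], **extra: str) -> str:
    """Return a GET URL carrying the document and the requested services.

    `extra` may hold further documented parameters, e.g. language="de".
    """
    params = {"text": text, "services": ",".join(services), **extra}
    # urlencode percent-encodes reserved characters (the comma becomes %2C)
    # and turns spaces into '+'.
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_get_url("TXTWerk ist eine Textmining-API.", ["entities", "tags"], language="de")
print(url)
```

Query strings are length-limited in practice, so longer documents are better sent via POST, as recommended for the `text` parameter.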

Document

The document to be annotated can be passed directly as text.

Alternatively, you can specify the URL of a website to be analyzed. In this case, the site is crawled and its main text content is extracted and analyzed. Boilerplate elements such as navigation or teaser texts are removed.

If the document is passed as a JSON string (parameter document), individual input texts can be weighted. The relevance scores computed for each text are then multiplied by that text's weight, and the response is structured according to the input texts provided.
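The JSON string for the `document` parameter can be generated instead of written by hand. This sketch assumes the structure of the weighted curl example at the end of this section: a list of objects with `text` and `weight` keys, where the weights are written as strings. The documentation does not say whether numeric weights are also accepted. The function name `build_weighted_document` is hypothetical.

```python
import json


def build_weighted_document(parts: list[tuple[str, float]]) -> str:
    """Serialize (text, weight) pairs into the JSON string for `document`.

    Weights are written as strings such as "2.0", mirroring the JSON
    example in this documentation.
    """
    return json.dumps(
        [{"text": text, "weight": str(float(weight))} for text, weight in parts]
    )


doc = build_weighted_document([("Title", 2.0), ("Teaser", 1.5)])
print(doc)
```

Under the weighting rule above, a relevance of 0.5 computed for the title text would be scaled by its weight of 2.0 to 1.0.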

Services

The document can be analyzed with different techniques. Please choose from the following services:

Service	Description
entities	Named Entities based on the Wikidata ontology.
tags	Keywords which appear in the text and describe and summarize its content.
categories	Assignment of text to categories (By default: "Politics", "Business", "Cars & Technology", "Internet", "Culture", "Sports", "Travel", "Human interest", and "Science").
dates	Dates and periods.
measures	Measurements that occur in the text.
authors	Authors of an article available as an HTML document.
fingerprints	Fingerprints of the text for the identification of similar documents (near duplicate detection).
lexiconEntities	Named Entities based on an optional user lexicon that can additionally be managed in TXTWerk.
lexiconTags	Keywords based on a lexicon maintained in TXTWerk.
nerEntities	Named Entities that are based neither on the Wikidata ontology nor on the user lexicon, but are derived from a distilled Flair language model.
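The `services` parameter expects a comma-separated list of the names above, so a small client-side check can catch typos before a request is sent. The name list is copied from the `services` parameter description. The check is case-sensitive. That is an assumption, because the documentation does not say whether other spellings are accepted. The helper name `services_param` is hypothetical.

```python
# Service names as listed in this documentation.
SUPPORTED_SERVICES = (
    "entities", "tags", "categories", "dates", "measures",
    "authors", "fingerprints", "lexiconEntities", "lexiconTags", "nerEntities",
)


def services_param(requested: list[str]) -> str:
    """Build the comma-separated value for the `services` parameter."""
    if not requested:
        raise ValueError("at least one service must be requested")
    unknown = [s for s in requested if s not in SUPPORTED_SERVICES]
    if unknown:
        raise ValueError(f"unsupported service(s): {', '.join(unknown)}")
    # Drop duplicates while keeping the caller's order.
    return ",".join(dict.fromkeys(requested))


print(services_param(["entities", "dates", "entities"]))
```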

Service Control

For some services, additional parameters are available that affect the analysis or its result.

Overview of Parameters

Parameter	Area	Description
text	Document	Contains the document to be annotated as plain text. For longer texts, send a POST request and pass the text in the request body. Mandatory: either text or htmlFile or document Values: Text
htmlFile	Document	Contains the document to be annotated as HTML text. Mandatory: either text or htmlFile or document Values: HTML text file
document	Document	Contains the document to be annotated as JSON string. Mandatory: either text or htmlFile or document Values: JSON string
title	Document	Title of the document. Specifying a title can improve the result; it only affects the tags service. Mandatory: no Values: Text
teaser	Document	Teaser of the document. Specifying a teaser can improve the result; it only affects the tags service. Mandatory: no Values: Text
services	Services	List of requested services. Mandatory: yes Values: Comma separated list that contains at least one of the services supported: [entities, tags, categories, dates, measures, authors, fingerprints, lexiconEntities, lexiconTags, nerEntities]
language	Service control	Language of the input document. Setting this parameter activates the corresponding language-dependent components. Mandatory: no (if omitted, the language is detected automatically) Values: 'en' or 'de'
ntags	Service control	Maximum number of keywords (tags) to return. Service: tags. Mandatory: no, Default: 10 Values: non-negative integer
ncategories	Service control	Number of categories to be returned. Service: categories. Mandatory: no Values: non-negative integer
nentities	Service control	Number of entities to be returned. Service: entities. Mandatory: no Values: non-negative integer
nerMinConfidence	Service control	Minimum confidence threshold for entities. Service: entities. Mandatory: no, Default: 30 Values: non-negative integer
nerMinRelevance	Service control	Minimum relevance threshold for entities. Service: entities. Mandatory: no Values: non-negative integer
nerFormat	Service control	Response format for the entities. Service: entities. Mandatory: no Values: 'list', 'aggregate' (aggregated list of entities, sorted by relevance), 'candidates' (for each entity a list of all disambiguation candidates)
nerMetadata	Service control	Additional metadata for the entities. Service: entities. Mandatory: no Values: true or false
nerMetadataProperties	Service control	Properties to be returned for the entities. Service: entities. Mandatory: no Values: Comma-separated list. The available properties depend on what the user has created beforehand, e.g. 'description' (if entities contain a description) or 'typetree' (if information on hierarchical relations between entities is stored). null returns all existing properties.
nerAnnotations	Service control	Additional information on entities, e.g. Wikipedia pages and aliases. Service: entities. Mandatory: no Values: Comma-separated list containing one or more annotation layers (currently available: 'aliases' and 'wikipedia').
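The service-control parameters for the entities service can be collected in one place. The sketch below validates the value sets listed in the table (`language`, `nerFormat`, `nerAnnotations`, and the non-negative integers). It omits unset parameters so that server-side defaults such as `nerMinConfidence` = 30 apply. The function `entity_params` and its keyword names are hypothetical. Only the returned parameter names and allowed values are taken from the table.

```python
from typing import Dict, List, Optional

# Allowed values as listed in the parameter overview.
LANGUAGES = {"en", "de"}
NER_FORMATS = {"list", "aggregate", "candidates"}
NER_ANNOTATIONS = {"aliases", "wikipedia"}


def entity_params(
    language: Optional[str] = None,
    nentities: Optional[int] = None,
    ner_min_confidence: Optional[int] = None,
    ner_format: Optional[str] = None,
    ner_metadata: Optional[bool] = None,
    ner_annotations: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Collect optional service-control parameters for the entities service.

    Arguments left as None are omitted, so server-side defaults apply.
    """
    params: Dict[str, str] = {}
    if language is not None:
        if language not in LANGUAGES:
            raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
        params["language"] = language
    # Both parameters are documented as non-negative integers.
    for name, value in (("nentities", nentities), ("nerMinConfidence", ner_min_confidence)):
        if value is not None:
            if value < 0:
                raise ValueError(f"{name} must be a non-negative integer")
            params[name] = str(value)
    if ner_format is not None:
        if ner_format not in NER_FORMATS:
            raise ValueError(f"nerFormat must be one of {sorted(NER_FORMATS)}")
        params["nerFormat"] = ner_format
    if ner_metadata is not None:
        # Documented values are true or false.
        params["nerMetadata"] = "true" if ner_metadata else "false"
    if ner_annotations:
        unknown = set(ner_annotations) - NER_ANNOTATIONS
        if unknown:
            raise ValueError(f"unsupported annotation layer(s): {sorted(unknown)}")
        params["nerAnnotations"] = ",".join(ner_annotations)
    return params


print(entity_params(language="de", ner_format="aggregate", ner_annotations=["wikipedia"]))
```

The returned dictionary can be merged with `text` and `services` before the request is encoded.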

Example Request

Example of a POST request where the document is passed directly as text:

curl "https://api.txtwerk.de/rest/txt/analyzer" \
    -H "X-Api-Key: ..." \
    --data-urlencode text='TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin ansässiger Fullservice-Provider. Neben Entitäten und Schlagwörtern erkennt TXTWerk in Texten unter anderem auch Datumsangaben (z.B. 08.09.2023) und Maßzahlen (z.B. 24h) und ordnet jeden Text einer passenden Textklasse zu.' \
    -d services='entities'
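The curl call above can be reproduced with Python's standard library. This sketch only prepares the request. `YOUR_API_KEY` is a placeholder and `build_post_request` is a hypothetical helper name. Like curl's data options, it sends an `application/x-www-form-urlencoded` body.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://api.txtwerk.de/rest/txt/analyzer"


def build_post_request(api_key: str, params: dict[str, str]) -> urllib.request.Request:
    """Prepare a form-encoded POST request to the analyzer endpoint."""
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_post_request(
    "YOUR_API_KEY",
    {"text": "TXTWerk ist die Textmining-API der Neofonie GmbH.", "services": "entities"},
)
# Sending is left to the caller, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```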

Example of a POST request where an HTML file is passed directly as input parameter:

curl "https://api.txtwerk.de/rest/txt/analyzer" \
    -H "X-Api-Key: ..." \
    -F htmlFile='@' \
    -F services='entities'
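The HTML upload uses `multipart/form-data`, as curl's `-F` option does. Python's standard library has no high-level multipart encoder, so this sketch assembles the body by hand. The helper name `build_html_upload` and the default filename `document.html` are hypothetical. The HTML content is passed in as bytes rather than read from a file.

```python
import urllib.request
import uuid

ENDPOINT = "https://api.txtwerk.de/rest/txt/analyzer"


def build_html_upload(api_key: str, html: bytes, services: str,
                      filename: str = "document.html") -> urllib.request.Request:
    """Prepare a multipart/form-data POST with an htmlFile part and a services part."""
    # A random boundary is very unlikely to occur inside the HTML content.
    boundary = uuid.uuid4().hex
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="htmlFile"; filename="{filename}"\r\n'
        "Content-Type: text/html\r\n\r\n"
    ).encode("utf-8")
    services_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="services"\r\n\r\n'
        f"{services}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    body = file_header + html + services_part
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_html_upload("YOUR_API_KEY", b"<html><body><p>Berlin</p></body></html>", "entities")
```

The prepared request can then be sent with `urllib.request.urlopen(req)`.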

Example of a POST request where the document is provided as JSON with weights:

curl "https://api.txtwerk.de/rest/txt/analyzer" \
    -H "X-Api-Key: ..." \
    -d services='entities' \
    --data-urlencode document='[{ "text": "Titel", "weight": "2.0" } , { "text": "Teaser", "weight": "1.5" }]'