API Documentation - Request

The request can be sent as a GET or POST request to the URL https://api.txtwerk.de/rest/txt/analyzer. The document and the requested services are passed as parameters.
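
As a minimal sketch of the GET variant, the following Python snippet builds the request URL with the standard library. The parameter names `text` and `services` come from the parameter overview in this documentation; the helper name `build_get_url` and the sample sentence are only illustrative. The snippet does not send anything and leaves out the `X-Api-Key` header used in the example requests.

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.txtwerk.de/rest/txt/analyzer"


def build_get_url(text, services):
    """Return the analyzer URL with document and services as query parameters.

    `services` is a list of service names; it is joined into the
    comma-separated form the `services` parameter expects.
    """
    query = urlencode({"text": text, "services": ",".join(services)})
    return f"{ENDPOINT}?{query}"


# Hypothetical sample input, chosen only for illustration.
url = build_get_url("Berlin ist die Hauptstadt von Deutschland.", ["entities", "tags"])
print(url)
```

`urlencode` percent-encodes the comma and turns spaces into `+`, so the text survives as a single query parameter. For longer texts the parameter overview recommends POST instead.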

Document

The document to be annotated can be passed directly as text.

Alternatively, you can specify the URL of a website to be analyzed. In this case, the page is crawled, and its main text content is extracted and processed. Extraneous elements, such as navigation or teaser texts, are removed.

Input texts can be weighted if the document is passed as a JSON string. In this case, each resulting relevance is multiplied by the corresponding input weight. The response is structured according to the provided input texts.
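
The weighting can be sketched in Python as follows. The list-of-objects shape with `text` and `weight` keys, and the weights written as strings, mirror the JSON example request in this documentation. The helper names and the relevance value 0.4 are hypothetical, since the scale of the relevance values is not specified here.

```python
import json


def build_document(parts):
    """Serialize (text, weight) pairs into a JSON string for the `document` parameter."""
    return json.dumps(
        [{"text": text, "weight": str(weight)} for text, weight in parts],
        ensure_ascii=False,  # keep umlauts readable instead of \u escapes
    )


def weighted_relevance(relevance, weight):
    """Relevance after weighting: the raw relevance times the input weight."""
    return relevance * weight


document = build_document([("Titel", 2.0), ("Teaser", 1.5)])
print(document)

# A hypothetical raw relevance of 0.4 found in the first input text (weight 2.0):
print(weighted_relevance(0.4, 2.0))
```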

Services

The document can be analyzed with different techniques. Please choose from the following services:

entities Named Entities based on the Wikidata ontology.
tags Keywords which appear in the text and describe and summarize its content.
categories Assignment of text to categories (By default: "Politics", "Business", "Cars & Technology", "Internet", "Culture", "Sports", "Travel", "Human interest", and "Science").
dates Dates and periods.
measures Measurements that occur in the text.
authors Authors of an article available as an HTML document.
fingerprints Fingerprints of the text for the identification of similar documents (near duplicate detection).
lexiconEntities Named Entities based on an optional user lexicon, which can additionally be managed in TXTWerk.
lexiconTags Keywords based on a lexicon maintained in TXTWerk.
nerEntities Named Entities which are neither based on the Wikidata ontology nor on the user lexicon, but derive from a distilled Flair language model.
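
A client may want to check the requested services before sending a request. The sketch below uses the ten service names listed above. The function name `services_param` is an assumption, as is the choice to reject unknown names on the client side rather than leave that to the API.

```python
SUPPORTED_SERVICES = (
    "entities", "tags", "categories", "dates", "measures", "authors",
    "fingerprints", "lexiconEntities", "lexiconTags", "nerEntities",
)


def services_param(requested):
    """Return the comma-separated value for the `services` parameter.

    Raises ValueError for an empty request or for names that are not
    in the list of supported services. Duplicates are dropped while
    keeping the original order.
    """
    if not requested:
        raise ValueError("at least one service is required")
    unknown = [name for name in requested if name not in SUPPORTED_SERVICES]
    if unknown:
        raise ValueError(f"unsupported services: {', '.join(unknown)}")
    return ",".join(dict.fromkeys(requested))


print(services_param(["entities", "tags", "entities"]))
```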

Service Control

For some services, additional parameters are available that affect the analysis or the result.

Overview of Parameters

Parameter: text
Area: Document
Description: Contains the document to be annotated as text. For longer texts, please send the request as a POST request and pass the text in the request body.
Mandatory: either text or htmlFile or document
Values: Text

Parameter: htmlFile
Area: Document
Description: Contains the document to be annotated as HTML text.
Mandatory: either text or htmlFile or document
Values: HTML text file

Parameter: document
Area: Document
Description: Contains the document to be annotated as JSON string.
Mandatory: either text or htmlFile or document
Values: JSON string

Parameter: title
Area: Document
Description: Title of the document. Specifying a title can improve the result. This only affects the following service: tags.
Mandatory: no
Values: Text

Parameter: teaser
Area: Document
Description: Teaser of the document. Specifying a teaser can improve the result. This only affects the following service: tags.
Mandatory: no
Values: Text

Parameter: services
Area: Services
Description: List of requested services.
Mandatory: yes
Values: Comma-separated list that contains at least one of the supported services: [entities, tags, categories, dates, measures, authors, fingerprints, lexiconEntities, lexiconTags, nerEntities]

Parameter: language
Area: Service control
Description: Language of the input document. Language-dependent components can be activated specifically by setting this parameter.
Mandatory: no (the language is auto-detected if omitted)
Values: 'en' or 'de'

Parameter: ntags
Area: Service control
Description: Maximum number of keywords (tags) to be returned.
Service: tags
Mandatory: no, Default: 10
Values: non-negative integer

Parameter: ncategories
Area: Service control
Description: Number of categories to be returned.
Service: categories
Mandatory: no
Values: non-negative integer

Parameter: nentities
Area: Service control
Description: Number of entities to be returned.
Service: entities
Mandatory: no
Values: non-negative integer

Parameter: nerMinConfidence
Area: Service control
Description: Threshold for the entity confidence.
Service: entities
Mandatory: no, Default: 30
Values: non-negative integer

Parameter: nerMinRelevance
Area: Service control
Description: Threshold for the relevance value of the entities.
Service: entities
Mandatory: no
Values: non-negative integer

Parameter: nerFormat
Area: Service control
Description: Response format for the entities.
Service: entities
Mandatory: no
Values: 'list', 'aggregate' (aggregated list of entities, sorted by relevance), 'candidates' (for each entity a list of all disambiguation candidates)

Parameter: nerMetadata
Area: Service control
Description: Additional metadata for the entities.
Service: entities
Mandatory: no
Values: true or false

Parameter: nerMetadataProperties
Area: Service control
Description: Properties to be returned for the entities.
Service: entities
Mandatory: no
Values: Comma-separated list. Depends on which properties the user created beforehand, e.g. 'description' (if entities contain a description) or 'typetree' (if information on hierarchical relations between entities is stored). null returns all existing properties.

Parameter: nerAnnotations
Area: Service control
Description: Additional information on entities, e.g. pages on Wikipedia and aliases.
Service: entities
Mandatory: no
Values: Comma-separated list containing one or more of the annotation layers (currently available: 'aliases' and 'wikipedia').
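
The constraints in the overview can be checked on the client side before a request is sent. The following Python sketch is one possible way to do that, not part of the API. The function name `check_params` is hypothetical, and reading "either text or htmlFile or document" as "exactly one of them" is an assumption about the intended meaning.

```python
DOCUMENT_PARAMS = ("text", "htmlFile", "document")
COUNT_PARAMS = ("ntags", "ncategories", "nentities", "nerMinConfidence", "nerMinRelevance")
NER_FORMATS = ("list", "aggregate", "candidates")
LANGUAGES = ("en", "de")


def check_params(params):
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []

    # Assumption: exactly one of the three document parameters is expected.
    given = [name for name in DOCUMENT_PARAMS if name in params]
    if len(given) != 1:
        problems.append("exactly one of text, htmlFile, document is expected")

    if not params.get("services"):
        problems.append("services is mandatory")

    if "language" in params and params["language"] not in LANGUAGES:
        problems.append("language must be 'en' or 'de'")

    for name in COUNT_PARAMS:
        if name in params:
            value = params[name]
            # bool is a subclass of int in Python, so it is excluded explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                problems.append(f"{name} must be a non-negative integer")

    if "nerFormat" in params and params["nerFormat"] not in NER_FORMATS:
        problems.append("nerFormat must be 'list', 'aggregate' or 'candidates'")

    return problems


print(check_params({"text": "Ein Beispiel.", "services": "entities", "nerFormat": "aggregate"}))
print(check_params({"services": "tags", "ntags": -1}))
```

The first call returns an empty list. The second reports the missing document parameter and the negative `ntags` value.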

Example Request

Example of a POST request where the document is passed directly as text:

curl "https://api.txtwerk.de/rest/txt/analyzer" \
    -H "X-Api-Key: ..." \
    -d text='TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin ansässiger Fullservice-Provider. Neben Entitäten und Schlagwörtern erkennt TXTWerk in Texten unter anderem auch Datumsangaben (z.B. 08.09.2023) und Maßzahlen (z.B. 24h) und ordnet jeden Text einer passenden Textklasse zu.' \
    -d services='entities'
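
For readers who prefer Python to curl, a roughly equivalent request object can be built with the standard library. This is a sketch under assumptions: the helper name `build_post_request` and the placeholder key `YOUR_API_KEY` are illustrative, and the request is only constructed, not sent. Unlike `curl -d`, `urlencode` percent-encodes the text before it goes into the body.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.txtwerk.de/rest/txt/analyzer"


def build_post_request(api_key, text, services):
    """Build a form-encoded POST request for the analyzer without sending it."""
    body = urlencode({"text": text, "services": services}).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_post_request(
    "YOUR_API_KEY",  # placeholder, replace with a real key
    "TXTWerk ist die Textmining-API der Neofonie GmbH.",
    "entities",
)
print(request.get_method(), request.full_url)
print(request.data)
```

Passing `request` to `urllib.request.urlopen` would perform the call. That step is omitted here because it requires a valid API key.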

Example of a POST request where an HTML file is passed directly as input parameter:

curl "https://api.txtwerk.de/rest/txt/analyzer" \
    -H "X-Api-Key: ..." \
    -F htmlFile='@' \
    -F services='entities'

Example of a POST request where the document is provided as JSON with weights:

curl "https://api.txtwerk.de/rest/txt/analyzer" \
    -H "X-Api-Key: ..." \
    -d services='entities' \
    -d document='[{ "text": "Titel", "weight": "2.0" } , { "text": "Teaser", "weight": "1.5" }]'