API Documentation - Request
The request can be sent as a GET or POST request to the URL https://api.txtwerk.de/rest/txt/analyzer. The document and the requested services are passed as parameters.
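As a rough illustration of the GET variant, the following Python sketch builds a request URL from the `text` and `services` parameters described in this document. The helper name `build_get_url` is hypothetical, and the sketch only constructs the URL. It does not send anything, and it leaves out the `X-Api-Key` header used in the example requests.

```python
from urllib.parse import urlencode

ANALYZER_URL = "https://api.txtwerk.de/rest/txt/analyzer"


def build_get_url(text, services):
    """Return the analyzer URL with text and services as query parameters.

    build_get_url is a hypothetical helper, not part of any TXTWerk client.
    """
    # The services parameter is a comma-separated list of service names.
    query = urlencode({"text": text, "services": ",".join(services)})
    return f"{ANALYZER_URL}?{query}"


url = build_get_url("Berlin ist die Hauptstadt.", ["entities", "tags"])
print(url)
```

`urlencode` percent-encodes the values, so spaces and commas in the query string are safe to transmit. For longer texts, a POST request is the better fit.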
Document
The document to be annotated can be passed directly as text.
Alternatively, you can specify the URL of a website to be analyzed. In this case, the site is crawled, and its main text content is extracted and processed. Extraneous elements, such as navigation or teaser texts, are removed.
Input texts can be weighted if the document is passed as a JSON string. In this case, each resulting relevance is multiplied by the corresponding input weight. The response is structured according to the provided input texts.
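The weighting can be pictured with a small Python sketch. It serializes a list of input texts into a JSON string using the field names `text` and `weight` that appear in the example request in this document, with weights written as strings as in that example. The helper names and the relevance value 0.5 are purely illustrative assumptions. The multiplication itself happens on the server and is shown here only as arithmetic.

```python
import json


def build_weighted_document(parts):
    """Serialize (text, weight) pairs into a JSON string for the document parameter.

    Hypothetical helper; weights are emitted as strings, as in the example request.
    """
    return json.dumps(
        [{"text": text, "weight": str(weight)} for text, weight in parts],
        ensure_ascii=False,  # keep umlauts readable instead of \u escapes
    )


def weighted_relevance(relevance, weight):
    """Illustrate the documented rule: relevance multiplied by the input weight."""
    return relevance * weight


document = build_weighted_document([("Titel", 2.0), ("Teaser", 1.5)])
print(document)

# Assumed relevance of 0.5 for something found in the part weighted 1.5.
print(weighted_relevance(0.5, 1.5))
```

Whether the API also accepts numeric (unquoted) weights is not stated here, so the sketch stays with the string form.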
Services
The document can be analyzed with different techniques. Please choose from the following services:
Service | Description |
---|---|
entities | Named Entities based on the Wikidata ontology. |
tags | Keywords which appear in the text and describe and summarize its content. |
categories | Assignment of text to categories (By default: "Politics", "Business", "Cars & Technology", "Internet", "Culture", "Sports", "Travel", "Human interest", and "Science"). |
dates | Dates and periods. |
measures | Measurements that occur in the text. |
authors | Authors of an article available as an HTML document. |
fingerprints | Fingerprints of the text for the identification of similar documents (near duplicate detection). |
lexiconEntities | Named Entities based on an optional user lexicon, which can additionally be managed in TXTWerk. |
lexiconTags | Keywords based on a lexicon maintained in TXTWerk. |
nerEntities | Named Entities which are neither based on the Wikidata ontology nor on the user lexicon, but derive from a distilled Flair language model. |
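Since the services are requested by name, a client may want to catch misspelled names before sending anything. The sketch below is one possible way to do that in Python. The set of names is copied from the list above, while the helper `services_param` and the decision to validate on the client side are assumptions for illustration.

```python
SUPPORTED_SERVICES = {
    "entities", "tags", "categories", "dates", "measures", "authors",
    "fingerprints", "lexiconEntities", "lexiconTags", "nerEntities",
}


def services_param(requested):
    """Return the comma-separated services value, rejecting unknown names.

    Hypothetical client-side check; the API performs its own validation.
    """
    if not requested:
        raise ValueError("at least one service must be requested")
    unknown = [name for name in requested if name not in SUPPORTED_SERVICES]
    if unknown:
        raise ValueError(f"unsupported services: {', '.join(unknown)}")
    return ",".join(requested)


print(services_param(["entities", "dates", "measures"]))
```

Service names are case-sensitive in this sketch, so `lexiconentities` would be rejected while `lexiconEntities` passes.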
Service Control
For some services, additional parameters are available that affect the analysis or the result.
Overview of Parameters
Parameter | Area | Description |
---|---|---|
text | Document | Contains the document to be annotated as text. For longer texts, please send the request as a POST request and pass the text in the request body. Mandatory: either text or htmlFile or document. Values: Text |
htmlFile | Document | Contains the document to be annotated as HTML text. Mandatory: either text or htmlFile or document. Values: HTML text file |
document | Document | Contains the document to be annotated as a JSON string. Mandatory: either text or htmlFile or document. Values: JSON string |
title | Document | Title of the document. Specifying a title can improve the result. However, it only influences the following service: tags. Mandatory: no. Values: Text |
teaser | Document | Teaser of the document. Specifying a teaser can improve the result. However, it only influences the following service: tags. Mandatory: no. Values: Text |
services | Services | List of requested services. Mandatory: yes. Values: Comma-separated list that contains at least one of the supported services: [entities, tags, categories, dates, measures, authors, fingerprints, lexiconEntities, lexiconTags, nerEntities] |
language | Service control | Language of the input document. Setting this parameter activates language-dependent components specifically. Mandatory: no (auto-detected if omitted). Values: 'en' or 'de' |
ntags | Service control | Maximum number of keywords (tags) to be returned. Service: tags. Mandatory: no. Default: 10. Values: non-negative integer |
ncategories | Service control | Number of categories to be returned. Service: categories. Mandatory: no. Values: non-negative integer |
nentities | Service control | Number of entities to be returned. Service: entities. Mandatory: no. Values: non-negative integer |
nerMinConfidence | Service control | Threshold for the entity confidence. Service: entities. Mandatory: no. Default: 30. Values: non-negative integer |
nerMinRelevance | Service control | Threshold for the relevance value of the entities. Service: entities. Mandatory: no. Values: non-negative integer |
nerFormat | Service control | Response format for the entities. Service: entities. Mandatory: no. Values: 'list', 'aggregate' (aggregated list of entities, sorted by relevance), 'candidates' (for each entity, a list of all disambiguation candidates) |
nerMetadata | Service control | Additional metadata for the entities. Service: entities. Mandatory: no. Values: true or false |
nerMetadataProperties | Service control | Properties to be returned for the entities. Service: entities. Mandatory: no. Values: Comma-separated list. Depends on which properties the user created beforehand, e.g. 'description' (if entities contain a description) or 'typetree' (if information on hierarchical relations between entities is stored). null returns all existing properties. |
nerAnnotations | Service control | Additional information on entities, e.g. pages on Wikipedia and aliases. Service: entities. Mandatory: no. Values: Comma-separated list containing one or several of the annotation layers (currently available: 'aliases' and 'wikipedia'). |
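To show how the parameters from the table might be combined, here is a minimal Python sketch that assembles a parameter dictionary. The function `build_params` is hypothetical. It reads "either text or htmlFile or document" as meaning that exactly one of the three is supplied, which is an interpretation rather than something the table states outright, and it treats all values as plain form fields (a real `htmlFile` would be a file upload).

```python
DOCUMENT_FIELDS = ("text", "htmlFile", "document")


def build_params(services, **options):
    """Assemble request parameters as a dict of strings.

    Hypothetical helper. Assumes exactly one document field is given and
    drops any option whose value is None.
    """
    given = [name for name in DOCUMENT_FIELDS if options.get(name) is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of: text, htmlFile, document")

    params = {"services": ",".join(services)}
    for name, value in options.items():
        if value is None:
            continue
        # bool is checked first because bool is a subclass of int in Python.
        if isinstance(value, bool):
            params[name] = "true" if value else "false"
        else:
            params[name] = str(value)
    return params


params = build_params(
    ["entities", "tags"],
    text="TXTWerk ist eine Textmining-API.",
    language="de",
    ntags=5,
    nerFormat="aggregate",
    nerMetadata=True,
)
print(params)
```

Keeping every value as a string matches how form fields and query parameters are transmitted, so the resulting dictionary can be URL-encoded directly.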
Example Request
Example of a POST request where the document is passed directly as text:
curl "https://api.txtwerk.de/rest/txt/analyzer" \
  -H "X-Api-Key: ..." \
  -d text='TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin ansässiger Fullservice-Provider. Neben Entitäten und Schlagwörtern erkennt TXTWerk in Texten unter anderem auch Datumsangaben (z.B. 08.09.2023) und Maßzahlen (z.B. 24h) und ordnet jeden Text einer passenden Textklasse zu.' \
  -d services='entities'
Example of a POST request where an HTML file is passed directly as an input parameter:
curl "https://api.txtwerk.de/rest/txt/analyzer" \
  -H "X-Api-Key: ..." \
  -F htmlFile='@' \
  -F services='entities'
Example of a POST request where the document is provided as JSON with weights:
curl "https://api.txtwerk.de/rest/txt/analyzer" \
  -H "X-Api-Key: ..." \
  -d services='entities' \
  -d document='[{ "text": "Titel", "weight": "2.0" } , { "text": "Teaser", "weight": "1.5" }]'
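For readers working in Python rather than curl, the first example can be approximated with the standard library. This is a sketch only: `build_post_request` is a hypothetical helper, `YOUR_API_KEY` is a placeholder, and the request object is constructed but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ANALYZER_URL = "https://api.txtwerk.de/rest/txt/analyzer"


def build_post_request(api_key, params):
    """Build (but do not send) a form-encoded POST request to the analyzer.

    Hypothetical helper; the header name X-Api-Key is taken from the curl examples.
    """
    body = urlencode(params).encode("utf-8")  # form-encoded request body
    return Request(
        ANALYZER_URL,
        data=body,
        headers={"X-Api-Key": api_key},
        method="POST",
    )


request = build_post_request(
    "YOUR_API_KEY",  # placeholder, replace with a real key
    {"text": "TXTWerk erkennt Datumsangaben (z.B. 08.09.2023).", "services": "entities"},
)
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen(request)` would perform the call. Unlike `curl -d`, `urlencode` percent-encodes the text, which avoids problems with characters such as `&` or `=` inside the document.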