API Documentation - Response

The response is always returned in JSON format. It contains the analyzed text, the language of the text, and, for every requested service, a response block with that service's analysis result. The structure of each service block is described in detail below.

For an example of a complete response, see the Overview section.

Basic Response Format

    {
      "text": "TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin ansässiger Fullservice-Provider. Neben Entitäten und Schlagwörtern erkennt TXTWerk in Texten unter anderem auch Datumsangaben (z.B. 08.09.2023) und Maßzahlen (z.B. 24h) und ordnet jeden Text einer passenden Textklasse zu.",
      "timestamp": 1400247994051,
      "language": "de",
      "entities": [ ... ],
      "lexiconEntities": [ ... ],
      "nerEntities": [ ... ],
      "lexiconTags": [ ... ],
      "tags": [ ... ],
      "dates": [ ... ],
      "categories": [ ... ],
      "measures": [ ... ],
      "fingerprints": [ ... ],
      "legals": [ ... ]
    }

If a service analyzed the text successfully but found no results, its block contains an empty list. If a single service fails, the HTTP status is still 200: the response contains the results of all other services, and only the block of the failed service is omitted. Services that were not included in the request are generally not included in the response either.
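Because a failed service is signalled by a missing block rather than by an HTTP error, client code has to distinguish three cases: a block with results, an empty block, and a missing block. The following Python sketch assumes the response body has already been parsed with `json.loads`; the helper name `service_status` and the abbreviated body are hypothetical and not part of the TXTWerk API.

```python
import json


def service_status(response: dict, service: str) -> str:
    """Classify one service block of a parsed response.

    "results": the block is present and non-empty.
    "empty":   the service ran successfully but found nothing.
    "missing": the service was not requested or it failed; the
               response itself does not tell these two cases apart.
    """
    if service not in response:
        return "missing"
    return "results" if response[service] else "empty"


# Abbreviated, hypothetical response body for illustration.
body = '{"language": "de", "tags": [], "dates": [{"surface": "08.09.2023"}]}'
response = json.loads(body)
for name in ("tags", "dates", "categories"):
    print(name, service_status(response, name))
```

To tell a failed service apart from one that was never requested, compare the missing blocks against the list of services you actually sent in the request.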

Description of each field:

text The analyzed text. If you passed a URL, this is the plain text extracted from the page (with boilerplate removed). If you passed plain text in the parameter 'text', the text is returned unchanged. If you passed a JSON document, all of its paragraphs are concatenated in the response.
language The language of the text, e.g. "de", "en", or "ru".
timestamp The timestamp of the response, in milliseconds since January 1, 1970, 00:00:00 UTC (Unix epoch).
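A minimal Python sketch that converts the timestamp into a timezone-aware UTC datetime. The helper name `response_time` is hypothetical. Adding an integer `timedelta` to the epoch, instead of dividing by 1000 and calling `fromtimestamp`, keeps the millisecond part exact.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch, as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def response_time(timestamp_ms: int) -> datetime:
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)


# Timestamp from the example response above.
print(response_time(1400247994051).isoformat())
# 2014-05-16T13:46:34.051000+00:00
```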

Response Format: Entities

    {
      "entities": [
        {
          "confidence": 36.218177795410156,
          "relevance": 25.53207015991211,
          "surface": "GmbH",
          "label": "Gesellschaft mit beschränkter Haftung",
          "uri": "https://www.wikidata.org/wiki/Q460178",
          "type": "CONCEPT",
          "start": 44,
          "end": 48
        },
        {
          "confidence": 39.26929473876953,
          "relevance": 11.950702667236328,
          "surface": "Berlin",
          "label": "Berlin",
          "uri": "https://www.wikidata.org/wiki/Q64",
          "type": "PLACE",
          "start": 57,
          "end": 63
        },
        {
          "confidence": 95.73828125,
          "relevance": 35.542537689208984,
          "surface": "Texten",
          "label": "Text",
          "uri": "https://www.wikidata.org/wiki/Q234460",
          "type": "CONCEPT",
          "start": 150,
          "end": 156
        }
      ]
    }

Description of each field:

label The unique label of the entity.
surface The surface form of the entity, i.e. the string as it appears in the text.
type Type of entity. Possible values are "PERSON", "PLACE", "ORGANISATION", "JOB TITLE", "WORK", "EVENT" and "CONCEPT". The type is determined heuristically and may sometimes differ from the expected value. For example, a city can act as an employer and may therefore be classified as an organization.
uri The Wikidata URI of the named entity. Set to null if the entity has no URI in the Wikidata knowledge base.
confidence Confidence value of the discovered entity. A higher value indicates a more reliable detection. The confidence has no upper bound.
relevance Relevance value of the discovered entity. A higher value indicates a more important entity in the given context. The relevance has no upper bound.
start The start position of the entity in the text, as a zero-based character offset.
end The end position of the entity in the text. The offset is exclusive: it points to the first character after the entity, so in the example above "GmbH" has start 44 and end 48.
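The offsets in the example on this page are consistent with zero-based character positions and an exclusive end, so Python slicing `text[start:end]` recovers each surface form (the umlauts in "Entitäten" and "Schlagwörtern" before "Texten" each count as one character). This has only been checked against this example; whether the service counts code points or UTF-16 units for characters outside the Basic Multilingual Plane (such as emoji) is not documented here. The helper names are hypothetical, and the entity list is abridged to the fields used.

```python
# Example text and (abridged) entity block from this page.
TEXT = (
    "TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin ansässiger "
    "Fullservice-Provider. Neben Entitäten und Schlagwörtern erkennt TXTWerk in Texten "
    "unter anderem auch Datumsangaben (z.B. 08.09.2023) und Maßzahlen (z.B. 24h) und "
    "ordnet jeden Text einer passenden Textklasse zu."
)
ENTITIES = [
    {"surface": "GmbH", "label": "Gesellschaft mit beschränkter Haftung",
     "type": "CONCEPT", "start": 44, "end": 48},
    {"surface": "Berlin", "label": "Berlin", "type": "PLACE", "start": 57, "end": 63},
    {"surface": "Texten", "label": "Text", "type": "CONCEPT", "start": 150, "end": 156},
]


def surface_at(text: str, entity: dict) -> str:
    """Cut the entity's span out of the text (start inclusive, end exclusive)."""
    return text[entity["start"]:entity["end"]]


def labels_of_type(entities: list, entity_type: str) -> list:
    """Return the labels of all entities with the given type."""
    return [e["label"] for e in entities if e["type"] == entity_type]


for e in ENTITIES:
    assert surface_at(TEXT, e) == e["surface"]
print(labels_of_type(ENTITIES, "CONCEPT"))
```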

Depending on which optional parameters were requested, the response may also contain the following fields:

annotations Additional annotation information about the entities. Only present if the request contains the additional parameter nerAnnotations.
annotations.aliases Alternative names ("also known as") of the entity. Omitted from the block if Wikidata lists no aliases for the entity.
annotations.wikipedia Link to a matching Wikipedia page. Omitted if no page is linked in Wikidata.
candidates All entities that were disambiguation candidates for this entity. Only present if the request parameter nerFormat is set to 'candidates'.
candidates.uri The Wikidata URI of the disambiguation candidate.
candidates.type Type of the disambiguation candidate. Possible values are "PERSON", "PLACE", "ORGANISATION", "JOBTITLE", "WORK", "EVENT" and "CONCEPT". The type is determined heuristically and may sometimes differ from the expected value. For example, a city can act as an employer and may therefore be classified as an organization.
candidates.confidence Confidence value of the disambiguation candidate. A higher value indicates a more reliable detection. The confidence has no upper bound.
candidates.label The unique label of the disambiguation candidate.
userDefinedFields Additional user-specific information about the entities. Only present if both request parameters nerMetadata and nerMetadataProperties are set. The keys and values depend on the additional fields the user has defined.
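Since these fields only appear when the corresponding parameters are set, and some are omitted even then, client code should read them defensively. A minimal Python sketch with hypothetical helper names; it assumes `annotations` is a JSON object and `aliases` a list of strings, which this section does not show explicitly.

```python
from typing import Optional


def aliases(entity: dict) -> list:
    """Aliases from Wikidata, or [] if annotations or aliases are absent.

    Assumption: 'annotations' is an object and 'aliases' a list of strings.
    """
    return entity.get("annotations", {}).get("aliases", [])


def wikipedia_link(entity: dict) -> Optional[str]:
    """Wikipedia link, or None if not requested or not linked in Wikidata."""
    return entity.get("annotations", {}).get("wikipedia")


def best_candidate(entity: dict) -> Optional[dict]:
    """Disambiguation candidate with the highest confidence, if candidates are present."""
    return max(entity.get("candidates", []), key=lambda c: c["confidence"], default=None)


# An entity from a request without nerAnnotations or nerFormat=candidates.
plain = {"label": "Berlin", "uri": "https://www.wikidata.org/wiki/Q64"}
print(aliases(plain), wikipedia_link(plain), best_candidate(plain))
# [] None None
```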

Response Format: Top Entities

Top entities are included in the response if the request parameter nerFormat is set to 'aggregate'. Top entities are the entities with the highest relevance values; each top entity lists all of its matches in the text.

    {
      "topEntities": [
        {
          "confidence": 95.73828125,
          "relevance": 35.542537689208984,
          "label": "Text",
          "uri": "https://www.wikidata.org/wiki/Q234460",
          "type": "CONCEPT",
          "matches": [
            {
              "surface": "Texten",
              "start": 150,
              "end": 156
            }
          ]
        },
        {
          "confidence": 100.0,
          "relevance": 33.77956008911133,
          "label": "Neofonie GmbH",
          "uri": "Neofonie",
          "type": "Organisation",
          "userDefinedFields": { ... },
          "matches": [
            {
              "surface": "Neofonie GmbH",
              "start": 35,
              "end": 48
            }
          ]
        },
        {
          "confidence": 36.218177795410156,
          "relevance": 25.53207015991211,
          "label": "Gesellschaft mit beschränkter Haftung",
          "uri": "https://www.wikidata.org/wiki/Q460178",
          "type": "CONCEPT",
          "matches": [
            {
              "surface": "GmbH",
              "start": 44,
              "end": 48
            }
          ]
        },
        {
          "confidence": 39.26929473876953,
          "relevance": 11.950702667236328,
          "label": "Berlin",
          "uri": "https://www.wikidata.org/wiki/Q64",
          "type": "PLACE",
          "matches": [
            {
              "surface": "Berlin",
              "start": 57,
              "end": 63
            }
          ]
        }
      ]
    }

Description of each field:

label The unique label of the entity.
type Type of entity. Possible values are "PERSON", "PLACE", "ORGANISATION", "JOB TITLE", "WORK", "EVENT" and "CONCEPT". The type is determined heuristically and may sometimes differ from the expected value. For example, a city can act as an employer and may therefore be classified as an organization.
uri The Wikidata URI of the named entity. Set to null if the entity has no URI in the Wikidata knowledge base.
confidence Confidence value of the discovered entity. A higher value indicates a more reliable detection. The confidence has no upper bound.
relevance Relevance value of the discovered entity. A higher value indicates a more important entity in the given context. The relevance has no upper bound.
matches All matches of the entity in the text.
matches.surface The surface form of the match in the text.
matches.start The start position of the match in the text.
matches.end The end position of the match in the text.
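If you only requested the flat entity list, a structure similar to the top entities can be approximated on the client side. This is a sketch of a hypothetical grouping, not the service's own aggregation: entities are keyed by uri (falling back to label when uri is null), and, as an assumption, the highest confidence and relevance seen for an entity are kept. Note that the service's example above also contains the lexicon entity "Neofonie GmbH", which this sketch would not produce from the 'entities' block alone.

```python
def aggregate_entities(entities: list) -> list:
    """Group flat entity matches into one record per entity, sorted by relevance.

    Key: uri, or label when uri is null. Assumption: for repeated
    entities, the highest confidence and relevance are kept.
    """
    groups = {}
    for e in entities:
        key = e["uri"] if e["uri"] is not None else e["label"]
        group = groups.get(key)
        if group is None:
            group = {k: e[k] for k in ("label", "uri", "type", "confidence", "relevance")}
            group["matches"] = []
            groups[key] = group
        else:
            group["confidence"] = max(group["confidence"], e["confidence"])
            group["relevance"] = max(group["relevance"], e["relevance"])
        group["matches"].append({k: e[k] for k in ("surface", "start", "end")})
    return sorted(groups.values(), key=lambda g: g["relevance"], reverse=True)


entities = [  # the 'entities' example from this page
    {"confidence": 36.218177795410156, "relevance": 25.53207015991211, "surface": "GmbH",
     "label": "Gesellschaft mit beschränkter Haftung",
     "uri": "https://www.wikidata.org/wiki/Q460178", "type": "CONCEPT", "start": 44, "end": 48},
    {"confidence": 39.26929473876953, "relevance": 11.950702667236328, "surface": "Berlin",
     "label": "Berlin", "uri": "https://www.wikidata.org/wiki/Q64", "type": "PLACE",
     "start": 57, "end": 63},
    {"confidence": 95.73828125, "relevance": 35.542537689208984, "surface": "Texten",
     "label": "Text", "uri": "https://www.wikidata.org/wiki/Q234460", "type": "CONCEPT",
     "start": 150, "end": 156},
]
for top in aggregate_entities(entities):
    print(top["label"], top["matches"])
```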

Response Format: Lexicon Entities

These named entities are based on a lexicon managed in TXTWerk. Unlike the Wikidata entities, they are determined without any disambiguation. The response format is the same as for 'entities', except that the response block is named 'lexiconEntities'.

Description of each field:

label The unique label of the entity.
surface The surface form of the entity in the text.
type Type of entity. The possible values are managed in the lexicon and depend on its current state.
uri A URI associated with this named entity, typically an identifier in an external system.
relevance Relevance value of the discovered entity. A higher value indicates a more important entity in the given context. The relevance has no upper bound.
confidence Confidence value of the discovered entity. Because this service is based on the user lexicon, the value is always 1.
start The start position of the entity in the text.
end The end position of the entity in the text.
userDefinedFields Additional user-specific information about the entities.
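Because 'lexiconEntities' uses the same format as 'entities', one function can process both blocks. A minimal Python sketch with a hypothetical helper name that merges them into a single list ordered by position and records which block each item came from; the abridged response is hypothetical, with the lexicon entry modelled on "Neofonie GmbH". Overlapping spans (here "Neofonie GmbH" and "GmbH") are kept; deduplication is left to the caller.

```python
def merged_entities(response: dict) -> list:
    """Combine 'entities' and 'lexiconEntities' into one list ordered by position.

    Each item gets an extra 'source' key naming the block it came from.
    """
    merged = []
    for block in ("entities", "lexiconEntities"):
        for entity in response.get(block, []):
            merged.append({**entity, "source": block})
    return sorted(merged, key=lambda e: (e["start"], e["end"]))


# Abridged, hypothetical response.
response = {
    "entities": [
        {"surface": "Berlin", "label": "Berlin", "start": 57, "end": 63},
        {"surface": "GmbH", "label": "Gesellschaft mit beschränkter Haftung",
         "start": 44, "end": 48},
    ],
    "lexiconEntities": [
        {"surface": "Neofonie GmbH", "label": "Neofonie GmbH", "start": 35, "end": 48},
    ],
}
for e in merged_entities(response):
    print(e["start"], e["surface"], e["source"])
```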

Response Format: NER Entities

    {
      "nerEntities": [
        {
          "type": "ORGANISATION",
          "confidence": 0.6897694170475006,
          "start": 35,
          "end": 48,
          "surface": "Neofonie GmbH"
        },
        {
          "type": "PLACE",
          "confidence": 0.9957075119018555,
          "start": 57,
          "end": 63,
          "surface": "Berlin"
        }
      ]
    }

Description of each field:

surface The surface form of the entity in the text.
type Type of entity. Possible values include "PERSON" and "PLACE"; the example above also shows "ORGANISATION".
confidence Confidence value of the discovered entity. A higher value indicates a more reliable detection. The confidence has no upper bound.
start The start position of the entity in the text.
end The end position of the entity in the text.
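A minimal Python sketch that groups NER results by type and optionally drops low-confidence matches. The helper name and the 0.9 threshold in the usage example are illustrative assumptions, not values recommended by TXTWerk.

```python
from collections import defaultdict


def surfaces_by_type(ner_entities: list, min_confidence: float = 0.0) -> dict:
    """Group NER surface forms by type, skipping matches below min_confidence."""
    grouped = defaultdict(list)
    for e in ner_entities:
        if e["confidence"] >= min_confidence:
            grouped[e["type"]].append(e["surface"])
    return dict(grouped)


ner_entities = [  # the 'nerEntities' example from this page
    {"type": "ORGANISATION", "confidence": 0.6897694170475006,
     "start": 35, "end": 48, "surface": "Neofonie GmbH"},
    {"type": "PLACE", "confidence": 0.9957075119018555,
     "start": 57, "end": 63, "surface": "Berlin"},
]
print(surfaces_by_type(ner_entities))
print(surfaces_by_type(ner_entities, min_confidence=0.9))
```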

Response Format: Tags

    {
      "tags": [
        {
          "confidence": 0.9989658313414402,
          "term": "TXTWerk"
        },
        {
          "confidence": 0.9782419755349671,
          "term": "Entitäten"
        },
        {
          "confidence": 0.9732933133596776,
          "term": "Textmining-API"
        },
        {
          "confidence": 0.9365462323616698,
          "term": "Neofonie GmbH"
        },
        {
          "confidence": 0.8993179739843555,
          "term": "Schlagwörter"
        },
        {
          "confidence": 0.8814831569459867,
          "term": "Berlin"
        },
        {
          "confidence": 0.874798029178814,
          "term": "Fullservice-Provider"
        }
      ]
    }

Description of each field:

term The extracted keyword.
confidence Confidence value of the keyword. Always between 0 and 1.
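Because tag confidences always lie between 0 and 1, a fixed threshold is a simple way to keep only strong keywords. A minimal Python sketch; the helper name and the default threshold of 0.9 are arbitrary choices for illustration.

```python
def keywords(tags: list, threshold: float = 0.9) -> list:
    """Return the terms whose confidence reaches the threshold, best first."""
    ranked = sorted(tags, key=lambda t: t["confidence"], reverse=True)
    return [t["term"] for t in ranked if t["confidence"] >= threshold]


tags = [  # the 'tags' example from this page
    {"confidence": 0.9989658313414402, "term": "TXTWerk"},
    {"confidence": 0.9782419755349671, "term": "Entitäten"},
    {"confidence": 0.9732933133596776, "term": "Textmining-API"},
    {"confidence": 0.9365462323616698, "term": "Neofonie GmbH"},
    {"confidence": 0.8993179739843555, "term": "Schlagwörter"},
    {"confidence": 0.8814831569459867, "term": "Berlin"},
    {"confidence": 0.874798029178814, "term": "Fullservice-Provider"},
]
print(keywords(tags))
# ['TXTWerk', 'Entitäten', 'Textmining-API', 'Neofonie GmbH']
```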

Response Format: Lexicon Tags

    {
      "text": "TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin ansässiger Fullservice-Provider. Neben Entitäten und Schlagwörtern erkennt TXTWerk in Texten unter anderem auch Datumsangaben (z.B. 08.09.2023) und Maßzahlen (z.B. 24h) und ordnet jeden Text einer passenden Textklasse zu.",
      "lexiconTags": [
        {
          "id": "[unique id]",
          "tag": "ansässig",
          "score": 7.6243725,
          "analyzed": "ansässig",
          "observedSurfaces": [
            {
              "start": 64,
              "end": 74,
              "type": "TAG",
              "observedSurface": "ansässiger",
              "analyzed": "ansässig"
            }
          ]
        }
      ]
    }

Description of each field:

id Unique ID of the tag in the user lexicon.
tag The unique label of the tag.
score A value representing the quality of the match, determined by the matching algorithm.
analyzed Tags are algorithmically expanded into different word forms (including synonyms). This field contains the word form of the tag that matched.
observedSurfaces The matches of the tag in the text.
observedSurfaces.start The start position of the tag match in the text.
observedSurfaces.end The end position of the tag match in the text.
observedSurfaces.type Type of match. Possible values are "TAG", "SYNONYM" and "GENDER".
observedSurfaces.observedSurface The surface form of the match in the text.
observedSurfaces.analyzed All tokens of the text are algorithmically expanded into different word forms (including synonyms). This field contains the word form of the token that matched.
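A minimal Python sketch that flattens the lexicon tags into one row per match, using the offsets to cut the matched span out of the text. It uses only the first sentence of the example text, which is enough because the offsets 64 to 74 fall inside it; the helper name `tag_hits` is hypothetical.

```python
# First sentence of the example text; the offsets below refer to it.
TEXT = ("TXTWerk ist die Textmining-API der Neofonie GmbH, ein in Berlin "
        "ansässiger Fullservice-Provider.")

lexicon_tags = [{  # the 'lexiconTags' example from this page
    "id": "[unique id]", "tag": "ansässig", "score": 7.6243725, "analyzed": "ansässig",
    "observedSurfaces": [{"start": 64, "end": 74, "type": "TAG",
                          "observedSurface": "ansässiger", "analyzed": "ansässig"}],
}]


def tag_hits(text: str, tags: list) -> list:
    """List (tag, match type, text span) for every observed surface of every tag."""
    hits = []
    for tag in tags:
        for match in tag.get("observedSurfaces", []):
            hits.append((tag["tag"], match["type"], text[match["start"]:match["end"]]))
    return hits


print(tag_hits(TEXT, lexicon_tags))
# [('ansässig', 'TAG', 'ansässiger')]
```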

Response Format: Dates

    {
      "dates": [
        {
          "surface": "08.09.2023",
          "start": 196,
          "end": 206,
          "dateStart": {
            "day": 8,
            "month": 9,
            "year": 2023,
            "bc": false
          },
          "dateEnd": {
            "day": 8,
            "month": 9,
            "year": 2023,
            "bc": false
          }
        }
      ]
    }

Description of each field:

surface The surface form of the date in the text.
start The start position of the date in the text.
end The end position of the date in the text.
dateStart The start date. A date is always represented as a time period, so the start and end date may be identical, as in the example above.
dateEnd The end date.
day The day of the start or end date. Possible values are 1-31.
month The month of the start or end date. Possible values are 1-12.
year The year of the start or end date.
bc Whether the date lies before Christ (BC). Possible values are true and false.
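A minimal Python sketch that turns one entry of the 'dates' block into a pair of `datetime.date` objects. It assumes day, month and year are always present, as in the example. Python's `datetime.date` cannot represent BC years, so this sketch returns None for them; a real client might need its own date type instead. The helper names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def to_date(d: dict) -> Optional[date]:
    """Convert a dateStart/dateEnd object to datetime.date.

    datetime.date has no BC years, so BC dates yield None here.
    """
    if d["bc"]:
        return None
    return date(d["year"], d["month"], d["day"])


def period(entry: dict) -> Tuple[Optional[date], Optional[date]]:
    """Return the (start, end) period of one entry of the 'dates' block."""
    return to_date(entry["dateStart"]), to_date(entry["dateEnd"])


entry = {  # the 'dates' example from this page
    "surface": "08.09.2023", "start": 196, "end": 206,
    "dateStart": {"day": 8, "month": 9, "year": 2023, "bc": False},
    "dateEnd": {"day": 8, "month": 9, "year": 2023, "bc": False},
}
start, end = period(entry)
print(start, end, start == end)
# 2023-09-08 2023-09-08 True
```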

Response Format: Categories

    {
      "categories": [
        {
          "confidence": 0.9999914614615732,
          "label": "internet"
        },
        {
          "confidence": 8.5340630740002E-6,
          "label": "kultur"
        },
        {
          "confidence": 3.4390082461387908E-9,
          "label": "auto+technik"
        },
        {
          "confidence": 7.942384268635301E-10,
          "label": "wirtschaft"
        },
        {
          "confidence": 1.1799574174439144E-10,
          "label": "reisen"
        },
        {
          "confidence": 8.06441429999464E-11,
          "label": "wissenschaft"
        },
        {
          "confidence": 4.031349737157026E-11,
          "label": "politik"
        },
        {
          "confidence": 3.152736753221788E-12,
          "label": "sport"
        }
      ]
    }

Description of each field:

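label The name of the category. The available categories are politics, business, car & technology, internet, culture, travel, sports, human interest, and science. As the example above shows, the labels are returned as lowercase German identifiers such as "politik", "wirtschaft" or "auto+technik".
confidence Confidence value of the category. Always between 0 and 1.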
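A minimal Python sketch that picks the most likely category and returns None when even the best one is not confident enough. The helper name and the default cut-off of 0.5 are arbitrary assumptions; the category list is abridged from the example above.

```python
from typing import Optional


def top_category(categories: list, min_confidence: float = 0.5) -> Optional[str]:
    """Label of the most confident category, or None if it is below min_confidence."""
    best = max(categories, key=lambda c: c["confidence"], default=None)
    if best is None or best["confidence"] < min_confidence:
        return None
    return best["label"]


categories = [  # abridged 'categories' example from this page
    {"confidence": 0.9999914614615732, "label": "internet"},
    {"confidence": 8.5340630740002e-06, "label": "kultur"},
    {"confidence": 3.4390082461387908e-09, "label": "auto+technik"},
]
print(top_category(categories))
# internet
```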

Response Format: Measures

    {
      "measures": [
        {
          "start": 228,
          "end": 231,
          "text": "24h",
          "valueString": "24",
          "unitString": "h",
          "type": "TIME",
          "alias": [
            "24 h",
            "24h",
            "24Stunde",
            "24 Stunde",
            "24 Stunden",
            "24Stunden"
          ]
        }
      ]
    }

Description of each field:

start The start position of the measurement in the text.
end The end position of the measurement in the text.
text The measurement string, exactly as it occurs in the text.
valueString The value of the measurement as a string, exactly as it occurs in the text.
unitString The unit as a string, exactly as it occurs in the text.
unitCanonical Only for currencies. Regardless of the exact unit string used in the text, this is the three-letter code of the respective currency.
type The type of measurement. Possible values are "LENGTH", "AREA", "MASS", "TEMPERATURE", "VOLTAGE", "AMPERAGE", "RESISTANCE", "CHARGE", "CAPACITY", "CONDUCTANCE", "INDUCTANCE", "MAGNETIC_STRENGTH", "POWER", "ENERGY", "FORCE", "PRESSURE", "FREQUENCY", "VOLUME", "LUMINOSITY", "ILLUMINANCE", "SPIN", "SUBSTANCE", "RADIOACTIVITY", "CURRENCY", "TIME" and "UNKNOWN".
alias Further variants of the measurement string (with and without space, units with and without abbreviation, conversions).
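The alias list makes it possible to match a measurement regardless of spacing or unit spelling, and unitCanonical gives a stable key for currencies. A minimal Python sketch with hypothetical helper names; the case-insensitive comparison is a design choice of this sketch, not documented service behaviour.

```python
from typing import Optional


def matches_measure(measure: dict, query: str) -> bool:
    """True if query equals the measure's text or one of its aliases (case-insensitive)."""
    q = query.casefold()
    candidates = [measure["text"], *measure.get("alias", [])]
    return any(q == c.casefold() for c in candidates)


def currency_code(measure: dict) -> Optional[str]:
    """Three-letter currency code for CURRENCY measures; None otherwise or if absent."""
    if measure.get("type") != "CURRENCY":
        return None
    return measure.get("unitCanonical")


measure = {  # the 'measures' example from this page
    "start": 228, "end": 231, "text": "24h", "valueString": "24", "unitString": "h",
    "type": "TIME",
    "alias": ["24 h", "24h", "24Stunde", "24 Stunde", "24 Stunden", "24Stunden"],
}
print(matches_measure(measure, "24 Stunden"), matches_measure(measure, "24 Minuten"))
print(currency_code(measure))
# True False
# None
```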

Response Format: Fingerprints

    {
      "fingerprints": [
        7493129, 18632078, 48467713, 64740551, 61803666, 57602, 20683602, 7169662, 124073776, 1324512,
        48689911, 63618400, 82739683, 57114900, 86498997, 5531749, 43615458, 63266708, 35312651, 1767346,
        166345084, 20994017, 10618634, 35187378, 52012568, 62221932, 101283997, 194238108, 24943142, 48857582,
        214343186, 8807040, 11737208, 29004557, 33563369, 23510317, 54409541, 58494605, 55886581, 88208507,
        10609552, 7042020, 21855281, 9560326, 22894461, 19569052, 11695122, 59192088, 11647472, 25992587
      ]
    }