The High-Performance and Explainable Language Processing Platform

Golem.ai Core is the no-training required artificial intelligence solution for building high-performance, robust, frugal, and unbiased NLP projects.

Build all your use cases with Golem.ai Core

Build your project from A to Z with our versatile NLP platform.

Extract the content of your documents to save reading time and automate their processing.

OCR processing - Content analysis - Information extraction

Extract the content of your documents to save reading time and automate their processing.

OCR processing - Content analysis - Information extraction

Analysts can create network structures to extract knowledge from different sources.

Text extraction - Content analysis - Information linking

Analysts can create network structures to extract knowledge from different sources.

Text extraction - Content analysis - Information linking

Process incoming messages by analyzing the message and its attachments.

Message and attachments analysis - Categorization - Information extraction

Process incoming messages by analyzing the message and its attachments.

Message and attachments analysis - Categorization - Information extraction

Protégez les données de vos utilisateurs et protégez-les des mauvais contenus.

Take advantage of the power of our revolutionary NLU

Explainable, frugal, multilingual, and customizable.

Tokenization

Selection and separation of words (tokens) to keep only the relevant elements. Tokenization is enriched by the configuration, which is used to pre-select the relevant tokens.

We confirm thearrival of the cargo ship Louis Bleriot containing operational equipent for hospitals at the port of Le Havre from the port of 香港. A two-hour delay in theunloding operation is expected

Tokenization

Selection and separation of words (tokens) to keep only the relevant elements. Tokenization is enriched by the configuration, which is used to pre-select the relevant tokens.

Tokenization

Selection and separation of words (tokens) to keep only the relevant elements. Tokenization is enriched by the configuration, which is used to pre-select the relevant tokens.

arrival cargo ship Louis Bleriot operational equipent hospitals Le Havre from 香港 two-hour delay operational unloding

Tokenization

Selection and separation of words (tokens) to keep only the relevant elements. Tokenization is enriched by the configuration, which is used to pre-select the relevant tokens.

Dict:Multi

Correction of terms according to the appropriate language and business usage.

arrival cargo ship Louis Bleriot equipment operational hospitals Le Havre from Hong Kong delay two hours operational unloading

Dict:Multi

Correction of terms according to the appropriate language and business usage.

Chunking

Grouping business terms to improve the understanding.

arrival cargo ship

Louis

Bleriot equipment

operational

hospitals Le Havre from

Hong Kong delay two

hours

operational

unloading

Chunking

Grouping business terms to improve the understanding.

Chunking

Grouping business terms to improve the understanding.

arrival .cargo ship

Louis

Bleriot .equipment

operational

hospitalsLe Havre . from

Hong Kong . delay .two

hours

operational

unloading

Chunking

Grouping business terms to improve the understanding.

Named Entity Recognition

Assigning an entity type to each term.

arrivalStatus. cargo shipTransport

LouisNickname

BleriotName. equipmentProduct

operationalCharacteristic
or action

hospitalsSector. Le HavrePlace. fromDescriptor

Hong KongPlace. delayStatus. twoQuantity

hoursTime

operationalCharacteristic
or action

. unloadingAction

Named Entity Recognition

Assigning an entity type to each term.

Entity Linking

Creating links between entities to resolve a textual entity into a unique identifier from a knowledge base.

arrivalStatus. ‘transport’ : ‘cargo ship’,
‘name’ : ‘Louis Bleriot’Transport. ‘product’ : ‘equipment’,
‘characteristic’ : ‘operational’,
‘sector’ : ‘hospitals’Product. Le HavrePlace of arrival. Hong KongDeparture place. delayStatus. ‘number’ : ‘2’,
‘time’ : ‘hours’Time unloadingAction

Entity Linking

Creating links between entities to resolve a textual entity into a unique identifier from a knowledge base.

Dependency Parsing

Completing the understanding of the text by adding each term to an ontology.

statusarrivalStatus. transport > ship > cargo ship > Louis BleriotIMO 9776432Transport. product > medical devicesmedical devicesProduct. France > portLe HavrePlace of arrival. China > portHong KongDeparture place. statusdelayStatus. ‘number’ : ‘2’,
‘time’ : ‘hours’
Time action > action shippingunloadingAction

Dependency Parsing

Completing the understanding of the text by adding each term to an ontology.

Interaction

Linking different terms based on an ontology to form a unit of meaning.

Delivery tracking statusarrivalStatus transport > ship > cargo ship > Louis BleriotIMO 9776432Transport product > medical devicesmedical devicesProduct France > portLe HavrePlace of arrival China > portHong KongDeparture place

Delivery status statusdelayStatus ‘number’ : ‘2’, ‘time’ : ‘hours’Time. action > action shippingunloadingAction

Interaction

Linking different terms based on an ontology to form a unit of meaning.

Text Extraction from
images and documents

Easily transform your documents into usable texts using our Extractor technology.

Several OCRs and extraction libraries available via API.

				
					package main

import (
  "fmt"
  "strings"
  "net/http"
  "io/ioutil"
)

func main() {

  url := "https://extractor.golem.ai/v3/analyse"
  method := "POST"

  payload := strings.NewReader(`{
    "file": "https://www.yourfile.pdf"
}`)

  client := &http.Client {
  }
  req, err := http.NewRequest(method, url, payload)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("Authorization", "Basic XXX")
  req.Header.Add("Content-Type", "application/json")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}

				
					<?php

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => '"https://extractor.golem.ai/scan"',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 200,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_POSTFIELDS =>'{
    "file": "https://www.yourfile.pdf",
    "useCache": true,
    "parsers": {
        "document": {
            "extractImages": false,
            "ocr": {
                "name": "tesseract",
                "mode": "auto"
            },
            "PDF": {
                "extractImages": false,
                "ocr": {
                    "name": "ida",
                    "mode": "on"
                }
            }
        },
        "image": {
            "minimumHeight": 500,
            "minimumWidth": 500,
            "ocr": {
                "name": "ida",
                "mode": "off"
            },
            "png": {
                "minimumWidth": 100,
                "ocr": {
                    "name": "ida"
                }
            }
        },
        "spreadsheet": {
            "readVertically": false,
            "unmergeCells": false,
            "splitPerBlock": false,
            "splitPerBlockRowLimit": 10,
            "splitPerBlockColumnLimit": 10,
            "parseHiddenSheets": false
        },
        "email": {
            "extractAttachments": false,
            "ignoredAttachments": [
                "xlsb",
                "eml"
            ],
            "msg": {
                "extractAttachments": true
            }
        }
    }
}',
  CURLOPT_HTTPHEADER => array(
    'Authorization: Basic XXX',
    'Content-Type: application/json'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;

				
					import requests
import json

if __name__ == "__main__":
    URL: str = "https://extractor.golem.ai/scan"

    payload: dict = json.dumps(
        {
            "file": "https://www.yourfile.pdf",
            "parsers": {
                "document": {
                    "extractImages": False,
                    "ocr": {"name": "tesseract", "mode": "auto"},
                    "PDF": {
                        "extractImages": False,
                        "ocr": {"name": "ida", "mode": "on"},
                    },
                },
                "image": {
                    "minimumHeight": 500,
                    "minimumWidth": 500,
                    "ocr": {"name": "ida", "mode": "off"},
                    "png": {"minimumWidth": 100, "ocr": {"name": "ida"}},
                },
                "spreadsheet": {
                    "readVertically": False,
                    "unmergeCells": False,
                    "splitPerBlock": False,
                    "splitPerBlockRowLimit": 10,
                    "splitPerBlockColumnLimit": 10,
                    "parseHiddenSheets": False,
                },
                "email": {
                    "extractAttachments": False,
                    "ignoredAttachments": ["xlsb", "eml"],
                    "msg": {"extractAttachments": True},
                },
            },
        }
    )

    headers: dict = {"Authorization": f"Basic XXX", "Content-Type": "application/json"}

    response: requests.Response = requests.request(
				"POST", URL, headers=headers, data=payload
		)

    print(response.text)

				
					var settings = {
  "url": "https://extractor.golem.ai/v3/analyse",
  "method": "POST",
  "timeout": 0,
  "headers": {
    "Authorization": "Basic XXX",
    "Content-Type": "application/json"
  },
  "data": JSON.stringify({
    "file": "https://www.yourfile.pdf"
  }),
};

$.ajax(settings).done(function (response) {
  console.log(response);
});

Golem.ai protects and respects your data

Our artificial intelligence allows us to respect your data by design.

Security

Golem.ai follows the cryptography recommendations issued by ANSSI.

Privacy

Golem.ai's AI is hosted at Scaleway in France. You remain the exclusive user and owner of your data.

Compliance

An accessible and documented API, available connectors.

The High-Performance and Explainable Language Processing Platform

Build all your use cases with Golem.ai Core

Take advantage of the power of our revolutionary NLU

Text Extraction from
images and documents

Golem.ai protects and respects your data

Security

Privacy

Compliance

Subscribe to our newsletter

PRODUCT

SOLUTIONS

TECHNOLOGY

COMPANY

RESOURCES

© 2025 Golem.ai

Follow us

The High-Performance and Explainable Language Processing Platform

Build all your use cases with Golem.ai Core

Take advantage of the power of our revolutionary NLU

Text Extraction from images and documents

Golem.ai protects and respects your data

Security

Privacy

Compliance

Subscribe to our newsletter

PRODUCT

SOLUTIONS

TECHNOLOGY

COMPANY

RESOURCES

© 2025 Golem.ai

Follow us

Text Extraction from
images and documents