The High-Performance and Explainable Language Processing Platform

Golem.ai Core is a no-training-required artificial intelligence solution for building high-performance, robust, frugal, and unbiased NLP projects.

Build all your use cases with Golem.ai Core

Build your project from A to Z with our versatile NLP platform.

Extract the content of your documents to save reading time and automate their processing.
OCR - Content analysis - Information extraction
Analysts can create network structures to extract knowledge from different texts.
Text extraction - Content analysis - Linking information together
Process incoming messages by analyzing the message and its attachments.
Message and attachment analysis - Categorization - Information extraction
Protect your users' data and protect them from harmful content.

Take advantage of the power of our revolutionary NLU

Explainable, frugal, multilingual, and customizable.

We confirm the arrival of the cargo ship Louis Blériot containing operational equipment for hospitals at the port of Le Havre from the port of 香港. A two-hour delay in the unloading operation is expected.

Tokenization
Selection and separation of words (tokens) to keep only the relevant elements. Tokenization is enriched by the configuration, which is used to pre-select the relevant tokens.
The demo below processes the French source sentence; spelling errors from the source, such as "matriel" and "déchargment", are still present at this stage.
arrivée cargo Louis Blériot matriel opérations hôpitaux Havre depuis 香港 retard deux heures opération déchargment
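Configuration-guided tokenization could be sketched as follows. This is a minimal illustration, not Golem.ai's actual engine; the stopword list stands in for the configuration that pre-selects relevant tokens.

```python
# Minimal sketch of configuration-guided tokenization: split the sentence
# into word tokens, then keep only those the configuration considers
# relevant (approximated here by a stopword list).
STOPWORDS = {"nous", "le", "la", "les", "de", "du", "des",
             "un", "une", "sur", "est", "et", "au", "pour"}

def tokenize(sentence: str) -> list[str]:
    # Strip basic punctuation, split on whitespace, drop stopwords.
    tokens = sentence.replace(",", " ").replace(".", " ").split()
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(tokenize("un retard de deux heures sur le déchargement est prévu"))
# ['retard', 'deux', 'heures', 'déchargement', 'prévu']
```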
Dict:Multi
Correction of terms according to the appropriate language and business usage.
arrivée cargo Louis Blériot matériel opérations hôpitaux Havre depuis Hong Kong retard deux heures opération déchargement
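Dictionary-based correction can be sketched as a lookup over a business vocabulary; the mapping below is an illustrative assumption built from the demo (misspellings normalized, a foreign-script place name resolved).

```python
# Minimal sketch of Dict:Multi-style correction: each token is replaced
# by its business-approved form when a dictionary entry exists.
CORRECTIONS = {
    "matriel": "matériel",
    "déchargment": "déchargement",
    "香港": "Hong Kong",
}

def correct(tokens: list[str]) -> list[str]:
    # Fall back to the original token when no correction is known.
    return [CORRECTIONS.get(t, t) for t in tokens]

print(correct(["matriel", "香港", "retard"]))
# ['matériel', 'Hong Kong', 'retard']
```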
Chunking
Grouping business terms to improve the understanding.
[arrivée] [cargo Louis Blériot] [matériel opérations hôpitaux Havre] [depuis Hong Kong] [retard] [deux heures opération déchargement]
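One common way to group business terms is to greedily merge adjacent tokens that match a known multi-word phrase; this sketch uses that approach (the phrase list is an assumption, not Golem.ai's configuration format).

```python
# Minimal chunking sketch: merge adjacent tokens that form a known
# multi-word business term into a single chunk.
PHRASES = {("Louis", "Blériot"), ("Hong", "Kong"), ("deux", "heures")}

def chunk(tokens: list[str]) -> list[str]:
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in PHRASES:
            out.append(tokens[i] + " " + tokens[i + 1])
            i += 2  # consume both tokens of the phrase
        else:
            out.append(tokens[i])
            i += 1
    return out

print(chunk(["retard", "deux", "heures"]))  # ['retard', 'deux heures']
```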
Named Entity Recognition
Assigning an entity type to each term.
arrivée → Status · cargo → Transport · Louis Blériot → Name · matériel → Product · opérations → Characteristic or action · hôpitaux → Sector · Havre → Place · depuis → Descriptor · Hong Kong → Place · retard → Status · deux → Quantity · heures → Time · opération → Characteristic or action · déchargement → Action
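A lexicon-driven tagger illustrates the idea of assigning an entity type to each term; the type table below is reconstructed from the demo and is not Golem.ai's internal representation.

```python
# Minimal NER sketch: look each chunk up in a typed business lexicon.
ENTITY_TYPES = {
    "arrivée": "Status", "cargo": "Transport", "Louis Blériot": "Name",
    "matériel": "Product", "hôpitaux": "Sector", "Havre": "Place",
    "Hong Kong": "Place", "retard": "Status", "deux": "Quantity",
    "heures": "Time", "déchargement": "Action",
}

def tag(chunks: list[str]) -> list[tuple[str, str]]:
    # Unknown chunks get an explicit "Unknown" type rather than being dropped.
    return [(c, ENTITY_TYPES.get(c, "Unknown")) for c in chunks]

print(tag(["retard", "Hong Kong"]))
# [('retard', 'Status'), ('Hong Kong', 'Place')]
```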
Entity Linking
Creating links between entities to resolve a textual entity into a unique identifier from a knowledge base.
arrivée → Status · { 'transport': 'cargo', 'nom': 'Louis Blériot' } → Transport · { 'produit': 'matériel', 'caractéristique': 'opération', 'secteur': 'hôpitaux' } → Product · Havre → Place (arrival) · Hong Kong → Place (departure) · retard → Status · { 'nombre': '2', 'temps': 'heures' } → Time · déchargement → Action
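Resolving a textual entity into a unique identifier can be sketched as a knowledge-base lookup; the IMO number comes from the demo, while the knowledge-base structure itself is an illustrative assumption.

```python
# Minimal entity-linking sketch: a recognized name resolves to a unique
# identifier in a knowledge base.
KNOWLEDGE_BASE = {
    "Louis Blériot": {"id": "IMO 9776432", "type": "cargo ship"},
}

def link(entity: str) -> dict:
    # Return a record with a null id when the entity is not in the base.
    return KNOWLEDGE_BASE.get(entity, {"id": None})

print(link("Louis Blériot")["id"])  # IMO 9776432
```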
Dependency Parsing
Completing the understanding of the text by adding each term to an ontology.
statut > arrivée (Status) · transport > bateau > cargo > Louis Blériot (IMO 9776432, Transport) · produit > dispositifs médicaux > matériel médical (Product) · France > port > Havre (Place, arrival) · Chine > port > Hong Kong (Place, departure) · statut > retard (Status) · { 'nombre': '2', 'temps': 'heures' } (Time) · action > action shipping > déchargement (Action)
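Attaching each term to an ontology can be sketched as a path lookup; the paths below are taken from the demo, while the data structure is an assumption for illustration.

```python
# Minimal ontology sketch: each known term carries its full ancestry path.
ONTOLOGY = {
    "Louis Blériot": ["transport", "bateau", "cargo", "Louis Blériot"],
    "Havre": ["France", "port", "Havre"],
    "Hong Kong": ["Chine", "port", "Hong Kong"],
}

def ontology_path(term: str) -> str:
    # Unknown terms are their own single-node path.
    return " > ".join(ONTOLOGY.get(term, [term]))

print(ontology_path("Havre"))  # France > port > Havre
```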
Interaction
Linking different terms based on an ontology to form a unit of meaning.
Suivi livraison (delivery tracking): statut > arrivée · transport > bateau > cargo > Louis Blériot (IMO 9776432) · produit > dispositifs médicaux > matériel médical · France > port > Havre (arrival) · Chine > port > Hong Kong (departure)
Statut livraison (delivery status): statut > retard · { 'nombre': '2', 'temps': 'heures' } · action > action shipping > déchargement

Text Extraction from images and documents

Easily transform your documents into usable texts using our Extractor technology.

Several OCRs and extraction libraries available via API.

				
					package main

import (
  "fmt"
  "strings"
  "net/http"
  "io"
)

func main() {

  url := "https://extractor.golem.ai/v3/analyse"
  method := "POST"

  payload := strings.NewReader(`{
    "file": "https://www.yourfile.pdf"
}`)

  client := &http.Client{}
  req, err := http.NewRequest(method, url, payload)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("Authorization", "Basic XXX")
  req.Header.Add("Content-Type", "application/json")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := io.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}
				
			
				
					<?php

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://extractor.golem.ai/scan',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 200,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_POSTFIELDS =>'{
    "file": "https://www.yourfile.pdf",
    "useCache": true,
    "parsers": {
        "document": {
            "extractImages": false,
            "ocr": {
                "name": "tesseract",
                "mode": "auto"
            },
            "PDF": {
                "extractImages": false,
                "ocr": {
                    "name": "ida",
                    "mode": "on"
                }
            }
        },
        "image": {
            "minimumHeight": 500,
            "minimumWidth": 500,
            "ocr": {
                "name": "ida",
                "mode": "off"
            },
            "png": {
                "minimumWidth": 100,
                "ocr": {
                    "name": "ida"
                }
            }
        },
        "spreadsheet": {
            "readVertically": false,
            "unmergeCells": false,
            "splitPerBlock": false,
            "splitPerBlockRowLimit": 10,
            "splitPerBlockColumnLimit": 10,
            "parseHiddenSheets": false
        },
        "email": {
            "extractAttachments": false,
            "ignoredAttachments": [
                "xlsb",
                "eml"
            ],
            "msg": {
                "extractAttachments": true
            }
        }
    }
}',
  CURLOPT_HTTPHEADER => array(
    'Authorization: Basic XXX',
    'Content-Type: application/json'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;
				
			
				
					import requests
import json

if __name__ == "__main__":
    URL: str = "https://extractor.golem.ai/scan"

    payload: str = json.dumps(
        {
            "file": "https://www.yourfile.pdf",
            "parsers": {
                "document": {
                    "extractImages": False,
                    "ocr": {"name": "tesseract", "mode": "auto"},
                    "PDF": {
                        "extractImages": False,
                        "ocr": {"name": "ida", "mode": "on"},
                    },
                },
                "image": {
                    "minimumHeight": 500,
                    "minimumWidth": 500,
                    "ocr": {"name": "ida", "mode": "off"},
                    "png": {"minimumWidth": 100, "ocr": {"name": "ida"}},
                },
                "spreadsheet": {
                    "readVertically": False,
                    "unmergeCells": False,
                    "splitPerBlock": False,
                    "splitPerBlockRowLimit": 10,
                    "splitPerBlockColumnLimit": 10,
                    "parseHiddenSheets": False,
                },
                "email": {
                    "extractAttachments": False,
                    "ignoredAttachments": ["xlsb", "eml"],
                    "msg": {"extractAttachments": True},
                },
            },
        }
    )

    headers: dict = {"Authorization": "Basic XXX", "Content-Type": "application/json"}

    response: requests.Response = requests.request(
        "POST", URL, headers=headers, data=payload
    )

    print(response.text)
				
			
				
					var settings = {
  "url": "https://extractor.golem.ai/v3/analyse",
  "method": "POST",
  "timeout": 0,
  "headers": {
    "Authorization": "Basic XXX",
    "Content-Type": "application/json"
  },
  "data": JSON.stringify({
    "file": "https://www.yourfile.pdf"
  }),
};

$.ajax(settings).done(function (response) {
  console.log(response);
});
				
			

Golem.ai protects and respects your data

Our artificial intelligence allows us to respect your data by design.

Security

Golem.ai follows the cryptography recommendations issued by ANSSI.

Privacy

Golem.ai's AI is hosted at Scaleway in France. You remain the exclusive user and owner of your data.

Compliance

An accessible, documented API and ready-made connectors.

Join the Golem.ai community

Do you have an NLP project? Try our Core technology by signing up for the waiting list.