AI Document Multipage OCR Data Extraction API

Artificial Intelligence

#document #extraction #read

ApyHub

Starting from 500 atoms

AI tier

About

This Utility API focuses on AI-based document data extraction for large text-heavy documents in multiple file formats and global languages.

The API can extract printed and handwritten text from PDF documents, scanned images, and other document formats, including Microsoft Word, Excel, PowerPoint, and HTML. Its features include higher-resolution scanning of document images for better handling of small, dense text; paragraph detection; and fillable form management. These make it useful for document management and data extraction applications.

The atoms cost is subject to change depending on the size of the input file and the provider selected. The providers and the atoms cost for each are listed below:

Provider (requested_service)	Atoms
Azure	500
ApyHub	2000

Note: To test the API in the API Playground, click "Show optional inputs" and enter the authentication token for the provider before clicking "Send request". The response structure and the results of the AI utility APIs depend on the service provider and may vary with the provider selected.

API Playground

API Documentation

input file: output json

POST

https://api.apyhub.com/ai/document/extract/read/file

Request example

curl --location --request POST 'https://api.apyhub.com/ai/document/extract/read/file' \
--header 'apy-token: {{token}}' \
--form 'file=@"sample.jpg"' \
--form 'requested_service="azure"' \
--form 'azure_key="your-azure-key"' \
--form 'azure_endpoint="your-azure-endpoint"'
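For callers who prefer Python to curl, the same request can be assembled with the standard library alone. This is a minimal sketch, not an official client: `encode_multipart` and `build_request` are hypothetical helper names, and the hand-rolled multipart body is assumed to be accepted by the server the same way curl's `--form` output is.

```python
import mimetypes
import urllib.request
import uuid

# Endpoint taken from the request example in this document.
API_URL = "https://api.apyhub.com/ai/document/extract/read/file"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    mime = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(token, file_name, file_bytes, fields):
    """Build (but do not send) the POST request with the apy-token header."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"apy-token": token, "Content-Type": content_type},
    )
```

To send the request, pass the result to `urllib.request.urlopen(...)` and read the response body. Building and sending are kept separate here so that the request can be inspected before any atoms are spent.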

Method: POST

Content Type: multipart/form-data

Request Body

Attribute	Type	Mandatory	Description
file	file	Yes	Provide the source document file.
requested_service	String	Yes	The name of the service provider. Supported providers are `azure` and `apyhub`. Defaults to `apyhub`.
azure_key	String	Yes (if `azure` is selected in requested_service)	The service key provided by Azure.
azure_endpoint	String	Yes (if `azure` is selected in requested_service)	The endpoint provided by Azure.
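The conditional rule in the table (the Azure fields are required only when `requested_service` is `azure`) can be checked on the client before sending anything. The sketch below is one way to do that; `build_form_fields` is a hypothetical helper, and the server may apply additional validation not described here.

```python
# Provider names as listed in the Request Body table.
SUPPORTED_PROVIDERS = {"azure", "apyhub"}


def build_form_fields(requested_service="apyhub", azure_key=None, azure_endpoint=None):
    """Return the non-file form fields, enforcing the table's mandatory rules."""
    if requested_service not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {requested_service!r}")
    fields = {"requested_service": requested_service}
    if requested_service == "azure":
        # Both Azure fields are mandatory when azure is selected.
        if not azure_key or not azure_endpoint:
            raise ValueError("azure_key and azure_endpoint are required for azure")
        fields["azure_key"] = azure_key
        fields["azure_endpoint"] = azure_endpoint
    return fields
```

For example, `build_form_fields("azure", "your-azure-key", "your-azure-endpoint")` returns all three fields, while `build_form_fields("azure")` raises `ValueError` before any request is made.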

Size And Limits

requested_service	Support matrix and limitations
ApyHub	Supported formats: `jpeg`, `jpg`, `png`, `tiff`, `heif`, `bmp`, `pdf`, `docx`, `xlsx`, `pptx`, `html`. Max document size: 500 MB. Max number of pages (analysis): 2000.
Azure	Supported formats: `jpeg`, `jpg`, `png`, `tiff`, `heif`, `bmp`, `pdf`, `docx`, `xlsx`, `pptx`, `html`. Max document size: 500 MB. Max number of pages (analysis): 2000.
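Because both providers share the same format list and size limit, a client can reject obviously unsuitable files before uploading. The following is a rough pre-flight sketch: `preflight_problems` is a hypothetical name, the check looks only at the file extension (not the actual content), and 500 MB is assumed to mean 500 × 1024 × 1024 bytes. The 2000-page limit is not checked, since counting pages depends on the file format.

```python
from pathlib import PurePath

# Extensions from the support matrix above.
SUPPORTED_EXTENSIONS = {
    "jpeg", "jpg", "png", "tiff", "heif", "bmp",
    "pdf", "docx", "xlsx", "pptx", "html",
}
# Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024


def preflight_problems(file_name, size_bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ext = PurePath(file_name).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 500 MB")
    return problems
```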

Sample Response

A successful request returns the extracted document data in the specified output parameter. If the request fails, the response contains an error code and a message to help determine what went wrong.

HTTP Response Codes

The method may return one of the following HTTP status codes:

Status Code	Description
200	The request was successful.
400	Invalid input - the file is corrupt or the supported inputs are not provided.
401	Required authentication information is either missing or not valid for the resource.
500	An unexpected error occurred while processing the request.
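A client will usually branch on these four codes. The sketch below groups them into coarse categories; `classify_status` and the category labels are illustrative choices, not part of the API, and any status code outside the table is reported as undocumented rather than guessed at.

```python
# Descriptions paraphrased from the status code table above.
STATUS_DESCRIPTIONS = {
    200: "The request was successful.",
    400: "Invalid input: corrupt file or missing inputs.",
    401: "Authentication information is missing or not valid.",
    500: "Unexpected error while processing the request.",
}


def classify_status(status_code):
    """Map a documented HTTP status code to a coarse category label."""
    if status_code == 200:
        return "ok"
    if status_code in (400, 401):
        return "client_error"
    if status_code == 500:
        return "server_error"
    return "undocumented"
```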

Authentication

All API requests to ApyHub services must be authenticated. Token and basic authentication are currently supported. To generate credentials or view existing ones, open your workspace settings (on the left side of the navbar) and go to "API Keys".

Points to note:

Credential secrets are generated on the fly and are not stored in plain text, so save the secrets somewhere safe when you generate a credential.
Use the apy-token header to pass the token.
Use the Authorization header to send the basic authentication credentials.
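The two header options noted above can be sketched as follows. `token_headers` and `basic_headers` are hypothetical helpers. The basic variant assumes the standard HTTP Basic scheme (Base64 of `username:password`), which this page does not spell out, so check the encoding ApyHub expects before relying on it.

```python
import base64


def token_headers(token):
    """Headers for token authentication via the apy-token header."""
    return {"apy-token": token}


def basic_headers(username, password):
    """Headers for basic authentication.

    Assumption: the standard HTTP Basic scheme, i.e. Base64("username:password").
    """
    raw = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```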

Error codes

{
  "error": {
    "code": 105,
    "message": "Invalid URL"
  }
}

101 - Missing parameters

102 - Invalid JSON

103 - Invalid input

104 - Invalid file

105 - Invalid URL

109 - Invalid input format

110 - Server error
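A failed response can be decoded and matched against the codes listed above. The sketch below assumes every error body has the `{"error": {"code": ..., "message": ...}}` shape shown in the example; `parse_error` is a hypothetical helper, and codes not in the list are labelled as unknown rather than guessed.

```python
import json

# Error codes exactly as listed in this section.
ERROR_CODES = {
    101: "Missing parameters",
    102: "Invalid JSON",
    103: "Invalid input",
    104: "Invalid file",
    105: "Invalid URL",
    109: "Invalid input format",
    110: "Server error",
}


def parse_error(body):
    """Return (code, message, documented_name), or None if there is no error object."""
    payload = json.loads(body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    code = error.get("code")
    return code, error.get("message"), ERROR_CODES.get(code, "Unknown error code")
```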