API for Extracting Multipage OCR Data from Documents - ApyHub

OCR Document Data Extraction API

ApyHub
Starting from 500 atoms

About

The OCR Document Data Extraction API enables you to extract text from large, text-heavy documents across multiple file formats and languages. Using advanced AI-powered OCR technology, this API can process PDFs, scanned images, Microsoft Word, Excel, PowerPoint, HTML, and more, capturing both printed and handwritten text accurately.
This API is ideal for developers building document management systems, data pipelines, or content automation applications. It supports high-resolution scanning for dense or small text, paragraph detection, and fillable form extraction, enabling reliable and efficient document data extraction at scale.
The atom cost is subject to change depending on the size of the input file and the provider selected. The providers and their atom costs are listed below:
Provider (requested_service) | Atoms
Azure | 500
ApyHub | 2000
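As a rough illustration of the table above, the listed per-request figures can be kept in a small lookup. This is only a sketch: `ATOMS_PER_REQUEST` and `listed_atoms` are hypothetical names, and the real charge may differ because the cost also depends on the size of the input file.

```python
# Per-request atom costs copied from the provider table above.
# Assumption: these are the listed figures only; the actual charge may vary with file size.
ATOMS_PER_REQUEST = {"azure": 500, "apyhub": 2000}


def listed_atoms(requested_service: str, request_count: int = 1) -> int:
    """Return the listed atom cost for a number of requests (hypothetical helper)."""
    service = requested_service.lower()
    if service not in ATOMS_PER_REQUEST:
        raise ValueError(f"unknown provider: {requested_service!r}")
    if request_count < 0:
        raise ValueError("request_count must be non-negative")
    return ATOMS_PER_REQUEST[service] * request_count


print(listed_atoms("azure", 10))   # 5000
print(listed_atoms("apyhub", 10))  # 20000
```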
Try the AI Document Multipage OCR Data Extraction API in the API playground to automate text extraction, streamline document workflows, and integrate accurate OCR processing into your applications with a single API call.

API Playground

API Documentation

Input: file, output: JSON
POST https://api.apyhub.com/ai/document/extract/read/file

Request example

curl --location --request POST 'https://api.apyhub.com/ai/document/extract/read/file' \
--header 'apy-token: {{token}}' \
--form 'file=@"sample.jpg"' \
--form 'requested_service="azure"' \
--form 'azure_key="your-azure-key"' \
--form 'azure_endpoint="your-azure-endpoint"'
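The same request can be sketched in Python using only the standard library. The helper name `build_ocr_request` is hypothetical, and the multipart body is assembled by hand as an illustration of what the curl `--form` options send; it is not an official ApyHub client. The request is only built here, not sent.

```python
import io
import uuid
import urllib.request

API_URL = "https://api.apyhub.com/ai/document/extract/read/file"


def build_ocr_request(token, filename, file_bytes, fields):
    """Build (but do not send) a multipart/form-data POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Plain form fields such as requested_service, azure_key, azure_endpoint.
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())
    # The document itself goes in the "file" part.
    body.write(f"--{boundary}\r\n".encode())
    file_header = (
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    body.write(file_header.encode())
    body.write(file_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "apy-token": token,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_ocr_request(
    "your-token",
    "sample.jpg",
    b"placeholder image bytes",  # in practice: open("sample.jpg", "rb").read()
    {
        "requested_service": "azure",
        "azure_key": "your-azure-key",
        "azure_endpoint": "your-azure-endpoint",
    },
)
print(request.get_method(), request.full_url)
# To actually send it:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode())
```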
Method: POST
Content Type: multipart/form-data
Request Body
Attribute | Type | Mandatory | Description
file | file | Yes | The source document file.
requested_service | String | Yes | Name of the service provider. Supported providers are azure and apyhub. Defaults to apyhub.
azure_key | String | Yes (if requested_service is azure) | Service key provided by Azure.
azure_endpoint | String | Yes (if requested_service is azure) | Endpoint provided by Azure.
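The conditional requirements in the table above can be checked on the client before a request is sent. The following is a minimal sketch; `build_form_fields` is a hypothetical helper, and lowercasing the provider name is a convenience assumed here, not documented API behavior.

```python
def build_form_fields(requested_service="apyhub", azure_key=None, azure_endpoint=None):
    """Return the text form fields for a request, enforcing the table's rules."""
    service = requested_service.lower()  # assumption: normalize case on the client side
    if service not in ("azure", "apyhub"):
        raise ValueError("requested_service must be 'azure' or 'apyhub'")
    fields = {"requested_service": service}
    if service == "azure":
        # azure_key and azure_endpoint are mandatory only for the azure provider.
        if not azure_key or not azure_endpoint:
            raise ValueError(
                "azure_key and azure_endpoint are required when requested_service is azure"
            )
        fields["azure_key"] = azure_key
        fields["azure_endpoint"] = azure_endpoint
    return fields


print(build_form_fields())
print(build_form_fields("azure", "your-azure-key", "your-azure-endpoint"))
```

The file part is deliberately left out of this helper, since it is sent as a separate multipart file field rather than as text.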
Size And Limits
requested_service | Support matrix and limitations
ApyHub | Supported formats: jpeg, jpg, png, tiff, heif, bmp, pdf, docx, xlsx, pptx, html. Max document size: 500 MB. Max number of pages (analysis): 2000.
Azure | Supported formats: jpeg, jpg, png, tiff, heif, bmp, pdf, docx, xlsx, pptx, html. Max document size: 500 MB. Max number of pages (analysis): 2000.
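A pre-flight check against these limits can avoid uploading a file that would be rejected. This sketch uses hypothetical names (`preflight_problems`, `MAX_BYTES`) and assumes that 500 MB means 500 × 1024 × 1024 bytes, which the page does not specify. The page limit is not checked, since counting pages requires parsing the document.

```python
import os

# Formats listed in the support matrix (identical for both providers).
SUPPORTED_EXTENSIONS = {
    "jpeg", "jpg", "png", "tiff", "heif", "bmp", "pdf", "docx", "xlsx", "pptx", "html",
}
# Assumption: 1 MB is taken as 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024


def preflight_problems(filename, size_bytes):
    """Return a list of human-readable problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the {MAX_BYTES}-byte limit")
    return problems


print(preflight_problems("scan.PDF", 1024))
print(preflight_problems("notes.txt", 1024))
```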
Sample Response
A successful request returns the extracted document data in the specified output parameter. If the request fails, the response contains an error code and a message that help identify what went wrong.

HTTP Response Codes

The method may return one of the following HTTP status codes:
Status Code | Description
200 | The request was successful.
400 | Invalid input: the file is corrupt or the supported inputs are not provided.
401 | Required authentication information is either missing or not valid for the resource.
500 | An unexpected error occurred while processing the request.
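Client code can map these status codes to the descriptions above. The sketch below uses hypothetical names (`STATUS_MEANINGS`, `classify_status`) and treats any code not in the table as unknown rather than guessing its meaning.

```python
# Descriptions paraphrased from the status code table above.
STATUS_MEANINGS = {
    200: "The request was successful.",
    400: "Invalid input: the file is corrupt or the supported inputs are not provided.",
    401: "Authentication information is missing or not valid for the resource.",
    500: "An unexpected error occurred while processing the request.",
}


def classify_status(status_code):
    """Return (ok, description) for an HTTP status code returned by the endpoint."""
    description = STATUS_MEANINGS.get(
        status_code, "Status code not listed in this documentation."
    )
    return status_code == 200, description


for code in (200, 401, 418):
    ok, description = classify_status(code)
    print(code, ok, description)
```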

Authentication

All API requests to ApyHub services must be authenticated. Token and basic authentication are currently supported. You can generate and view your credentials under "API Keys" in your workspace settings (on the left side of the navbar).
Points to note:
  • Credential secrets are generated on the fly and are not stored in plain text, so save the secrets somewhere safe when you generate a credential.
  • Use the apy-token as the header parameter to pass the token.
  • Use the Authorization header to send the basic authentication credentials.
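The two header styles described in the points above can be sketched as follows. The function names are hypothetical, and the username and password pair for basic authentication is an assumption: the page only says that credentials go in the Authorization header, so the standard HTTP Basic scheme is used here for illustration.

```python
import base64


def token_headers(token):
    """Headers for token authentication: the token goes in the apy-token header."""
    return {"apy-token": token}


def basic_auth_headers(username, password):
    """Headers for basic authentication, using the standard HTTP Basic encoding."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


print(token_headers("your-token"))
print(basic_auth_headers("user", "pass"))
```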

Error codes

{
  "error": {
    "code": 105,
    "message": "Invalid URL"
  }
}
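An error body of this shape can be unpacked defensively, as in the sketch below. `parse_error` is a hypothetical helper; it assumes only the `error.code` and `error.message` fields shown in the sample and returns `None` for anything else.

```python
import json


def parse_error(body):
    """Return (code, message) from an error response body, or None if it has another shape."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return None  # not JSON at all
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return error.get("code"), error.get("message")


sample = '{"error": {"code": 105, "message": "Invalid URL"}}'
print(parse_error(sample))  # (105, 'Invalid URL')
```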