AI Document Multipage OCR Data Extraction API
ApyHub
Starting from 500 atoms
AI tier
About
This utility API performs AI-based data extraction on large, text-heavy documents across multiple file formats and languages.
It extracts printed and handwritten text from PDF documents, scanned images, and other document formats, including Microsoft Word, Excel, PowerPoint, and HTML. Its features include higher-resolution scanning of document images to better handle small, dense text; paragraph detection; and fillable form management. These make it useful for document management and data extraction applications.
The cost in atoms is subject to change depending on the size of the input file and the selected provider. The providers and their atom costs are listed below:
Provider (requested_service) | Atoms |
---|---|
Azure | 500 |
ApyHub | 2000 |
Note: To test the API in the API Playground, click "Show optional inputs" and enter the provider's authentication token before clicking "Send request". The structure and content of the response depend on the selected service provider and may vary between providers.
Input: file. Output: JSON.
POST
https://api.apyhub.com/ai/document/extract/read/file
Method: POST
Content Type: multipart/form-data
Request Body
Attribute | Type | Mandatory | Description |
---|---|---|---|
file | file | Yes | The source document file. |
requested_service | String | Yes | The name of the service provider. Supported providers are azure and apyhub. Defaults to apyhub. |
azure_key | String | Yes (if azure is selected in requested_service) | The service key provided by Azure. |
azure_endpoint | String | Yes (if azure is selected in requested_service) | The endpoint provided by Azure. |
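A minimal Python sketch of assembling this request with only the standard library follows. It builds, but does not send, a multipart/form-data POST to the endpoint above, passing the token in the apy-token header described under Authentication. The function name build_extract_request and its parameters are hypothetical, and the part layout (one text part per attribute plus one file part) is an assumption based on the attribute table, not a published client.

```python
import uuid
import urllib.request

# Endpoint taken from this page.
API_URL = "https://api.apyhub.com/ai/document/extract/read/file"


def build_extract_request(token, filename, file_bytes, requested_service="apyhub",
                          azure_key=None, azure_endpoint=None):
    """Build (but do not send) the multipart/form-data POST request.

    Hypothetical helper; form field names follow the Request Body table.
    """
    fields = {"requested_service": requested_service}
    if requested_service == "azure":
        # azure_key and azure_endpoint are mandatory when Azure is selected.
        if not (azure_key and azure_endpoint):
            raise ValueError("azure_key and azure_endpoint are required for azure")
        fields["azure_key"] = azure_key
        fields["azure_endpoint"] = azure_endpoint

    boundary = uuid.uuid4().hex  # random separator between form parts
    parts = []
    for name, value in fields.items():
        # One plain-text form part per attribute.
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The file part carries the raw document bytes.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "apy-token": token,  # token header named on this page
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the request to urllib.request.urlopen and read the JSON body from the response. A third-party HTTP client with built-in multipart support would avoid hand-building the body; the standard library is used here only to keep the sketch dependency-free.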
Size and Limits
requested_service | Support matrix and limitations |
---|---|
ApyHub | Supported formats: jpeg, jpg, png, tiff, heif, bmp, pdf, docx, xlsx, pptx, html. Maximum document size: 500 MB. Maximum number of pages (analysis): 2000. |
Azure | Supported formats: jpeg, jpg, png, tiff, heif, bmp, pdf, docx, xlsx, pptx, html. Maximum document size: 500 MB. Maximum number of pages (analysis): 2000. |
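Because both providers share the same limits, a client can flag obviously unsupported files before spending atoms. The sketch below checks the extension, size, and (optionally) page count against the values in the table above. The helper name check_upload is hypothetical, "500 MB" is interpreted conservatively as 500,000,000 bytes, and the page count must be supplied by the caller; the table does not say whether longer documents are rejected or truncated, so the check only reports it.

```python
import os

# Limits copied from the table above (identical for both providers).
SUPPORTED_EXTENSIONS = {"jpeg", "jpg", "png", "tiff", "heif", "bmp",
                        "pdf", "docx", "xlsx", "pptx", "html"}
MAX_SIZE_BYTES = 500 * 1000 * 1000  # assumption: "500 MB" read as decimal megabytes
MAX_PAGES = 2000


def check_upload(path, size_bytes, page_count=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare the extension case-insensitively, without the leading dot.
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    # Page count is optional because computing it depends on the file type.
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"too many pages: {page_count}")
    return problems
```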
Sample Response
A successful request returns the extracted document data as a JSON response. If the request fails, the response contains an error code and a message that help determine what went wrong.
HTTP Response Codes
The method may return one of the following HTTP status codes:
Status Code | Description |
---|---|
200 | The request was successful. |
400 | Invalid input: the file is corrupt, or the required inputs were not provided. |
401 | Required authentication information is either missing or not valid for the resource. |
500 | An unexpected error occurred while processing the request. |
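The following sketch shows one way a client might act on these status codes: decode the JSON body on 200 and raise a descriptive exception otherwise. The function name interpret_response is hypothetical, and because the exact layout of the error body is not documented on this page, the sketch includes the raw body text rather than parsing an error code out of it.

```python
import json

# Meanings paraphrased from the status code table; anything else is undocumented.
STATUS_MEANINGS = {
    400: "invalid input (corrupt file or missing required inputs)",
    401: "authentication information missing or invalid",
    500: "unexpected error while processing the request",
}


def interpret_response(status, body):
    """Return the decoded JSON on success; raise RuntimeError otherwise."""
    if status == 200:
        return json.loads(body)
    reason = STATUS_MEANINGS.get(status, "undocumented status code")
    # Keep a short excerpt of the body, since its error format is not specified.
    raise RuntimeError(f"HTTP {status}: {reason}; body: {body[:200]!r}")
```

With urllib, non-2xx responses raise urllib.error.HTTPError, whose code attribute and read() method supply the two arguments.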
Authentication
All API requests to ApyHub services need to be authenticated. Currently we support token and basic authentication mechanisms.
You can generate and view your credentials in your workspace settings (on the left side of the navbar) under "API Keys". Points to note:
- Credential secrets are generated on the fly and are not stored in plain text, so save the secret somewhere safe as soon as you generate a credential.
- Use the apy-token header to pass the token.
- Use the Authorization header to send basic authentication credentials.
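The two mechanisms translate into request headers as sketched below. The token goes in the apy-token header as stated above; the basic variant uses the standard HTTP Basic scheme (base64 of "username:password"). The helper names are hypothetical, and which fields of a generated credential serve as the username and password is not specified on this page, so they are left as plain parameters.

```python
import base64


def token_headers(token):
    """Token authentication: send the token in the apy-token header."""
    return {"apy-token": token}


def basic_auth_headers(username, password):
    """HTTP Basic authentication: 'Basic ' + base64("username:password")."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

Either dictionary can be merged into the headers of the request sent to the extraction endpoint; only one mechanism is needed per request.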