Extract Text from PDF API
ApyHub
50 atoms
Base tier
About
This Utility API lets you extract the content of any PDF file. Our Extract Text from PDF API is your go-to tool for extracting all text from PDF documents, optionally including style and preserving paragraphs.
Extracting text from a PDF file makes the text searchable, which is particularly useful when you need to find specific information in the file quickly. It can also make the content more accessible for people with disabilities, such as those with visual impairments: once extracted, the text can be converted into a format that is easier to read with assistive technologies such as screen readers.
Try out the text extractor API in the API playground and see how this free online tool can serve as your PDF text extractor, saving you time and reducing manual text extraction and export through a simple API call.
API Documentation
Upload file: extracted data
POST https://api.apyhub.com/extract/text/pdf-file
This method lets you submit a PDF file and returns the extracted text as a string. This is the most straightforward way to use the service: submit a PDF file and receive the extracted text in the response.
Method: POST
Content Type: multipart/form-data
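The sketch below shows one way to build this request with only the Python standard library. It assumes token authentication via the `apy-token` header (described under Authentication) and uses the `file` field name from the request body table. The helper names `encode_multipart`, `build_request`, and `extract_text` are hypothetical, and because the response format is not reproduced here, the function simply returns the raw response body as text.

```python
import urllib.request
import uuid

API_URL = "https://api.apyhub.com/extract/text/pdf-file"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the PDF
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    # The file part carries a filename and its own content type.
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/pdf\r\n\r\n")
    parts.append(file_bytes)
    parts.append(b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(pdf_bytes, api_token, filename="document.pdf", fields=None):
    """Build (but do not send) a POST request for the extract-text endpoint."""
    body, content_type = encode_multipart(fields or {}, "file", filename, pdf_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"apy-token": api_token, "Content-Type": content_type},
    )


def extract_text(pdf_path, api_token):
    """Upload a PDF and return the raw response body as text."""
    with open(pdf_path, "rb") as f:
        req = build_request(f.read(), api_token)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage, assuming a token from your workspace: `print(extract_text("report.pdf", "YOUR_APY_TOKEN"))`. Splitting request construction from sending keeps the multipart encoding testable without a network call.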
Request Body
Attribute | Type | Mandatory | Description |
---|---|---|---|
file | File | Yes | The source PDF file. |
preserve_paragraphs | Boolean | No | If true, preserves the paragraphs in the response. Defaults to false. |
start_page | Integer | No | The page number at which text extraction starts. Defaults to 1; can range from 1 to the last page number. For example, to start from page 2, set start_page to 2. |
end_page | Integer | No | The page number at which text extraction ends. Defaults to the last page number; can range from 1 to the last page number. For example, to end at page 5, set end_page to 5. |
starting_x_coordinate | Integer | No | Distance from the left edge (x-coordinate), as a percentage of the page width, at which extraction starts. Can range from 0 to 100; defaults to 0. For example, to start extraction 20% from the left, set starting_x_coordinate to 20. |
starting_y_coordinate | Integer | No | Distance from the top edge (y-coordinate), as a percentage of the page height, at which extraction starts. Can range from 0 to 100; defaults to 0. For example, to start extraction 20% below the top edge, set starting_y_coordinate to 20. |
ending_x_coordinate | Integer | No | Distance from the left edge, as a percentage of the page width, at which extraction ends (the right boundary of the extraction area). Must be greater than starting_x_coordinate. Can range from 0 to 100; defaults to 100. For example, set ending_x_coordinate to 50 to extract text up to 50% of the page width from the left edge. |
ending_y_coordinate | Integer | No | Distance from the top edge, as a percentage of the page height, at which extraction ends (the bottom boundary of the extraction area). Must be greater than starting_y_coordinate. Can range from 0 to 100; defaults to 100. For example, set ending_y_coordinate to 50 to extract text up to 50% of the page height from the top edge. |
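The constraints in the table can be checked on the client before uploading. The following is a minimal sketch; the helper `build_extraction_fields` is hypothetical. Two behaviours are assumptions not stated in the table: an `end_page` smaller than `start_page` is rejected, and booleans are sent as the lowercase strings `"true"`/`"false"`. The upper page bound (the last page) depends on the PDF itself, so it is left for the server to enforce.

```python
def build_extraction_fields(
    preserve_paragraphs=False,
    start_page=None,
    end_page=None,
    starting_x_coordinate=0,
    starting_y_coordinate=0,
    ending_x_coordinate=100,
    ending_y_coordinate=100,
):
    """Check optional attributes against the documented ranges and return
    them as string-valued multipart form fields."""
    # Only the lower page bound (1) can be checked without reading the PDF.
    for name, page in (("start_page", start_page), ("end_page", end_page)):
        if page is not None and page < 1:
            raise ValueError(f"{name} must be at least 1")
    # Assumption: an end page before the start page is a client error.
    if start_page is not None and end_page is not None and end_page < start_page:
        raise ValueError("end_page must not be less than start_page")

    coords = {
        "starting_x_coordinate": starting_x_coordinate,
        "starting_y_coordinate": starting_y_coordinate,
        "ending_x_coordinate": ending_x_coordinate,
        "ending_y_coordinate": ending_y_coordinate,
    }
    # Coordinates are percentages of the page width/height: 0 to 100.
    for name, value in coords.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Documented rule: each ending coordinate must exceed its starting coordinate.
    if ending_x_coordinate <= starting_x_coordinate:
        raise ValueError("ending_x_coordinate must be greater than starting_x_coordinate")
    if ending_y_coordinate <= starting_y_coordinate:
        raise ValueError("ending_y_coordinate must be greater than starting_y_coordinate")

    # Assumption: booleans are encoded as lowercase "true"/"false".
    fields = {"preserve_paragraphs": "true" if preserve_paragraphs else "false"}
    if start_page is not None:
        fields["start_page"] = str(start_page)
    if end_page is not None:
        fields["end_page"] = str(end_page)
    fields.update({name: str(value) for name, value in coords.items()})
    return fields
```

For example, `build_extraction_fields(start_page=2, end_page=5, starting_x_coordinate=20, ending_x_coordinate=50)` describes pages 2 to 5, restricted to the band between 20% and 50% of the page width. Coordinates are always sent, even at their defaults, because the defaults match the documented ones.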
HTTP Response Codes
The method may return one of the following HTTP status codes:
Status Code | Description |
---|---|
200 | The request was successful. |
400 | Request is invalid or the file is not accessible. |
401 | Required authentication information is either missing or not valid for the resource. |
500 | There was an error in processing this request. |
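A client can map these codes to errors instead of parsing the body blindly. The sketch below is hypothetical: `ExtractTextError`, `check_status`, `is_retryable`, and `send` are illustrative names, and the rule that only a 500 is worth retrying is a heuristic assumption (a 400 or 401 needs the request or credentials fixed first). It relies on the standard-library behaviour that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses.

```python
import urllib.error
import urllib.request

# Descriptions copied from the HTTP Response Codes table.
STATUS_DESCRIPTIONS = {
    200: "The request was successful.",
    400: "Request is invalid or the file is not accessible.",
    401: "Required authentication information is either missing or not valid for the resource.",
    500: "There was an error in processing this request.",
}


class ExtractTextError(Exception):
    """Hypothetical error type raised for any non-200 response."""

    def __init__(self, status, body=""):
        self.status = status
        self.body = body
        description = STATUS_DESCRIPTIONS.get(status, "Undocumented status code.")
        super().__init__(f"HTTP {status}: {description}")


def check_status(status, body=""):
    """Return normally on 200, otherwise raise ExtractTextError."""
    if status != 200:
        raise ExtractTextError(status, body)


def is_retryable(status):
    # Heuristic assumption: 400 and 401 will fail again unchanged; 500 may not.
    return status == 500


def send(req):
    """Send a prepared urllib Request; return the body text or raise ExtractTextError."""
    try:
        with urllib.request.urlopen(req) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError still exposes the status code and response body.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    check_status(status, body)
    return body
```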
Authentication
All API requests to ApyHub services need to be authenticated. Currently, we support token and basic authentication mechanisms.
You can generate and view your credentials in your workspace settings (on the left side of the navbar) under "API Keys".
Points to note:
- Credential secrets are generated on the fly and are not stored in plain text, so save the secret somewhere safe as soon as you generate a credential.
- Use the apy-token header to pass the token.
- Use the Authorization header to send basic authentication credentials.
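The two mechanisms translate into request headers as sketched below. The helper names are hypothetical. The Basic value follows the standard HTTP Basic scheme (RFC 7617): `"Basic "` followed by the base64 encoding of `username:password`. This page does not say which of your generated credential values serve as username and password, so treat that mapping as an assumption to confirm in your workspace.

```python
import base64


def token_headers(api_token):
    """Token authentication: pass the token in the apy-token header."""
    return {"apy-token": api_token}


def basic_auth_headers(username, password):
    """Basic authentication via the standard HTTP Basic scheme (RFC 7617).

    The header value is "Basic " followed by base64("username:password").
    """
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

Either dictionary can be merged into the other request headers, for example `{**token_headers(token), "Content-Type": content_type}`. Send only one mechanism per request.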