Validating Content Before Ingestion
This guide will walk you through using the Validation API to check your content file for formatting issues and potential errors before running a full import. This is a crucial step to ensure a smooth and successful ingestion process.
The Validation API performs a "dry run" of the import process: it checks the structure and format of your zip file without actually importing any content into the eGain AI Knowledge Hub. This lets you identify and fix issues beforehand, saving time and preventing failed import jobs.
Prerequisites
- A valid OAuth 2.0 access token with the knowledge.contentmgr.manage scope.
- The content zip file that has been uploaded to its data source (either an AWS S3 bucket or a Shared File Path).
- The necessary credentials and path information for your chosen data source.
- Review the Format Guide to understand the expected file structure.
API Endpoint
To start a validation job, make a POST request to the following endpoint:
https://${API_DOMAIN}/knowledge/contentmgr/v4/import/content/validate
Request Body
The request body is a JSON object that specifies the data source where your content file is located.
Example Payload (S3 Bucket)
{
"dataSource": {
"type": "AWS S3 bucket",
"path": "s3://mybucket/myfolder/content-to-validate.zip",
"region": "us-east-1",
"credentials": {
"accessKeyId": "AKIAIOSFODNN7EXAMPLE",
"secretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
}
}
Example Payload (Shared File Path)
{
"dataSource": {
"type": "Shared file path",
"path": "sftp://server.com/content/uploads/content-to-validate.zip",
"credentials": {
"username": "contentuser",
"password": "securepassword"
}
}
}
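The two payload shapes above differ only in their dataSource fields: the S3 source additionally requires a region. A small helper can assemble either one. A minimal sketch in Python (field names are taken from the examples above; the helper name itself is hypothetical):

```python
import json

def build_validation_payload(source_type, path, credentials, region=None):
    """Assemble the request body for the validation endpoint.

    source_type: "AWS S3 bucket" or "Shared file path", as in the examples above.
    region is required only for the S3 data source.
    """
    data_source = {"type": source_type, "path": path, "credentials": credentials}
    if source_type == "AWS S3 bucket":
        if region is None:
            raise ValueError("region is required for an S3 data source")
        data_source["region"] = region
    return {"dataSource": data_source}

payload = build_validation_payload(
    "AWS S3 bucket",
    "s3://mybucket/myfolder/content-to-validate.zip",
    {"accessKeyId": "YOUR_AWS_ACCESS_KEY_ID", "secretAccessKey": "YOUR_AWS_SECRET_ACCESS_KEY"},
    region="us-east-1",
)
print(json.dumps(payload, indent=2))
```

Validating required fields locally like this catches a missing region before the API rejects the request.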
Using cURL to Start the Validation Job
Use one of the following cURL commands, depending on where your content file is stored, to start the validation job. Remember to replace the placeholder values with your actual data.
cURL for S3 Bucket
curl --location --request POST 'https://<API_DOMAIN>/knowledge/contentmgr/v4/import/content/validate' \
--header 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"dataSource": {
"type": "AWS S3 bucket",
"path": "s3://your-bucket-name/your-folder/file.zip",
"region": "your-aws-region",
"credentials": {
"accessKeyId": "YOUR_AWS_ACCESS_KEY_ID",
"secretAccessKey": "YOUR_AWS_SECRET_ACCESS_KEY"
}
}
}'
cURL for Shared File Path
curl --location --request POST 'https://<API_DOMAIN>/knowledge/contentmgr/v4/import/content/validate' \
--header 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"dataSource": {
"type": "Shared file path",
"path": "sftp://your-server.com/path/to/file.zip",
"credentials": {
"username": "YOUR_USERNAME",
"password": "YOUR_PASSWORD"
}
}
}'
Successful Response (Job Accepted)
A successful request queues the validation job and returns a 202 Accepted status code. The location header in the response will contain the URL to check the status of the validation job.
Example Response Header:
location: /knowledge/contentmgr/v4/import/content/7A84B875-6F75-4C7B-B137-0632B62DB0BD
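The job ID you need for polling is the final path segment of the location header value. A minimal sketch that splits it off (using the example header above):

```python
# Example location header value from the 202 Accepted response.
location = "/knowledge/contentmgr/v4/import/content/7A84B875-6F75-4C7B-B137-0632B62DB0BD"

# The job ID is the last path segment.
job_id = location.rstrip("/").split("/")[-1]
print(job_id)  # 7A84B875-6F75-4C7B-B137-0632B62DB0BD
```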
Checking the Validation Results
To get the results of the validation, you must poll the status endpoint provided in the location header from the previous step.
Example cURL to Get Status
curl --location --request GET 'https://<API_DOMAIN>/knowledge/contentmgr/v4/import/content/<job_id>/status' \
--header 'Authorization: Bearer <YOUR_ACCESS_TOKEN>'
Replace <job_id> with the ID you received in the location header.
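Polling the status endpoint is typically wrapped in a loop that sleeps between checks and stops once the job reaches a final state. A minimal sketch, assuming a fetch_status callable that performs the GET above and returns the parsed JSON body; "Successful" appears in the example payload below, while "Failed" is an assumed terminal status name:

```python
import time

def poll_validation_job(fetch_status, interval=10.0, timeout=600.0,
                        terminal=("Successful", "Failed")):
    """Poll until the job reaches a terminal status or the timeout expires.

    fetch_status: a callable that GETs the status endpoint and returns the
    parsed JSON body. It is injected so the loop stays transport-agnostic
    (use any HTTP client you like underneath).
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        if body.get("status") in terminal:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("validation job did not finish in time")
        time.sleep(interval)
```

Injecting the fetch function also makes the loop easy to unit-test with a stub instead of a live endpoint.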
Example Validation Result Payload
Once the job is complete, the status response will show the results. Pay close attention to the jobType, status, and results fields. The logFileLocation will contain a detailed report of any warnings or errors.
{
"status": "Successful",
"jobType": "Validation",
"progress": {
"processed": 5000,
"total": 5000,
"percentage": 100
},
"logFileLocation": "s3://mybucket/logs/import-logs-7A84B875-6F75-4C7B-B137-0632B62DB0BD.txt",
"startTime": "2024-03-01T10:00:00.000Z",
"completionTime": "2024-03-01T11:15:00.000Z",
"results": {
"successful": 4985,
"warnings": 10,
"errors": 5
}
}
If the results object shows any errors, you must correct them in your source content and re-run the validation process before attempting a final import.
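As a final gate before a real import, a script can inspect the results object and refuse to proceed while any errors remain. A minimal sketch (field names are taken from the example payload above; the function name is hypothetical):

```python
def ready_for_import(status_body):
    """Return True only when validation finished cleanly with zero errors.

    Warnings do not block the import; per the guidance above, only errors
    must be corrected before re-running validation.
    """
    results = status_body.get("results", {})
    return (status_body.get("status") == "Successful"
            and results.get("errors", 0) == 0)

# Using the example result payload above: 5 errors, so not ready.
example = {
    "status": "Successful",
    "results": {"successful": 4985, "warnings": 10, "errors": 5},
}
print(ready_for_import(example))  # False
```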