Validating Content Before Ingestion
This guide will walk you through using the Validation API to check your content file for formatting issues and potential errors before running a full import. This is a crucial step to ensure a smooth and successful ingestion process.
The Validation API performs a "dry run" of the import process: it checks the structure and format of your zip file without actually importing any content into the eGain AI Knowledge Hub. This lets you identify and fix issues beforehand, saving time and preventing failed import jobs.
Prerequisites
- A valid OAuth 2.0 access token with the knowledge.contentmgr.manage scope.
- The content zip file that has been uploaded to its data source (either an AWS S3 bucket or a Shared File Path).
- The necessary credentials and path information for your chosen data source.
- Review the Format Guide to understand the expected file structure.
API Endpoint
To start a validation job, make a POST request to the following endpoint:
https://${API_DOMAIN}/knowledge/contentmgr/v4/import/content/validate
Request Body
The request body is a JSON object that specifies the data source where your content file is located.
Example Payload (S3 Bucket)
{
"dataSource": {
"type": "AWS S3 bucket",
"path": "s3://mybucket/myfolder/content-to-validate.zip",
"region": "us-east-1",
"credentials": {
"accessKeyId": "AKIAIOSFODNN7EXAMPLE",
"secretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
}
}
Example Payload (Shared File Path)
{
"dataSource": {
"type": "Shared file path",
"path": "sftp://server.com/content/uploads/content-to-validate.zip",
"credentials": {
"username": "contentuser",
"password": "securepassword"
}
}
}
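The two payload shapes above differ only in their dataSource fields: the S3 source additionally requires a region. A small helper can assemble either one. A minimal sketch in Python (field names are taken from the examples above; the helper name itself is hypothetical):

```python
import json

def build_validation_payload(source_type, path, credentials, region=None):
    """Assemble the request body for the validation endpoint.

    source_type: "AWS S3 bucket" or "Shared file path", as in the examples above.
    region is required only for the S3 data source.
    """
    data_source = {"type": source_type, "path": path, "credentials": credentials}
    if source_type == "AWS S3 bucket":
        if region is None:
            raise ValueError("region is required for an S3 data source")
        data_source["region"] = region
    return {"dataSource": data_source}

payload = build_validation_payload(
    "AWS S3 bucket",
    "s3://mybucket/myfolder/content-to-validate.zip",
    {"accessKeyId": "YOUR_AWS_ACCESS_KEY_ID", "secretAccessKey": "YOUR_AWS_SECRET_ACCESS_KEY"},
    region="us-east-1",
)
print(json.dumps(payload, indent=2))
```

Validating required fields locally like this catches a missing region before the API rejects the request.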
Using cURL to Start the Validation Job
Use one of the following cURL commands, depending on where your content file is stored, to start the validation job. Remember to replace the placeholder values with your actual data.
cURL for S3 Bucket
curl --location --request POST 'https://<API_DOMAIN>/knowledge/contentmgr/v4/import/content/validate' \
--header 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"dataSource": {
"type": "AWS S3 bucket",
"path": "s3://your-bucket-name/your-folder/file.zip",
"region": "your-aws-region",
"credentials": {
"accessKeyId": "YOUR_AWS_ACCESS_KEY_ID",
"secretAccessKey": "YOUR_AWS_SECRET_ACCESS_KEY"
}
}
}'
cURL for Shared File Path
curl --location --request POST 'https://<API_DOMAIN>/knowledge/contentmgr/v4/import/content/validate' \
--header 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"dataSource": {
"type": "Shared file path",
"path": "sftp://your-server.com/path/to/file.zip",
"credentials": {
"username": "YOUR_USERNAME",
"password": "YOUR_PASSWORD"
}
}
}'
Successful Response (Job Accepted)
A successful request queues the validation job and returns a 202 Accepted status code. The location header in the response will contain the URL to check the status of the validation job.
Example Response Header:
location: /knowledge/contentmgr/v4/import/content/7A84B875-6F75-4C7B-B137-0632B62DB0BD
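The job ID you need for polling is the final path segment of the location header value. A minimal sketch that splits it off (using the example header above):

```python
# Example location header value from the 202 Accepted response.
location = "/knowledge/contentmgr/v4/import/content/7A84B875-6F75-4C7B-B137-0632B62DB0BD"

# The job ID is the last path segment.
job_id = location.rstrip("/").split("/")[-1]
print(job_id)  # 7A84B875-6F75-4C7B-B137-0632B62DB0BD
```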
Checking the Validation Results
To get the results of the validation, you must poll the status endpoint provided in the location header from the previous step.
Example cURL to Get Status
curl --location --request GET 'https://<API_DOMAIN>/knowledge/contentmgr/v4/import/content/<job_id>/status' \
--header 'Authorization: Bearer <YOUR_ACCESS_TOKEN>'
Replace <job_id> with the ID you received in the location header.
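Polling the status endpoint is typically wrapped in a loop that sleeps between checks and stops once the job reaches a final state. A minimal sketch, assuming a fetch_status callable that performs the GET above and returns the parsed JSON body; "Successful" appears in the example payload below, while "Failed" is an assumed terminal status name:

```python
import time

def poll_validation_job(fetch_status, interval=10.0, timeout=600.0,
                        terminal=("Successful", "Failed")):
    """Poll until the job reaches a terminal status or the timeout expires.

    fetch_status: a callable that GETs the status endpoint and returns the
    parsed JSON body. It is injected so the loop stays transport-agnostic
    (use any HTTP client you like underneath).
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        if body.get("status") in terminal:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("validation job did not finish in time")
        time.sleep(interval)
```

Injecting the fetch function also makes the loop easy to unit-test with a stub instead of a live endpoint.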
Example Validation Result Payload
Once the job is complete, the status response will show the results. Pay close attention to the jobType, status, and results fields. The logFileLocation will contain a detailed report of any warnings or errors.
{
"status": "Successful",
"jobType": "Validation",
"progress": {
"processed": 5000,
"total": 5000,
"percentage": 100
},
"logFileLocation": "s3://mybucket/logs/import-logs-7A84B875-6F75-4C7B-B137-0632B62DB0BD.txt",
"startTime": "2024-03-01T10:00:00.000Z",
"completionTime": "2024-03-01T11:15:00.000Z",
"results": {
"successful": 4985,
"warnings": 10,
"errors": 5
}
}
If the results object shows any errors, you must correct them in your source content and re-run the validation process before attempting a final import.
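As a final gate before a real import, a script can inspect the results object and refuse to proceed while any errors remain. A minimal sketch (field names are taken from the example payload above; the function name is hypothetical):

```python
def ready_for_import(status_body):
    """Return True only when validation finished cleanly with zero errors.

    Warnings do not block the import; per the guidance above, only errors
    must be corrected before re-running validation.
    """
    results = status_body.get("results", {})
    return (status_body.get("status") == "Successful"
            and results.get("errors", 0) == 0)

# Using the example result payload above: 5 errors, so not ready.
example = {
    "status": "Successful",
    "results": {"successful": 4985, "warnings": 10, "errors": 5},
}
print(ready_for_import(example))  # False
```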