Ingest data
Ingest the harvest output files into OpenSearch by calling POST /corpus/chunk as a registered user with elevated privileges.
Workflow:
- Scan the harvest output folder and get the list of *.json files, ordered by modified date.
- For each file:
  - For each line in the output file:
    - POST /corpus/chunk with body = the line decoded as JSON.
    - On timeout or 404 error (= OpenSearch error): retry up to 5 more times, waiting 10 sec, 1 min, 10 min, 15 min, 35 min between calls.
    - If all calls fail (or on a critical error: 403, 500): stop and save progress.
Each run resumes from the last saved progress.
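The workflow above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation. The function names, the progress file format (a JSON map of file name to number of lines already ingested) and the `post` callable are all hypothetical. `post` is assumed to send one decoded chunk to POST /corpus/chunk with the privileged user's credentials, return the HTTP status code, and raise `TimeoutError` on timeout.

```python
import json
import time
from pathlib import Path

# Waits before each of the 5 additional attempts: 10 sec, 1 min, 10 min, 15 min, 35 min.
RETRY_WAITS = [10, 60, 600, 900, 2100]


def post_with_retry(post, chunk, sleep=time.sleep):
    """Return True if the chunk was accepted, False if ingestion must stop."""
    for wait in [0] + RETRY_WAITS:  # first attempt, then 5 retries
        if wait:
            sleep(wait)
        try:
            status = post(chunk)  # hypothetical callable, see lead-in
        except TimeoutError:
            continue  # timeout: retry
        if 200 <= status < 300:
            return True
        if status != 404:
            return False  # critical error (403, 500, ...): stop at once
        # 404 is treated as an OpenSearch error: retry
    return False  # all attempts failed


def ingest(folder, progress_path, post, sleep=time.sleep):
    """Ingest every line of every *.json file; return False if stopped early."""
    progress_path = Path(progress_path)
    # Assumed progress format: {"file name": lines already ingested}.
    progress = json.loads(progress_path.read_text()) if progress_path.exists() else {}
    files = sorted(Path(folder).glob("*.json"), key=lambda p: p.stat().st_mtime)
    try:
        for path in files:
            done = progress.get(path.name, 0)
            with path.open(encoding="utf-8") as fh:
                for index, line in enumerate(fh):
                    if index < done:
                        continue  # already ingested in an earlier run
                    if line.strip() and not post_with_retry(post, json.loads(line), sleep):
                        return False
                    progress[path.name] = index + 1
        return True
    finally:
        # Save progress whether the run finished, stopped, or crashed.
        progress_path.write_text(json.dumps(progress))
```

In this sketch the progress file should live outside the harvest folder (or use an extension other than .json) so that the scan does not pick it up as an output file. Passing `post` and `sleep` as parameters keeps the HTTP client and the long waits replaceable, which makes the retry logic easy to exercise without a running service.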