CleverSwarm Python client and SDK

This repository provides CLI Python client applications for interacting with CleverSwarm micro-services.

It also provides an SDK for easy integration of CleverSwarm with external applications or frameworks.

Text to Knowledge-Graph extraction CLI APP

This tool enables Knowledge-Graph (KG) extraction from unstructured text, given an ontology.

The ontology is split across two files: an OWL/RDF ontology file and a JSON ontology enriched with descriptions. Wildcard triplet filtering is also supported, enabling KG extraction of only the relevant triplets.

The extracted KG is exported to a file in the form of triplets in JSON-LD format.
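As an illustration, a JSON-LD triplet export might look like the snippet below. The exact schema is an assumption for illustration only; consult the actual output files for the real structure. This Python sketch parses such a file and walks its triplets:

```python
import json

# Hypothetical JSON-LD output; the real schema produced by the service may differ.
sample_jsonld = """
{
  "@context": {"ex": "http://example.org/ontology#"},
  "@graph": [
    {"@id": "ex:Alice", "ex:worksFor": {"@id": "ex:AcmeCorp"}}
  ]
}
"""

data = json.loads(sample_jsonld)
for node in data["@graph"]:
    subject = node["@id"]
    for predicate, obj in node.items():
        if predicate == "@id":
            continue  # "@id" names the subject, it is not a predicate
        print(subject, predicate, obj["@id"])
```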

Running the text to KG extraction CLI APP

cswarm_text_to_kg_client.py is the end-user application for doing text extraction from the CLI.

  • Listing text to KG extraction jobs

    cswarm_text_to_kg_client.py --action list-kg-jobs-and-exit --server_url http://127.0.0.1:8000/api/v0

    will list all KG extraction jobs, but will exclude benchmark jobs from the listing.

      - Optionally, the user may add `--username <user>`, where `<user>` is the username on whose behalf the extraction job will run. This avoids being prompted for the username.
    
      - The user may also specify `--token <token>`, where `<token>` is a valid token for the user. That way no manual login will be required.
    
      - The user may also indicate the option `--detailed` to get more detailed information.
    
  • Doing an unstructured text to KG extraction

    cswarm_text_to_kg_client.py --input_prefix unstructured_tests --unstructured_text "text 1.txt" --ontology_json ontology.json --ontology_owl ontology.owl --server_url http://localhost/api/v0

    will create a regular text to KG extraction job based on the input files inside the unstructured_tests folder, as indicated by --input_prefix. The application will then wait for the KG extraction job to complete and download the results file. The output folder will be the current folder, since the --output_prefix option is not provided. Finally, the extraction job will be deleted from the REST API server.

      - The additional optional arguments described for the jobs listing action are also valid for this default action.
    
  • Doing an unstructured text to KG extraction with wildcards filtering

    cswarm_text_to_kg_client.py --input_prefix wildcard_tests --unstructured_text text.txt --ontology_json ontology.json --ontology_owl ontology.owl --wildcards ~/Documents/wildcard_query_ist.json --server_url http://localhost/api/v0

    The additional --wildcards option is what differentiates the wildcard text to KG extraction from the regular one. Please note that --wildcards does not inherit the input folder prefix, so a full path to the wildcards file should be specified.

      - The additional optional arguments described for the jobs listing action are also valid for this default action.
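Conceptually, wildcard filtering keeps only the triplets whose subject, predicate, and object match the supplied patterns. The actual wildcard query JSON format is not documented here, so the sketch below is only a hypothetical illustration of the idea, using shell-style patterns:

```python
from fnmatch import fnmatch

# Hypothetical wildcard patterns: '*' matches any value in that position.
# The real wildcard query file format may differ.
wildcards = [("*", "worksFor", "*")]

triplets = [
    ("Alice", "worksFor", "AcmeCorp"),
    ("Alice", "livesIn", "Lisbon"),
]

def matches(triplet, patterns):
    """Return True if the triplet matches any (subject, predicate, object) pattern."""
    return any(
        all(fnmatch(value, pat) for value, pat in zip(triplet, pattern))
        for pattern in patterns
    )

filtered = [t for t in triplets if matches(t, wildcards)]
print(filtered)  # only the 'worksFor' triplet survives the filter
```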
    
  • For additional command line arguments, please do:

    cswarm_text_to_kg_client.py --help

Benchmarking CleverSwarm

Make sure to install the CleverSwarm Python client:

    pip install --index-url https://git.cleverthis.com/api/packages/cleverlibre/pypi/simple/ cleverswarm-python-client

and to clone the CleverSwarm Benchmark repo:

    git clone ssh://git@git.cleverthis.com/cleverswarm/cleverswarm-benchmark.git

Running the benchmark

  • Running the full set of original tests with no previous tests in the local filesystem cache:

    cswarm_benchmark_client --ontologies_ids 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

    NOTE 1: This assumes the default benchmark datasets prefix of './data/dbpedia'. If another prefix is required, please provide the option --bench_data_prefix <new_path_prefix_to_benchmark_datasets>

    NOTE 2: Re-running the command will take the same time to execute and will overwrite all previously locally cached results for the specified test cases.

  • Running the full set of original tests with some previous tests in the local filesystem cache to obtain the average of all the tests:

    For this case, let's assume that the first benchmark request covered tests 1, 2, 3, 4, 5 and 6, i.e. a previous benchmark job ran like so:

    cswarm_benchmark_client --ontologies_ids 1 2 3 4 5 6

    All that is needed is to tell the new server benchmark job to exclude jobs 1 to 6 and start at 7, using the option:

    --ontologies_ids 7 8 9 10 11 12 13 14 15 16 17 18 19

    However, since we want the average scores for jobs 1-19, we also need to append this option:

    --local_metrics_ontologies_ids 1 2 3 4 5 6

    The final command line will then look like this:

    cswarm_benchmark_client --ontologies_ids 7 8 9 10 11 12 13 14 15 16 17 18 19 --local_metrics_ontologies_ids 1 2 3 4 5 6
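The merging of cached and fresh results can be pictured as a simple average over per-test scores. The score values and the use of F1 below are assumptions for illustration only, not the tool's actual metrics:

```python
# Hypothetical per-test F1 scores: some loaded from the local cache (tests 1-6),
# the rest freshly downloaded from the server (tests 7-19).
cached_scores = {1: 0.81, 2: 0.76, 3: 0.90, 4: 0.84, 5: 0.79, 6: 0.88}
fresh_scores = {i: 0.80 for i in range(7, 20)}

# The metrics step averages over the union of both sets, covering tests 1-19.
all_scores = {**cached_scores, **fresh_scores}
average = sum(all_scores.values()) / len(all_scores)
print(f"average over tests 1-19: {average:.3f}")
```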

  • Running the evaluation metrics for cached test cases. Supposing that all the test cases were already processed on the server and their results downloaded to the local filesystem, we can do:

    cswarm_benchmark_client.py --action metrics-and-exit --local_metrics_ontologies_ids 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Other details about the Benchmark client tool

  • For general help do:

    cswarm_benchmark_client.py -h

  • To specify a CleverThis CleverSwarm server URL do:

    cswarm_benchmark_client.py --server_url https://cleverthis.com

  • To specify a username for authentication with the server do:

    cswarm_benchmark_client.py --username <username>

  • To specify a given action for the benchmark client, do:

    cswarm_benchmark_client.py --action <action_to_be_performed>

Detail of the available Benchmark client tool actions

  • --action create-and-exit creates a benchmark job in the server for the specified test cases and exits.

  • --action delete-and-exit deletes a benchmark job in the server and exits.

  • --action metrics-and-exit computes the summarized evaluation metrics/scores for the specified local filesystem cached test cases and exits.

  • --action metrics-and-exit --detailed computes the detailed evaluation metrics/scores for the specified local filesystem cached test cases and exits.

  • --action download-metrics-and-exit downloads the test case results from the server to the local filesystem cache.

  • --action list-benchmark-jobs-and-exit obtains a summarized list of the benchmark jobs on the server, then exits.

  • --action list-benchmark-jobs-and-exit --detailed obtains a detailed list of the benchmark jobs on the server, then exits.

  • --action list-ontologies-ids-and-exit lists the known ontology test case IDs and names.

  • --action create-download-metrics-delete performs the full client benchmark process: server job creation, polling for job completion, downloading the job results to the local filesystem benchmark results folder, computing evaluation metrics locally, and deleting the benchmark job on the server. NOTE: This is the default if no action is specified on the command line.

  • --action create-download-metrics is similar to the above, but does not delete the benchmark job on the server at the end of processing.

Reusing the client code library

  • Please look into client/cleverswarm_client.py and associated modules.