# CleverSwarm Python client and SDK
This repository provides CLI Python client applications for interacting with CleverSwarm micro-services.
It also provides an SDK for easy integration of CleverSwarm into external applications or frameworks.
## Text to Knowledge-Graph extraction CLI APP
This tool enables Knowledge-Graph (KG) extraction from unstructured text, given an ontology.
The ontology is split across two files: an OWL/RDF ontology file and a JSON ontology file enriched with descriptions. Wildcard triplet filtering is also supported, enabling KG extraction of only the relevant triplets.
The extracted KG is exported to a file as triplets in JSON-LD format.
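As a rough illustration, a JSON-LD triplet export might be consumed like this. The document layout below is an assumption for the sketch, not the tool's documented output schema:

```python
import json

# Hypothetical JSON-LD output; the real schema produced by the tool may differ.
jsonld_text = """
{
  "@context": {"ex": "http://example.org/"},
  "@graph": [
    {"@id": "ex:Alice", "ex:knows": {"@id": "ex:Bob"}}
  ]
}
"""

doc = json.loads(jsonld_text)

# Flatten the graph into (subject, predicate, object) triplets.
triplets = []
for node in doc.get("@graph", []):
    subject = node["@id"]
    for predicate, obj in node.items():
        if predicate == "@id":
            continue
        value = obj["@id"] if isinstance(obj, dict) else obj
        triplets.append((subject, predicate, value))

print(triplets)  # → [('ex:Alice', 'ex:knows', 'ex:Bob')]
```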
### Running the text to KG extraction CLI APP
`cswarm_text_to_kg_client.py` is the end-user application for doing text extraction from the CLI.
- **Listing text to KG extraction jobs**

  ```shell
  cswarm_text_to_kg_client.py --action list-kg-jobs-and-exit --server_url http://127.0.0.1:8000/api/v0
  ```

  will list all KG extraction jobs, excluding benchmark jobs from the listing.

  - Optionally, the user may add `--username <user>`, where `<user>` is the username on whose behalf the extraction job will run. This avoids being prompted for the username.
  - The user may also specify `--token <token>`, where `<token>` is a valid token for the user, so that no manual login is required.
  - The user may also pass `--detailed` to get more detailed information.
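The optional flags can be combined with any action. As a sketch, a hypothetical wrapper script might assemble the command line like this (the flag names come from the examples above; the wrapper itself is not part of the repository):

```python
# Build the argument list for cswarm_text_to_kg_client.py, adding the
# optional flags only when the corresponding values are provided.
def build_list_jobs_cmd(server_url, username=None, token=None, detailed=False):
    cmd = [
        "cswarm_text_to_kg_client.py",
        "--action", "list-kg-jobs-and-exit",
        "--server_url", server_url,
    ]
    if username:
        cmd += ["--username", username]
    if token:
        cmd += ["--token", token]
    if detailed:
        cmd.append("--detailed")
    return cmd

print(build_list_jobs_cmd("http://127.0.0.1:8000/api/v0", username="alice", detailed=True))
```

The list form is suitable for passing straight to `subprocess.run` without shell quoting concerns.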
- **Doing an unstructured text to KG extraction**

  ```shell
  cswarm_text_to_kg_client.py --input_prefix unstructured_tests --unstructured_text "text 1.txt" --ontology_json ontology.json --ontology_owl ontology.owl --server_url http://localhost/api/v0
  ```

  will create a regular text to KG extraction job based on the input files inside the `unstructured_tests` folder, as indicated by `--input_prefix`. The application will then wait for KG extraction job completion and download the results file. The output folder will be the current folder, since the `--output_prefix` option is not provided. Finally, the extraction job will be deleted from the REST API server.

  - The additional optional arguments described for the jobs listing action are also valid for this default action.
- **Doing an unstructured text to KG extraction with wildcards filtering**

  ```shell
  cswarm_text_to_kg_client.py --input_prefix wildcard_tests --unstructured_text text.txt --ontology_json ontology.json --ontology_owl ontology.owl --wildcards ~/Documents/wildcard_query_ist.json --server_url http://localhost/api/v0
  ```

  The additional `--wildcards` option is what differentiates a wildcards text to KG extraction from a regular one. Please note that `--wildcards` does not inherit the input folder prefix, so a full path to the wildcards file must be specified.

  - The additional optional arguments described for the jobs listing action are also valid for this default action.
- For additional command line arguments, please do:

  ```shell
  cswarm_text_to_kg_client.py --help
  ```
## Benchmarking CleverSwarm
Make sure to install the CleverSwarm Python client:

```shell
pip install --index-url https://git.cleverthis.com/api/packages/cleverlibre/pypi/simple/ cleverswarm-python-client
```

and to clone the CleverSwarm Benchmark repo:

```shell
git clone ssh://git@git.cleverthis.com/cleverswarm/cleverswarm-benchmark.git
```
### Running the benchmark
- Running the full set of original tests with no previous tests in the local filesystem cache:

  ```shell
  cswarm_benchmark_client.py --ontologies_ids 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  ```

  NOTE 1: This assumes the default benchmark datasets prefix of `./data/dbpedia`. If another prefix is required, please provide the option `--bench_data_prefix <new_path_prefix_to_benchmark_datasets>`.

  NOTE 2: Re-running the command will take the same time to execute and will overwrite all previously locally cached results for the specified test cases.
- Running the full set of original tests with some previous tests in the local filesystem cache, to obtain the average of all the tests:

  For this case, let's assume the first benchmark request covered test cases 1 through 6, so that a previous benchmark job ran like so:

  ```shell
  cswarm_benchmark_client.py --ontologies_ids 1 2 3 4 5 6
  ```

  All that is needed is to specify that the new server benchmark job can exclude test cases 1 to 6 and start at 7, using the option:

  ```
  --ontologies_ids 7 8 9 10 11 12 13 14 15 16 17 18 19
  ```

  However, since we want the average scores for test cases 1-19, we also need to append this option:

  ```
  --local_metrics_ontologies_ids 1 2 3 4 5 6
  ```

  The final command line will then look like this:

  ```shell
  cswarm_benchmark_client.py --ontologies_ids 7 8 9 10 11 12 13 14 15 16 17 18 19 --local_metrics_ontologies_ids 1 2 3 4 5 6
  ```
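Conceptually, this mode merges freshly downloaded results with locally cached ones before averaging. A minimal sketch of that merge, using an illustrative per-test-case score layout (not the tool's actual metrics format):

```python
# Hypothetical per-test-case scores; the benchmark tool's real metrics
# layout may differ.
server_scores = {7: 0.80, 8: 0.90}   # newly benchmarked on the server
local_scores = {1: 0.60, 2: 0.70}    # previously cached locally

def average_scores(*score_maps):
    """Merge per-ontology score maps and return the overall average."""
    merged = {}
    for scores in score_maps:
        merged.update(scores)
    return sum(merged.values()) / len(merged)

print(average_scores(server_scores, local_scores))  # → 0.75
```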
- Running the evaluation metrics for cached test cases: supposing that all the test cases were already processed on the server and their results downloaded to the local filesystem, we can do:

  ```shell
  cswarm_benchmark_client.py --action metrics-and-exit --local_metrics_ontologies_ids 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  ```
### Other details about the Benchmark client tool
- For general help do:

  ```shell
  cswarm_benchmark_client.py -h
  ```

- To specify a CleverThis CleverSwarm server URL do:

  ```shell
  cswarm_benchmark_client.py --server_url https://cleverthis.com
  ```

- To specify a username for authentication with the server do:

  ```shell
  cswarm_benchmark_client.py --username <username>
  ```

- To specify a given action for the benchmark client do:

  ```shell
  cswarm_benchmark_client.py --action <action_to_be_performed>
  ```
### Details of the available Benchmark client tool actions
- `--action create-and-exit` creates a benchmark job on the server for the specified test cases and exits.
- `--action delete-and-exit` deletes a benchmark job on the server and exits.
- `--action metrics-and-exit` computes the summarized evaluation metrics/scores for the specified locally cached test cases and exits.
- `--action metrics-and-exit --detailed` computes the detailed evaluation metrics/scores for the specified locally cached test cases and exits.
- `--action download-metrics-and-exit` downloads the test case results from the server to the local filesystem cache.
- `--action list-benchmark-jobs-and-exit` obtains a summarized list of the benchmark jobs on the server, then exits.
- `--action list-benchmark-jobs-and-exit --detailed` obtains a detailed list of the benchmark jobs on the server, then exits.
- `--action list-ontologies-ids-and-exit` lists the known ontology test case ids and names.
- `--action create-download-metrics-delete` performs the full client benchmark process: creating the server job, polling for job completion, downloading the job results to the local filesystem benchmark results folder, computing the evaluation metrics locally, and deleting the benchmark job on the server. NOTE: this is the default if no action is specified on the command line.
- `--action create-download-metrics` is similar to the above, but does not delete the benchmark job on the server at the end of processing.
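The default create-download-metrics-delete cycle boils down to a poll-until-done loop. A generic sketch of that pattern, with a stubbed status check standing in for the real REST calls (the status values and the stub are assumptions, not the CleverSwarm API):

```python
import itertools
import time

# Stub standing in for a real "get job status" REST call; the actual
# CleverSwarm endpoint and status values may differ.
_statuses = itertools.chain(["pending", "running"], itertools.repeat("completed"))

def get_job_status(job_id):
    return next(_statuses)

def wait_for_completion(job_id, poll_interval=0.01, timeout=5.0):
    """Poll the job status until it completes, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_job_status(job_id)
        if status == "completed":
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not complete in time")

print(wait_for_completion(42))  # → completed
```

In the real client the download and delete steps would run after this loop returns.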
## Reusing the client code library

- Please look into `client/cleverswarm_client.py` and associated modules.