Maxime Wack 323b6be7d2 | 3 months ago
Git ommix helps manage high-dimensional data (e.g. omics, imagery, pathology) longitudinally, coupled with a representation of its provenance using the PROV ontology.
Git ommix creates patient-level repositories to store sample references, versioned data obtained from the samples, and the versioned results of the data analysis and ensuing diagnoses.
Large files are only retrieved on demand thanks to git annex, which decouples navigating the history from actually downloading all of it.
Git ommix also stores a representation of the provenance of each of those entities using the PROV ontology. It supports querying the repository structure through several operations, which can apply to a patient's whole history or be restricted to one or more specific objects (sample/data/result/diagnosis):
list the objects contributing to the target (the data contributing to a result or to a diagnosis, samples contributing to a diagnosis)
get the most recent version of the target
get the PROV-O provenance of the target, as Turtle triples or as a visual graph
display a timeline of diagnoses
execute any SPARQL query on a repo
GitOmmix is implemented as a bash script. It relies mostly on git, but also uses:
git annex (https://git-annex.branchable.com/) to handle large files (10.20230926)
rapper (https://librdf.org/raptor/rapper.html) to manage RDF stores (2.0.15)
roqet (https://librdf.org/rasqal/roqet.html) to query RDF stores (0.9.33)
graphviz (https://graphviz.org/) to generate visual representations (2.42.2)
bash-completion (https://github.com/scop/bash-completion/) to benefit from autocompletions in bash (2.11)
Git ommix has been tested on Ubuntu 22.04.3 LTS (Jammy). Install raptor2-utils and rasqal-utils to get rapper and roqet. bash-completion should already be installed, and graphviz is also available from the official repositories. However, the version of git-annex provided by Ubuntu (8) is too old, and version 10 should be installed instead. The latest version can be obtained from this repository: http://neuro.debian.net/pkgs/git-annex-standalone.html
macOS users can find all the required dependencies on Homebrew.
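Before installing, it can help to check that the dependencies are on the PATH. The following is a minimal sketch, not part of git ommix itself: the helper name missing_deps is hypothetical, and the executable names (git-annex for git annex, dot for graphviz) are assumptions about how the packages listed above expose their commands.

```shell
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Assumed executable names: git-annex (git annex), rapper, roqet, dot (graphviz).
missing=$(missing_deps git git-annex rapper roqet dot)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on a single line.
    echo "Missing dependencies:" $missing
else
    echo "All dependencies found."
fi
```

The check only looks for the executables. It does not verify the versions listed above, so the git-annex version still needs to be checked by hand on Ubuntu.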
Run sudo make install
to install git ommix on your computer.
From the root directory of this repository, run tests/<name>.test
to run the test named <name>.
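To run every test in one go, a small loop over the tests directory is enough. This is a sketch under two assumptions that the README does not state: each .test file is executable, and a failing test exits with a non-zero status. The function name run_tests is hypothetical.

```shell
#!/usr/bin/env bash
# Run every executable *.test file in a directory and report the outcome.
# Assumption: a test script exits with a non-zero status when it fails.
run_tests() {
    dir=${1:-tests}
    failed=0
    for t in "$dir"/*.test; do
        [ -x "$t" ] || continue   # also skips the literal pattern when nothing matches
        if "$t" >/dev/null 2>&1; then
            echo "PASS $t"
        else
            echo "FAIL $t"
            failed=$((failed + 1))
        fi
    done
    echo "$failed failed"
    [ "$failed" -eq 0 ]           # return status reflects the overall result
}
```

From the root directory of the repository, run_tests tests would then run each test in turn and return a non-zero status if any of them failed.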
The git ommix commands all follow the same pattern:
git ommix {verb} {object} [--options] [--message] [rest]
git ommix does not have to be, and should not be, called from inside the git ommix store.
It can be run from any directory containing files to add to a patient's history.
Group of operations used to create the patient stores.
All operations accept these options:
--id        the new object's id, if one needs to be provided; otherwise an id is randomly generated
--method    an optional PROV Activity used to generate the new object
--provider  an optional PROV Agent involved in generating the new object
--date      the date of creation of the object; defaults to the current date
git ommix add patient
git ommix add sample -p|--patient <patient>
Add a sample to <patient>
git ommix add data -p|--patient <patient> -s|--sample <sample> [--revision_of <data>] [--invalidate <data>] [FILES]
Add [FILES] to a data object in <sample> of <patient>.
FILES defaults to all the files in the current directory.
All data in a sample derive from (use) the <sample>.
New data files can be a revision of previous <data> in the same <sample>, and can also invalidate previous <data> in the same <sample>.
--invalidate can be specified multiple times to invalidate multiple <data> in the same <sample> with the new data.
git ommix add result -p|--patient <patient> -s|--sample <sample> --use <data> [--revision_of <result>] [--invalidate <result>] [FILES]
Add [FILES] to a result object in <sample> of <patient>.
FILES defaults to all the files in the current directory.
A result derives from (use) <data> in the same <sample>.
--use can be specified multiple times to derive the new result from multiple <data> in the same <sample>.
New result files can be a revision of previous <result> in the same <sample>, and can also invalidate previous <result> in the same <sample>.
--invalidate can be specified multiple times to invalidate multiple <result> in the same <sample> with the new result.
git ommix add diagnosis -p|--patient <patient> --use <result|diagnosis> [--revision_of <diagnosis>] [--invalidate <diagnosis>]
Diagnoses live outside of samples and can be used to tie multiple results from different samples into a clinically coherent history.
A diagnosis derives from (use) a <result> or a previous <diagnosis>.
--use can be specified multiple times to derive the new diagnosis from multiple <result> or <diagnosis>.
A new diagnosis can be a revision of a previous <diagnosis> and can also invalidate previous <diagnosis>.
--invalidate can be specified multiple times to invalidate multiple <diagnosis> with the new diagnosis.
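The add commands above can be chained into a complete patient history. The sketch below only prints the commands unless OMMIX_RUN=1 is set, so it can be read and tried without touching a store. The ommix wrapper, the OMMIX_RUN variable, all ids (P001, S1, D1, ...) and all file names are hypothetical, and passing ids as bare values to --id, --use, --revision_of and --invalidate is an assumption based on the synopses above.

```shell
#!/usr/bin/env bash
# Dry-run wrapper (hypothetical): prints each command; set OMMIX_RUN=1 to execute it.
ommix() {
    if [ "${OMMIX_RUN:-0}" = "1" ]; then
        git ommix "$@"
    else
        echo "git ommix $*"
    fi
}

# All ids and file names below are made up for illustration.
ommix add patient --id P001
ommix add sample -p P001 --id S1
# First version of the data, then a revision that invalidates it.
ommix add data -p P001 -s S1 --id D1 reads.fastq
ommix add data -p P001 -s S1 --id D2 --revision_of D1 --invalidate D1 reads_v2.fastq
# A result derived from the revised data, and a diagnosis derived from that result.
ommix add result -p P001 -s S1 --id R1 --use D2 variants.vcf
ommix add diagnosis -p P001 --id DX1 --use R1
```

Giving explicit ids keeps the later --use and --revision_of references readable. Without --id, the ids are generated randomly and would have to be looked up before being referenced.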
git ommix list patient
List all the patients known in the local store
git ommix list sample|data|result|diagnosis -p|--patient <patient> [ref]
List all the sample|data|result|diagnosis objects in <patient>.
[ref] limits the list to the history of [ref].
[ref] can be expressed as a commit hash or an object name (type:id or id).
Multiple [ref] can be provided.
IDs matching multiple objects expand to multiple [ref].
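The different forms of [ref] are easier to see side by side. The following sketch prints a few list commands rather than running them, unless OMMIX_RUN=1 is set. The ommix wrapper, the OMMIX_RUN variable and the ids P001, R1 and DX1 are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run wrapper (hypothetical): prints each command; set OMMIX_RUN=1 to execute it.
ommix() {
    if [ "${OMMIX_RUN:-0}" = "1" ]; then
        git ommix "$@"
    else
        echo "git ommix $*"
    fi
}

# P001, R1 and DX1 are made-up ids.
ommix list patient                        # every patient in the local store
ommix list data -p P001                   # all data objects of the patient
ommix list data -p P001 result:R1         # data in the history of result R1 (type:id form)
ommix list sample -p P001 diagnosis:DX1   # samples in the history of diagnosis DX1
ommix list result -p P001 R1 DX1          # several refs, given as bare ids
```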
(Nearly) all the get commands accept or even require a [ref].
As before, [ref] constrains the result to the context of [ref].
[ref] can be expressed as a commit hash or an object name (type:id or id).
Multiple [ref] can be provided.
IDs matching multiple objects expand to multiple [ref].
git ommix get prov -p|--patient <patient> [ref]
Output the RDF graph as Turtle triples
git ommix get graph -p|--patient <patient> [ref]
Output a graphical representation of the RDF graph
git ommix get timeline -p|--patient <patient> [ref]
Output a graphical representation of the clinical history of the patient, omitting samples, data, and results
git ommix get last -p|--patient <patient> <ref>
Get the up-to-date version of the given ref, as well as the most recent diagnosis it contributes to
git ommix get object -p|--patient <patient> <ref>
Check out the patient's repo at the given object
git ommix get file -p|--patient <patient> [ref]
List the files added by the given object
git ommix get log -p|--patient <patient> [ref]
Print the git log of the patient's repo
git ommix get sparql -p|--patient <patient> "SPARQL query"
Output the result of the SPARQL query as Turtle triples
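As an illustration of the kind of query that could be passed to get sparql, the sketch below builds a CONSTRUCT query over the PROV-O vocabulary and prints the resulting command. Whether the store records derivations with prov:wasDerivedFrom is an assumption, so the predicate may need adjusting after inspecting the output of get prov. The patient id P001 and the OMMIX_RUN variable are hypothetical.

```shell
#!/usr/bin/env bash
# CONSTRUCT query returning every derivation link in the provenance graph.
# Assumption: derivations are stored as prov:wasDerivedFrom triples.
query='PREFIX prov: <http://www.w3.org/ns/prov#>
CONSTRUCT { ?entity prov:wasDerivedFrom ?source }
WHERE     { ?entity prov:wasDerivedFrom ?source }'

# P001 is a made-up patient id. The command is only printed by default;
# set OMMIX_RUN=1 to execute it.
if [ "${OMMIX_RUN:-0}" = "1" ]; then
    git ommix get sparql -p P001 "$query"
else
    printf 'git ommix get sparql -p P001 "%s"\n' "$query"
fi
```

A CONSTRUCT query is used here because its result is itself a set of triples, which matches the Turtle output described above.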