Workflow
The following workflow defines all tasks necessary to upload, preprocess, share, and map a new data dictionary.
- Upload cohort data
- Open Google Sheet
- Run automated mapping for new data dictionary
- Share Google Sheet with submitter
- Prepare data dictionary for build
- Run automated validation
- Build data dictionary
- View results
- Add data dictionary to Version Control
- Prepare git commit (click Commit in the Version menu)
- Push changes to GitHub (click Push in the Version menu) and open a pull request
- Delete Google Sheet (caution: this cannot be undone)
IHCC Data Admin Tasks
Console
Action automated_mapping started at 2022-11-21T12:35:49.181Z
Success
$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EBC
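The term IDs printed later in this log (for example EBC:0000001) combine the data dictionary prefix with a zero-padded, seven-digit number. The following is a minimal sketch of such a scheme, assuming sequential numbering that continues after the highest number already in use. `assign_ids` is a hypothetical helper for illustration and not the actual logic of id-generator-templates.py.

```python
from typing import List, Optional


def assign_ids(prefix: str, ids: List[Optional[str]], width: int = 7) -> List[str]:
    """Return a copy of `ids` where empty entries receive the next free PREFIX:NNNNNNN.

    Hypothetical helper: sequential numbering is an assumption, not taken from the log.
    """
    # Collect the numeric parts of IDs that already carry this prefix.
    used = []
    for existing in ids:
        if existing and existing.startswith(prefix + ":"):
            number = existing.split(":", 1)[1]
            if number.isdigit():
                used.append(int(number))

    next_number = max(used, default=0) + 1
    result = []
    for existing in ids:
        if existing:
            result.append(existing)  # keep IDs that were already assigned
        else:
            result.append(f"{prefix}:{next_number:0{width}d}")
            next_number += 1
    return result


if __name__ == "__main__":
    print(assign_ids("EBC", [None, None, None]))
```

Continuing after the highest existing number, rather than reusing gaps, avoids handing a previously used ID to a different term.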
mkdir -p build/intermediate
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
term match confidence
0 Person GECKO:0000066 0.76
1 Person GECKO:0000055 0.76
2 Person GECKO:0000120 0.76
3 Sample GECKO:0000052 0.98
4 Nationality GECKO:0000064 1.00
5 Education GECKO:0000065 1.00
6 Smoking GECKO:0000068 1.00
7 Alcohol GECKO:0000069 1.00
8 Sleep GECKO:0000071 0.98
9 Health GECKO:0000126 0.98
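The confidence column above contains only the values 0.76, 0.98, and 1.00, which match the `zooma_confidence_mappings` entries in the printed config. A minimal sketch of that lookup follows, assuming Zooma returns a categorical label that the script converts to a number. The function names are hypothetical; only the mapping values come from the log.

```python
from typing import Optional

# Values copied from the 'zooma_confidence_mappings' entry in the printed config.
ZOOMA_CONFIDENCE_MAPPINGS = {"LOW": 0.51, "MEDIUM": 0.76, "GOOD": 0.98, "HIGH": 1}


def confidence_to_score(label: str) -> float:
    """Convert a categorical confidence label to its numeric score."""
    try:
        return float(ZOOMA_CONFIDENCE_MAPPINGS[label.strip().upper()])
    except KeyError:
        raise ValueError(f"Unknown Zooma confidence label: {label!r}") from None


def score_to_label(score: float) -> Optional[str]:
    """Reverse lookup: return the label for a score, or None if it matches no label."""
    for label, value in ZOOMA_CONFIDENCE_MAPPINGS.items():
        if abs(value - score) < 1e-9:
            return label
    return None


if __name__ == "__main__":
    for score in (0.76, 0.98, 1.00):
        print(score, score_to_label(score))
```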
curl -L -o build/gecko.owl http://purl.obolibrary.org/obo/gecko/views/ihcc-gecko.owl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   349  100   349    0     0   1876      0 --:--:-- --:--:-- --:--:--  1876
100  105k  100  105k    0     0   235k      0 --:--:-- --:--:-- --:--:--  235k
curl -Lk -o build/robot.jar https://build.obolibrary.io/job/ontodev/job/robot/job/master/lastSuccessfulBuild/artifact/bin/robot.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0 77.1M    0 15455    0     0  25048      0  0:53:50 --:--:--  0:53:50 25008
  3 77.1M    3 2577k    0     0  1584k      0  0:00:49  0:00:01  0:00:48 1583k
 29 77.1M   29 22.9M    0     0  8886k      0  0:00:08  0:00:02  0:00:06 8883k
 54 77.1M   54 41.9M    0     0  11.6M      0  0:00:06  0:00:03  0:00:03 11.6M
 80 77.1M   80 62.1M    0     0  13.4M      0  0:00:05  0:00:04  0:00:01 13.4M
100 77.1M  100 77.1M    0     0  14.3M      0  0:00:05  0:00:05 --:--:-- 16.2M
java -jar build/robot.jar --prefixes src/prefixes.json query --input build/gecko.owl --query src/queries/ihcc-mapping-gecko.sparql build/intermediate/gecko-xrefs-sparql.csv
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
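The FutureWarning above says that the loss name 'log' was deprecated in scikit-learn 1.1 and will be removed in 1.3, with 'log_loss' as the equivalent replacement. A minimal sketch of choosing the name from the installed version follows, so the same script runs on older and newer releases. `sgd_log_loss_name` is a hypothetical helper, and the assumption that the NLP script builds an `SGDClassifier` is inferred from the module path in the warning, not confirmed by the log.

```python
import re


def sgd_log_loss_name(sklearn_version: str) -> str:
    """Return the logistic-loss name accepted by the given scikit-learn version.

    'log_loss' exists from 1.1 onward; earlier releases only know 'log'.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"Cannot parse scikit-learn version: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return "log_loss" if (major, minor) >= (1, 1) else "log"


if __name__ == "__main__":
    # Intended use (assumption, requires scikit-learn to be installed):
    #   import sklearn
    #   from sklearn.linear_model import SGDClassifier
    #   clf = SGDClassifier(loss=sgd_log_loss_name(sklearn.__version__))
    for version in ("1.0.2", "1.1.3", "1.3.0"):
        print(version, sgd_log_loss_name(version))
```

If only scikit-learn 1.1 or later needs to be supported, passing `loss="log_loss"` directly is simpler.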
Term ID Label
0 EBC:0000001 Person
1 EBC:0000002 PersonPortrait
2 EBC:0000003 DiagnosisConsolidated
3 EBC:0000004 Sample
4 EBC:0000005 InformedConsent
NLP matching successful. First twenty results:
term match confidence
0 Sample GECKO:0000052 0.199399
1 Health GECKO:0000052 0.105097
2 Education GECKO:0000065 0.167599
3 Person GECKO:0000066 0.249490
4 Smoking GECKO:0000068 0.869039
5 Alcohol GECKO:0000069 0.900000
6 Sleep GECKO:0000071 0.253313
7 Work GECKO:0000131 0.115807
8 PersonPortrait MONDO:0005084 0.108782
9 DiagnosisConsolidated MONDO:0005084 0.108782
10 InformedConsent MONDO:0005084 0.108782
11 Answerset MONDO:0005084 0.108782
12 Nationality MONDO:0005084 0.108782
13 PhysicalActivities MONDO:0005084 0.108782
14 EatingHabits MONDO:0005084 0.108782
15 OtherDrugs MONDO:0005084 0.108782
16 Sleep MONDO:0005084 0.112482
17 FemaleHealth MONDO:0005084 0.108782
18 MotherHealth MONDO:0005084 0.108782
19 ObjectiveInformation MONDO:0005084 0.108782
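The NLP confidences above top out at exactly 0.900000 and none fall below 0.1, which is consistent with the config entries `rescale_nlp_matches` ({'low': 0, 'high': 0.9}) and `min_match_probability` (0.1). The sketch below shows one way such settings could be applied, assuming a linear min-max rescale followed by a threshold filter. Both functions are hypothetical and the actual behavior of mapping-suggest-nlp.py may differ.

```python
from typing import List, Tuple

# Values copied from the printed config; how the script uses them is an assumption.
RESCALE_LOW = 0.0
RESCALE_HIGH = 0.9
MIN_MATCH_PROBABILITY = 0.1


def rescale(scores: List[float], low: float = RESCALE_LOW, high: float = RESCALE_HIGH) -> List[float]:
    """Linearly map scores so the smallest becomes `low` and the largest `high`."""
    if not scores:
        return []
    smallest, largest = min(scores), max(scores)
    if largest == smallest:
        # All scores are equal, so there is no spread to rescale.
        return [high for _ in scores]
    factor = (high - low) / (largest - smallest)
    return [low + (score - smallest) * factor for score in scores]


def filter_matches(
    matches: List[Tuple[str, str, float]],
    threshold: float = MIN_MATCH_PROBABILITY,
) -> List[Tuple[str, str, float]]:
    """Keep (term, match, confidence) rows whose confidence reaches the threshold."""
    return [row for row in matches if row[2] >= threshold]


if __name__ == "__main__":
    raw = [0.0, 0.5, 1.0]
    print(rescale(raw))
    print(filter_matches([("Alcohol", "GECKO:0000069", 0.9), ("Work", "GECKO:0000131", 0.05)]))
```

Capping NLP scores below 1.0 keeps them from outranking an exact Zooma match, which is one plausible reason for the 0.9 upper bound.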
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
term match confidence
0 Person GECKO:0000066 0.76
1 Person GECKO:0000055 0.76
2 Person GECKO:0000120 0.76
3 Person GECKO:0000066 0.76
4 Person GECKO:0000055 0.76
5 Person GECKO:0000120 0.76
6 Sample GECKO:0000052 0.98
7 Sample GECKO:0000052 0.98
8 Nationality GECKO:0000064 1.00
9 Nationality GECKO:0000064 1.00
10 Education GECKO:0000065 1.00
11 Education GECKO:0000065 1.00
12 PhysicalActivities GECKO:0000104 1.00
13 EatingHabits GECKO:0000072 0.98
14 Smoking GECKO:0000068 1.00
15 Smoking GECKO:0000068 1.00
16 Alcohol GECKO:0000069 1.00
17 Alcohol GECKO:0000069 1.00
18 Sleep GECKO:0000071 0.98
19 Sleep GECKO:0000071 0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
Term ID Label
0 EBC:0000001 Person
1 EBC:0000002 PersonPortrait
2 EBC:0000003 DiagnosisConsolidated
3 EBC:0000004 Sample
4 EBC:0000005 InformedConsent
NLP matching successful. First twenty results:
term match confidence
0 Sample GECKO:0000052 0.198846
1 Health GECKO:0000052 0.104726
2 Education GECKO:0000065 0.168262
3 Person GECKO:0000066 0.247261
4 Smoking GECKO:0000068 0.867914
5 Alcohol GECKO:0000069 0.900000
6 Sleep GECKO:0000071 0.252240
7 Work GECKO:0000131 0.115833
8 PersonPortrait MONDO:0005084 0.109894
9 DiagnosisConsolidated MONDO:0005084 0.109894
10 InformedConsent MONDO:0005084 0.109894
11 Answerset MONDO:0005084 0.109894
12 Nationality MONDO:0005084 0.109894
13 PhysicalActivities MONDO:0005084 0.109894
14 EatingHabits MONDO:0005084 0.109894
15 OtherDrugs MONDO:0005084 0.109894
16 Sleep MONDO:0005084 0.113986
17 FemaleHealth MONDO:0005084 0.109894
18 MotherHealth MONDO:0005084 0.109894
19 ObjectiveInformation MONDO:0005084 0.109894
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
term match confidence
0 Person GECKO:0000066 0.76
1 Person GECKO:0000055 0.76
2 Person GECKO:0000120 0.76
3 Sample GECKO:0000052 0.98
4 Nationality GECKO:0000064 1.00
5 Education GECKO:0000065 1.00
6 Smoking GECKO:0000068 1.00
7 Alcohol GECKO:0000069 1.00
8 Sleep GECKO:0000071 0.98
9 Health GECKO:0000126 0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
Term ID Label
0 EBC:0000001 Person
1 EBC:0000002 PersonPortrait
2 EBC:0000003 DiagnosisConsolidated
3 EBC:0000004 Sample
4 EBC:0000005 InformedConsent
NLP matching successful. First twenty results:
term match confidence
0 Sample GECKO:0000052 0.199879
1 Health GECKO:0000052 0.104178
2 Education GECKO:0000065 0.168348
3 Person GECKO:0000066 0.249268
4 Smoking GECKO:0000068 0.868788
5 Alcohol GECKO:0000069 0.900000
6 Sleep GECKO:0000071 0.252244
7 Work GECKO:0000131 0.114595
8 PersonPortrait MONDO:0005084 0.109368
9 DiagnosisConsolidated MONDO:0005084 0.109368
10 InformedConsent MONDO:0005084 0.109368
11 Answerset MONDO:0005084 0.109368
12 Nationality MONDO:0005084 0.109368
13 PhysicalActivities MONDO:0005084 0.109368
14 EatingHabits MONDO:0005084 0.109368
15 OtherDrugs MONDO:0005084 0.109368
16 Sleep MONDO:0005084 0.113353
17 FemaleHealth MONDO:0005084 0.109368
18 MotherHealth MONDO:0005084 0.109368
19 ObjectiveInformation MONDO:0005084 0.109368
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv -s build/intermediate/cogs_mapping_suggestions_zooma.tsv -s build/intermediate/cogs_mapping_suggestions_nlp.tsv -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
Mapping suggestions files concat:
term ... confidence
0 Person ... 0.760000
1 Person ... 0.760000
2 Person ... 0.760000
3 Sample ... 0.980000
4 Nationality ... 1.000000
5 Education ... 1.000000
6 Smoking ... 1.000000
7 Alcohol ... 1.000000
8 Sleep ... 0.980000
9 Health ... 0.980000
0 Sample ... 0.199399
1 Health ... 0.105097
2 Education ... 0.167599
3 Person ... 0.249490
4 Smoking ... 0.869039
5 Alcohol ... 0.900000
6 Sleep ... 0.253313
7 Work ... 0.115807
8 PersonPortrait ... 0.108782
9 DiagnosisConsolidated ... 0.108782
[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
Term ID ... Comment
0 EBC:0000001 ... NaN
1 EBC:0000002 ... NaN
2 EBC:0000003 ... NaN
3 EBC:0000004 ... NaN
4 EBC:0000005 ... NaN
5 EBC:0000006 ... NaN
6 EBC:0000007 ... NaN
7 EBC:0000008 ... NaN
8 EBC:0000009 ... NaN
9 EBC:0000010 ... NaN
10 EBC:0000011 ... NaN
11 EBC:0000012 ... NaN
12 EBC:0000013 ... NaN
13 EBC:0000014 ... NaN
14 EBC:0000015 ... NaN
15 EBC:0000016 ... NaN
16 EBC:0000017 ... NaN
17 EBC:0000018 ... NaN
18 EBC:0000019 ... NaN
19 EBC:0000020 ... NaN
[20 rows x 8 columns]
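The merge step above concatenates six suggestion files, so the same term and match can appear several times with different confidences (for example, Sample and GECKO:0000052 at 0.98 from Zooma and at 0.199399 from NLP). The sketch below shows one way to collapse such duplicates, assuming the highest confidence per (term, match) pair wins. `merge_suggestions` is a hypothetical function and not the actual logic of merge-mapping-suggestions.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Suggestion = Tuple[str, str, float]  # (term, match, confidence)


def merge_suggestions(*sources: Iterable[Suggestion]) -> Dict[str, List[Tuple[str, float]]]:
    """Combine suggestion lists, keeping the best confidence per (term, match)."""
    best: Dict[Tuple[str, str], float] = {}
    for rows in sources:
        for term, match, confidence in rows:
            key = (term, match)
            if confidence > best.get(key, float("-inf")):
                best[key] = confidence

    merged: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (term, match), confidence in best.items():
        merged[term].append((match, confidence))

    # Order each term's suggestions from most to least confident, then by match ID.
    for suggestions in merged.values():
        suggestions.sort(key=lambda item: (-item[1], item[0]))
    return dict(merged)


if __name__ == "__main__":
    # Sample rows taken from the Zooma and NLP tables in this log.
    zooma = [("Person", "GECKO:0000066", 0.76), ("Sample", "GECKO:0000052", 0.98)]
    nlp = [("Sample", "GECKO:0000052", 0.199399), ("Person", "GECKO:0000066", 0.249490)]
    for term, suggestions in merge_suggestions(zooma, nlp).items():
        print(term, suggestions)
```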
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
java -jar build/robot.jar --prefixes src/prefixes.json export \
--input build/gecko.owl \
--header "LABEL" \
--export build/gecko_labels.tsv
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push
Success