Workflow
The following workflow defines all tasks necessary to upload, preprocess, share, and map a new data dictionary.
- Upload cohort data
- Open Google Sheet
- Run automated mapping for new data dictionary
- Share Google Sheet with submitter
- Prepare data dictionary for build
- Run automated validation
- Build data dictionary
- View results
- Add data dictionary to Version Control
- Prepare a Git commit (click Commit in the Version menu)
- Push changes to GitHub (click Push in the Version menu) and open a pull request
- Delete Google Sheet (caution: this cannot be undone)
IHCC Data Admin Tasks
Console
Action automated_mapping started at 2022-11-18T16:02:29.724Z
Success
$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EBB
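The ID generator assigns each data-dictionary entry an ID built from the dictionary's prefix. The `Term ID` column printed later in this log shows the result (`EBB:0000001`, `EBB:0000002`, ...). The Python sketch below reproduces only that visible pattern and assumes sequential numbering with seven-digit zero padding. `generate_ids` is a hypothetical helper, not a function from `id-generator-templates.py`.

```python
from typing import List


def generate_ids(prefix: str, count: int, start: int = 1, width: int = 7) -> List[str]:
    """Return sequential IDs such as EBB:0000001 (assumed pattern, hypothetical helper)."""
    # Zero-pad the running number to `width` digits and join it to the prefix with a colon.
    return [f"{prefix}:{number:0{width}d}" for number in range(start, start + count)]


print(generate_ids("EBB", 3))  # ['EBB:0000001', 'EBB:0000002', 'EBB:0000003']
```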
mkdir -p build/intermediate
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
    term                     match           confidence
0   gender                   GECKO:0000060   1.00
1   height                   CMO:0000106     1.00
2   age                      PATO:0000011    1.00
3   death date               STATO:0000093   0.76
4   death date               STATO:0000093   0.76
5   death date               UBERON:0000105  0.76
6   Nationality              GECKO:0000064   1.00
7   Nationality nationality  GECKO:0000064   0.98
8   Smoking                  GECKO:0000068   1.00
9   Alcohol                  GECKO:0000069   1.00
10  Tumors                   MONDO:0004992   0.76
11  Tumors                   MONDO:0005039   0.76
12  Diabetes                 MONDO:0005151   1.00
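The Zooma config printed above lists an annotate endpoint that takes the term as `propertyValue`, a `min_match_probability` of 0.1, and `zooma_confidence_mappings` that convert Zooma's HIGH/GOOD/MEDIUM/LOW confidence labels into scores. The 1.00, 0.98 and 0.76 values in the table match those scores. The Python sketch below shows one way these settings could be applied. It is not code from `mapping-suggest-zooma.py`: the URL encoding with `urllib.parse.quote`, the `>=` threshold comparison, and the helper names are all assumptions.

```python
from urllib.parse import quote

# Settings copied from the "Zooma config" line printed by mapping-suggest-zooma.py.
CONFIG = {
    "zooma_annotate": "https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=",
    "min_match_probability": 0.1,
    "zooma_confidence_mappings": {"LOW": 0.51, "MEDIUM": 0.76, "GOOD": 0.98, "HIGH": 1},
}


def annotate_url(term: str) -> str:
    """Build the Zooma annotate request URL for one term (hypothetical helper)."""
    # Percent-encode the term so spaces and punctuation survive in the query string.
    return CONFIG["zooma_annotate"] + quote(term)


def confidence_score(label: str) -> float:
    """Map a Zooma confidence label (HIGH, GOOD, MEDIUM, LOW) to a score; unknown labels score 0."""
    return float(CONFIG["zooma_confidence_mappings"].get(label.upper(), 0.0))


def keep_suggestion(label: str) -> bool:
    """Assumed filter: keep a suggestion whose score reaches min_match_probability."""
    return confidence_score(label) >= CONFIG["min_match_probability"]


print(annotate_url("death date"))
print(confidence_score("MEDIUM"), keep_suggestion("LOW"))
```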
curl -L -o build/gecko.owl http://purl.obolibrary.org/obo/gecko/views/ihcc-gecko.owl
curl -Lk -o build/robot.jar https://build.obolibrary.io/job/ontodev/job/robot/job/master/lastSuccessfulBuild/artifact/bin/robot.jar
java -jar build/robot.jar --prefixes src/prefixes.json query --input build/gecko.owl --query src/queries/ihcc-mapping-gecko.sparql build/intermediate/gecko-xrefs-sparql.csv
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
    Term ID      Label
0   EBB:0000001  Person.skood
1   EBB:0000002  Person.gender
2   EBB:0000003  Person.birthDate
3   EBB:0000004  Person.birthYear
4   EBB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
    term                                  match          confidence
0   ObjectiveInformation.weight           CMO:0000012    0.837672
1   ObjectiveInformation.bmi              CMO:0000021    0.670602
2   ObjectiveInformation.waist            CMO:0000021    0.266035
3   ObjectiveInformation.hip              CMO:0000021    0.273149
4   height                                CMO:0000106    0.683658
5   ObjectiveInformation.height           CMO:0000106    0.683658
6   PhysicalExercise.code                 GECKO:0000052  0.399442
7   ProfessionalSportPast.code            GECKO:0000052  0.399442
8   ProfessionalSport.code                GECKO:0000052  0.399442
9   HormonalContraceptiveUsed.code        GECKO:0000052  0.399442
10  HormonalContraceptiveUsedV1.code      GECKO:0000052  0.399442
11  HormonalMedicationMenopause.code      GECKO:0000052  0.399442
12  HormonalMedicationMenopauseV1.code    GECKO:0000052  0.399442
13  Health.movement                       GECKO:0000052  0.104909
14  Health.selfcare                       GECKO:0000052  0.104909
15  Health.commonActivities               GECKO:0000052  0.104909
16  Health.painDiscomfort                 GECKO:0000052  0.104909
17  Health.anxietyDepression              GECKO:0000052  0.104909
18  MedicationsForTroubledBreathing.code  GECKO:0000052  0.399442
19  DiseasesDiagnosed.code                GECKO:0000052  0.399442
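The same config sets `rescale_nlp_matches` to `low` 0 and `high` 0.9. None of the NLP confidences above exceeds 0.9 (the largest is 0.837672), which is consistent with NLP suggestions being compressed so they never score as high as an exact Zooma match. The Python sketch below assumes a plain linear rescale of a classifier probability followed by the `min_match_probability` cut-off. The actual formula in `mapping-suggest-nlp.py` may differ. Separately, the `FutureWarning` printed above comes from scikit-learn's stochastic-gradient module and does not affect this run. As the message itself says, passing `loss='log_loss'` instead of `loss='log'` is equivalent and avoids the removal planned for version 1.3.

```python
def rescale(probability: float, low: float = 0.0, high: float = 0.9) -> float:
    """Linearly map a probability in [0, 1] onto [low, high] (assumed rescaling rule)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return low + (high - low) * probability


def passes_threshold(score: float, min_match_probability: float = 0.1) -> bool:
    """Assumed filter: drop rescaled suggestions below min_match_probability."""
    return score >= min_match_probability


print(rescale(1.0), rescale(0.5))
```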
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
    term                     match           confidence
0   Person.gender            GECKO:0000066   0.98
1   gender                   GECKO:0000060   1.00
2   gender                   GECKO:0000060   1.00
3   height                   CMO:0000106     1.00
4   height                   CMO:0000106     1.00
5   age                      PATO:0000011    1.00
6   age                      PATO:0000011    1.00
7   death date               STATO:0000093   0.76
8   death date               STATO:0000093   0.76
9   death date               UBERON:0000105  0.76
10  death date               STATO:0000093   0.76
11  death date               STATO:0000093   0.76
12  death date               UBERON:0000105  0.76
13  Nationality              GECKO:0000064   1.00
14  Nationality              GECKO:0000064   1.00
15  Nationality.nationality  GECKO:0000064   0.98
16  Nationality nationality  GECKO:0000064   0.98
17  Nationality nationality  GECKO:0000064   0.98
18  Smoking                  GECKO:0000068   1.00
19  Smoking                  GECKO:0000068   1.00
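This run passes `-p WORD_BOUNDARY`, and its Zooma output now includes variants such as `Nationality nationality` alongside `Nationality.nationality`. The log does not show what the option does internally. The Python sketch below assumes it splits identifier-style terms at word boundaries (camelCase humps, dots, underscores, hyphens) before matching. `word_boundary` is a hypothetical name, and the splitting rules are a guess.

```python
import re


def word_boundary(term: str) -> str:
    """Split identifier-style terms into space-separated words (assumed WORD_BOUNDARY behaviour)."""
    # Put a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", term)
    # Treat dots, underscores and hyphens as separators as well.
    spaced = re.sub(r"[._-]+", " ", spaced)
    # Collapse repeated whitespace into single spaces.
    return " ".join(spaced.split())


for raw in ["Nationality.nationality", "ObjectiveInformation.bmi", "Alcohol_Consumption"]:
    print(raw, "->", word_boundary(raw))
```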
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
    Term ID      Label
0   EBB:0000001  Person.skood
1   EBB:0000002  Person.gender
2   EBB:0000003  Person.birthDate
3   EBB:0000004  Person.birthYear
4   EBB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
    term                                  match          confidence
0   ObjectiveInformation.weight           CMO:0000012    0.836668
1   ObjectiveInformation.bmi              CMO:0000021    0.669178
2   ObjectiveInformation.waist            CMO:0000021    0.268965
3   ObjectiveInformation.hip              CMO:0000021    0.274443
4   height                                CMO:0000106    0.685696
5   ObjectiveInformation.height           CMO:0000106    0.685696
6   PhysicalExercise.code                 GECKO:0000052  0.401939
7   ProfessionalSportPast.code            GECKO:0000052  0.401939
8   ProfessionalSport.code                GECKO:0000052  0.401939
9   HormonalContraceptiveUsed.code        GECKO:0000052  0.401939
10  HormonalContraceptiveUsedV1.code      GECKO:0000052  0.401939
11  HormonalMedicationMenopause.code      GECKO:0000052  0.401939
12  HormonalMedicationMenopauseV1.code    GECKO:0000052  0.401939
13  Health.movement                       GECKO:0000052  0.104880
14  Health.selfcare                       GECKO:0000052  0.104880
15  Health.commonActivities               GECKO:0000052  0.104880
16  Health.painDiscomfort                 GECKO:0000052  0.104880
17  Health.anxietyDepression              GECKO:0000052  0.104880
18  MedicationsForTroubledBreathing.code  GECKO:0000052  0.401939
19  DiseasesDiagnosed.code                GECKO:0000052  0.401939
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
    term                                         match           confidence
0   Person.gender                                GECKO:0000060   1.00
1   Person.birthDate                             PATO:0000011    0.98
2   Person.birthYear                             GECKO:0000066   0.98
3   Person.deathDate                             STATO:0000093   1.00
4   Nationality.nationality                      GECKO:0000064   1.00
5   Smoking has Smoked                           GECKO:0000064   1.00
6   Alcohol_Consumption                          GECKO:0000064   1.00
7   Tumors                                       MONDO:0004992   0.76
8   Tumors                                       MONDO:0005039   0.76
9   Diabetes                                     MONDO:0005151   1.00
10  Nationality.nationality                      GECKO:0000064   1.00
11  SpareTimeActivities.shoppingPerWeek          GECKO:0000131   0.76
12  SpareTimeActivities.shoppingPerWeek          GECKO:0000104   0.76
13  SpareTimeActivities.cleaningPerWeek          GECKO:0000052   0.76
14  SpareTimeActivities.cleaningPerWeek          MONDO:0004992   0.76
15  SpareTimeActivities.cleaningPerWeek          GECKO:0000060   0.76
16  SpareTimeActivities.physicalExercisePerWeek  OGMS:0000020    0.98
17  SpareTimeActivities.readingPerWeek           CMO:0000294     0.76
18  SpareTimeActivities.readingPerWeek           CMO:0000003     0.76
19  TobaccoLast12Months.smokeProd                GECKO:0000068   0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
    Term ID      Label
0   EBB:0000001  Person.skood
1   EBB:0000002  Person.gender
2   EBB:0000003  Person.birthDate
3   EBB:0000004  Person.birthYear
4   EBB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
    term                                    match          confidence
0   PersonPortrait.lastWeight               CMO:0000012    0.240447
1   ObjectiveInformation.weight             CMO:0000012    0.777004
2   PersonPortrait.lastBmi                  CMO:0000021    0.158019
3   height                                  CMO:0000021    0.110494
4   ObjectiveInformation.bmi                CMO:0000021    0.622844
5   ObjectiveInformation.waist              CMO:0000021    0.246794
6   ObjectiveInformation.hip                CMO:0000021    0.251361
7   PersonPortrait.lastHeight               CMO:0000106    0.136501
8   ObjectiveInformation.height             CMO:0000106    0.629629
9   height                                  GECKO:0000052  0.102359
10  Answerset.isFirst                       GECKO:0000052  0.119913
11  Answerset.visitNumber                   GECKO:0000052  0.122284
12  PhysicalExercise.code                   GECKO:0000052  0.375427
13  ProfessionalSportPast.code              GECKO:0000052  0.375427
14  ProfessionalSport.code                  GECKO:0000052  0.375427
15  SpareTimeActivities.childcarePerWeek    GECKO:0000052  0.155838
16  SpareTimeActivities.elderlyCarePerWeek  GECKO:0000052  0.134239
17  FemaleHealth.menstruationsStopReason    GECKO:0000052  0.104425
18  HormonalContraceptiveUsed.code          GECKO:0000052  0.375427
19  HormonalContraceptiveUsedV1.code        GECKO:0000052  0.375427
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv -s build/intermediate/cogs_mapping_suggestions_zooma.tsv -s build/intermediate/cogs_mapping_suggestions_nlp.tsv -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
Mapping suggestions files concat:
    term                         ...  confidence
0   gender                       ...    1.000000
1   height                       ...    1.000000
2   age                          ...    1.000000
3   death date                   ...    0.760000
4   death date                   ...    0.760000
5   death date                   ...    0.760000
6   Nationality                  ...    1.000000
7   Nationality nationality      ...    0.980000
8   Smoking                      ...    1.000000
9   Alcohol                      ...    1.000000
10  Tumors                       ...    0.760000
11  Tumors                       ...    0.760000
12  Diabetes                     ...    1.000000
0   ObjectiveInformation.weight  ...    0.837672
1   ObjectiveInformation.bmi     ...    0.670602
2   ObjectiveInformation.waist   ...    0.266035
3   ObjectiveInformation.hip     ...    0.273149
4   height                       ...    0.683658
5   ObjectiveInformation.height  ...    0.683658
6   PhysicalExercise.code        ...    0.399442
[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
    Term ID      ...  Comment
0   EBB:0000001  ...  NaN
1   EBB:0000002  ...  NaN
2   EBB:0000003  ...  NaN
3   EBB:0000004  ...  NaN
4   EBB:0000005  ...  NaN
5   EBB:0000006  ...  NaN
6   EBB:0000007  ...  NaN
7   EBB:0000008  ...  NaN
8   EBB:0000009  ...  NaN
9   EBB:0000010  ...  NaN
10  EBB:0000011  ...  Person's last measurements, smoking status and...
11  EBB:0000012  ...  Person's last measurements, smoking status and...
12  EBB:0000013  ...  Person's last measurements, smoking status and...
13  EBB:0000014  ...  Person's last measurements, smoking status and...
14  EBB:0000015  ...  Person's last measurements, smoking status and...
15  EBB:0000016  ...  Person's last measurements, smoking status and...
16  EBB:0000017  ...  Person's last measurements, smoking status and...
17  EBB:0000018  ...  Person's last measurements, smoking status and...
18  EBB:0000019  ...  Person's last measurements, smoking status and...
19  EBB:0000020  ...  Person's last measurements, smoking status and...
[20 rows x 8 columns]
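`merge-mapping-suggestions.py` receives six intermediate files: Zooma and NLP results, each run on the raw terms, on the `WORD_BOUNDARY` input, and on the `DEFINITION` input. It writes one row per term ID to `build/suggestions_cogs.tsv`. The concatenation preview shows the same term coming from several sources. For example, `height` maps to CMO:0000106 at 1.00 from Zooma and at 0.683658 from NLP, so some de-duplication is needed. The Python sketch below assumes the simplest rule: keep the best confidence per term and match, then rank each term's matches. It does not reproduce the eight-column output or the `Comment` column, and `merge_suggestions` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple

# (term, match, confidence), the three columns visible in the suggestion tables above.
Suggestion = Tuple[str, str, float]


def merge_suggestions(sources: Iterable[Iterable[Suggestion]]) -> Dict[str, List[Tuple[str, float]]]:
    """Combine suggestion sources, keeping the highest confidence per (term, match) pair."""
    best: Dict[Tuple[str, str], float] = {}
    for source in sources:
        for term, match, confidence in source:
            key = (term, match)
            if confidence > best.get(key, float("-inf")):
                best[key] = confidence
    # Group by term and rank each term's candidate matches from most to least confident.
    merged: Dict[str, List[Tuple[str, float]]] = {}
    for (term, match), confidence in best.items():
        merged.setdefault(term, []).append((match, confidence))
    for candidates in merged.values():
        candidates.sort(key=lambda candidate: candidate[1], reverse=True)
    return merged


# Rows taken from the Zooma, NLP and NLP-definition tables in this log.
zooma = [("height", "CMO:0000106", 1.00)]
nlp = [("height", "CMO:0000106", 0.683658)]
nlp_definition = [("height", "CMO:0000021", 0.110494), ("height", "GECKO:0000052", 0.102359)]
print(merge_suggestions([zooma, nlp, nlp_definition])["height"])
```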
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
java -jar build/robot.jar --prefixes src/prefixes.json export \
--input build/gecko.owl \
--header "LABEL" \
--export build/gecko_labels.tsv
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
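Before pushing, the Makefile exports every label in `build/gecko.owl` with ROBOT (`export --header "LABEL"`). It then passes `build/terminology.tsv` and `build/gecko_labels.tsv` to `create-data-validation.py`, and `cogs apply` attaches the resulting file to the sheet. The log shows the inputs and outputs but not the logic. Assuming the purpose is to restrict mapping cells to known GECKO labels, the core check can be sketched in Python as below. The helpers are hypothetical, only the `LABEL` column name comes from the export command above, and the sketch does not produce the TSV format that `cogs apply` expects.

```python
import csv
import io
from typing import Iterable, List


def read_labels(tsv_text: str) -> List[str]:
    """Read the LABEL column of a ROBOT export and return sorted, de-duplicated labels."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    labels = set()
    for row in reader:
        label = (row.get("LABEL") or "").strip()
        if label:
            labels.add(label)
    return sorted(labels)


def invalid_mappings(mapped: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return non-empty mapped labels that are not in the allowed label list."""
    allowed_set = set(allowed)
    return [label for label in mapped if label and label not in allowed_set]


# Hypothetical sample input; real labels would come from build/gecko_labels.tsv.
sample = "LABEL\nlabel-b\nlabel-a\nlabel-b\n"
labels = read_labels(sample)
print(labels, invalid_mappings(["label-a", "label-c", ""], labels))
```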
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push
Success
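Read top to bottom, the console shows the order of operations in the `automated_mapping` target:

1. Pull the sheet with COGS.
2. Copy the terminology into a template.
3. Build the merged suggestions and copy them back.
4. Apply data validation and the info table.
5. Push.

The Python sketch below records that sequence exactly as echoed in this run and can replay it. It is only an illustration. The Makefile remains the source of truth, the step list is copied from this log rather than read from the Makefile, and executing it assumes `make` and `cogs` are on the PATH in the same workspace.

```python
import shlex
import subprocess
from typing import List

# Top-level commands echoed for `make -f Makefile automated_mapping` in this run.
AUTOMATED_MAPPING_STEPS = [
    "make cogs_pull",
    "cp build/terminology.tsv templates/cogs.tsv",
    "make build/suggestions_cogs.tsv",
    "cp build/suggestions_cogs.tsv build/terminology.tsv",
    "rm -f templates/cogs.tsv",
    "make cogs-apply-data-validation",
    "make cogs-apply-info-table",
    "cogs push",
]


def run_steps(steps: List[str], dry_run: bool = True) -> List[List[str]]:
    """Split each step into argv form; unless dry_run, execute them in order."""
    commands = [shlex.split(step) for step in steps]
    if not dry_run:
        for argv in commands:
            # check=True raises CalledProcessError on a non-zero exit, halting the sequence.
            subprocess.run(argv, check=True)
    return commands


for argv in run_steps(AUTOMATED_MAPPING_STEPS):
    print(argv)
```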