Action automated_mapping started at 2022-11-21T12:46:03.960Z

Success

$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EBD
mkdir -p build/intermediate
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                     term          match  confidence
0                  Person  GECKO:0000066        0.76
1                  Person  GECKO:0000055        0.76
2                  Person  GECKO:0000120        0.76
3                  Sample  GECKO:0000052        0.98
4             Nationality  GECKO:0000064        1.00
5               Education  GECKO:0000065        1.00
6     Physical Activities  GECKO:0000104        1.00
7           Eating Habits  GECKO:0000072        0.98
8                 Smoking  GECKO:0000068        1.00
9                 Alcohol  GECKO:0000069        1.00
10                  Sleep  GECKO:0000071        0.98
11                 Health  GECKO:0000126        0.98
12  Objective Information  GECKO:0000114        0.98
curl -L -o build/gecko.owl http://purl.obolibrary.org/obo/gecko/views/ihcc-gecko.owl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  105k  100  105k    0     0   223k      0 --:--:-- --:--:-- --:--:--  223k
curl -Lk -o build/robot.jar https://build.obolibrary.io/job/ontodev/job/robot/job/master/lastSuccessfulBuild/artifact/bin/robot.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 77.1M  100 77.1M    0     0  14.3M      0  0:00:05  0:00:05 --:--:-- 15.6M
java -jar build/robot.jar --prefixes src/prefixes.json query --input build/gecko.owl --query src/queries/ihcc-mapping-gecko.sparql build/intermediate/gecko-xrefs-sparql.csv
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                   Label
0  EBD:0000001                  Person
1  EBD:0000002         Person Portrait
2  EBD:0000003  Diagnosis Consolidated
3  EBD:0000004                  Sample
4  EBD:0000005        Informed Consent
NLP matching successful. First twenty results:
                            term          match  confidence
0                         Sample  GECKO:0000052    0.199025
1                         Health  GECKO:0000052    0.105093
2                      Education  GECKO:0000065    0.169910
3                         Person  GECKO:0000066    0.246904
4                Person Portrait  GECKO:0000066    0.246904
5                        Smoking  GECKO:0000068    0.870167
6                        Alcohol  GECKO:0000069    0.900000
7                          Sleep  GECKO:0000071    0.252353
8                  Eating Habits  GECKO:0000072    0.278889
9                    Other Drugs  GECKO:0000072    0.136990
10           Regular Medications  GECKO:0000072    0.156956
11                   Other Drugs  GECKO:0000093    0.105215
12  Medications Used For Disease  GECKO:0000093    0.252179
13           Regular Medications  GECKO:0000093    0.117252
14           Physical Activities  GECKO:0000104    0.197033
15           Donor Birth Methods  GECKO:0000114    0.147404
16                Place Of Birth  GECKO:0000114    0.272022
17                          Work  GECKO:0000131    0.115732
18                 Mother Health  GECKO:0000132    0.476819
19        Diagnosis Consolidated  MONDO:0000001    0.143600
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                   term          match  confidence
0                Person  GECKO:0000066        0.76
1                Person  GECKO:0000055        0.76
2                Person  GECKO:0000120        0.76
3                Person  GECKO:0000066        0.76
4                Person  GECKO:0000055        0.76
5                Person  GECKO:0000120        0.76
6                Sample  GECKO:0000052        0.98
7                Sample  GECKO:0000052        0.98
8           Nationality  GECKO:0000064        1.00
9           Nationality  GECKO:0000064        1.00
10            Education  GECKO:0000065        1.00
11            Education  GECKO:0000065        1.00
12  Physical Activities  GECKO:0000104        1.00
13  Physical Activities  GECKO:0000104        1.00
14        Eating Habits  GECKO:0000072        0.98
15        Eating Habits  GECKO:0000072        0.98
16              Smoking  GECKO:0000068        1.00
17              Smoking  GECKO:0000068        1.00
18              Alcohol  GECKO:0000069        1.00
19              Alcohol  GECKO:0000069        1.00
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                   Label
0  EBD:0000001                  Person
1  EBD:0000002         Person Portrait
2  EBD:0000003  Diagnosis Consolidated
3  EBD:0000004                  Sample
4  EBD:0000005        Informed Consent
NLP matching successful. First twenty results:
                            term          match  confidence
0                         Sample  GECKO:0000052    0.197785
1                         Health  GECKO:0000052    0.104552
2                      Education  GECKO:0000065    0.170096
3                         Person  GECKO:0000066    0.249028
4                Person Portrait  GECKO:0000066    0.249028
5                        Smoking  GECKO:0000068    0.867390
6                        Alcohol  GECKO:0000069    0.900000
7                          Sleep  GECKO:0000071    0.253743
8               Informed Consent  GECKO:0000072    0.100264
9                  Eating Habits  GECKO:0000072    0.278638
10                   Other Drugs  GECKO:0000072    0.136942
11           Regular Medications  GECKO:0000072    0.157802
12                   Other Drugs  GECKO:0000093    0.106156
13  Medications Used For Disease  GECKO:0000093    0.250168
14           Regular Medications  GECKO:0000093    0.117184
15           Physical Activities  GECKO:0000104    0.193678
16           Donor Birth Methods  GECKO:0000114    0.147944
17                Place Of Birth  GECKO:0000114    0.274689
18                          Work  GECKO:0000131    0.113861
19                 Mother Health  GECKO:0000132    0.473953
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                     term          match  confidence
0                  Person  GECKO:0000066        0.76
1                  Person  GECKO:0000055        0.76
2                  Person  GECKO:0000120        0.76
3                  Sample  GECKO:0000052        0.98
4             Nationality  GECKO:0000064        1.00
5               Education  GECKO:0000065        1.00
6     Physical Activities  GECKO:0000104        1.00
7           Eating Habits  GECKO:0000072        0.98
8                 Smoking  GECKO:0000068        1.00
9                 Alcohol  GECKO:0000069        1.00
10                  Sleep  GECKO:0000071        0.98
11                 Health  GECKO:0000126        0.98
12  Objective Information  GECKO:0000114        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                   Label
0  EBD:0000001                  Person
1  EBD:0000002         Person Portrait
2  EBD:0000003  Diagnosis Consolidated
3  EBD:0000004                  Sample
4  EBD:0000005        Informed Consent
NLP matching successful. First twenty results:
                            term          match  confidence
0                         Sample  GECKO:0000052    0.198378
1                         Health  GECKO:0000052    0.106021
2                      Education  GECKO:0000065    0.168180
3                         Person  GECKO:0000066    0.244266
4                Person Portrait  GECKO:0000066    0.244266
5                        Smoking  GECKO:0000068    0.868231
6                        Alcohol  GECKO:0000069    0.900000
7                          Sleep  GECKO:0000071    0.253284
8                  Eating Habits  GECKO:0000072    0.274467
9                    Other Drugs  GECKO:0000072    0.136972
10           Regular Medications  GECKO:0000072    0.156687
11                   Other Drugs  GECKO:0000093    0.105217
12  Medications Used For Disease  GECKO:0000093    0.250133
13           Regular Medications  GECKO:0000093    0.116194
14           Physical Activities  GECKO:0000104    0.195562
15           Donor Birth Methods  GECKO:0000114    0.147419
16                Place Of Birth  GECKO:0000114    0.273065
17                          Work  GECKO:0000131    0.113717
18                 Mother Health  GECKO:0000132    0.477547
19        Diagnosis Consolidated  MONDO:0000001    0.143923
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
Mapping suggestions files concat:
                     term  ... confidence
0                  Person  ...   0.760000
1                  Person  ...   0.760000
2                  Person  ...   0.760000
3                  Sample  ...   0.980000
4             Nationality  ...   1.000000
5               Education  ...   1.000000
6     Physical Activities  ...   1.000000
7           Eating Habits  ...   0.980000
8                 Smoking  ...   1.000000
9                 Alcohol  ...   1.000000
10                  Sleep  ...   0.980000
11                 Health  ...   0.980000
12  Objective Information  ...   0.980000
0                  Sample  ...   0.199025
1                  Health  ...   0.105093
2               Education  ...   0.169910
3                  Person  ...   0.246904
4         Person Portrait  ...   0.246904
5                 Smoking  ...   0.870167
6                 Alcohol  ...   0.900000

[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
        Term ID  ... Comment
0   EBD:0000001  ...     NaN
1   EBD:0000002  ...     NaN
2   EBD:0000003  ...     NaN
3   EBD:0000004  ...     NaN
4   EBD:0000005  ...     NaN
5   EBD:0000006  ...     NaN
6   EBD:0000007  ...     NaN
7   EBD:0000008  ...     NaN
8   EBD:0000009  ...     NaN
9   EBD:0000010  ...     NaN
10  EBD:0000011  ...     NaN
11  EBD:0000012  ...     NaN
12  EBD:0000013  ...     NaN
13  EBD:0000014  ...     NaN
14  EBD:0000015  ...     NaN
15  EBD:0000016  ...     NaN
16  EBD:0000017  ...     NaN
17  EBD:0000018  ...     NaN
18  EBD:0000019  ...     NaN
19  EBD:0000020  ...     NaN

[20 rows x 8 columns]
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
java -jar build/robot.jar --prefixes src/prefixes.json export \
--input build/gecko.owl \
--header "LABEL" \
--export build/gecko_labels.tsv
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push

Success
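
Note on the confidence columns: in the rows shown, the values line up with the printed Zooma config. Zooma scores only take the values 0.76, 0.98 and 1.00 (the MEDIUM, GOOD and HIGH entries of `zooma_confidence_mappings`), no NLP score exceeds 0.9 (the `high` bound of `rescale_nlp_matches`), and none falls below 0.1 (`min_match_probability`). The sketch below shows one way such settings could be applied. It is an assumption for illustration, not the code of `mapping-suggest-zooma.py` or `mapping-suggest-nlp.py`: the function names `zooma_score` and `rescale_and_filter`, the min-max rescaling, and the order of rescaling and filtering are all hypothetical.

```python
# Values copied from the "Zooma config" line printed in the log.
CONFIG = {
    "min_match_probability": 0.1,
    "rescale_nlp_matches": {"low": 0, "high": 0.9},
    "zooma_confidence_mappings": {"LOW": 0.51, "MEDIUM": 0.76, "GOOD": 0.98, "HIGH": 1},
}


def zooma_score(label, config=CONFIG):
    """Translate a Zooma confidence label (e.g. "GOOD") into a number."""
    return config["zooma_confidence_mappings"][label.upper()]


def rescale_and_filter(scores, config=CONFIG):
    """Hypothetical: min-max rescale raw scores into [low, high], then drop
    anything below min_match_probability. `scores` maps a term to a raw score."""
    if not scores:
        return {}
    low = config["rescale_nlp_matches"]["low"]
    high = config["rescale_nlp_matches"]["high"]
    smallest, largest = min(scores.values()), max(scores.values())
    span = largest - smallest
    kept = {}
    for term, raw in scores.items():
        # With identical raw scores there is nothing to spread out; use `high`.
        scaled = high if span == 0 else low + (raw - smallest) / span * (high - low)
        if scaled >= config["min_match_probability"]:
            kept[term] = round(scaled, 6)
    return kept


if __name__ == "__main__":
    # Placeholder terms and raw scores, not taken from the run above.
    print(zooma_score("medium"))
    print(rescale_and_filter({"a": 0.2, "b": 0.5, "c": 1.0}))
```

Under this reading, the best raw match always lands on the `high` bound, which would match the Alcohol row sitting at exactly 0.900000 in all three NLP tables. The actual scripts may derive that cap differently.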