DROID / IHCC-cohorts-data-harmonization-test / EBC

Console

Action automated_mapping started at 2022-11-21T12:35:49.181Z

Success

$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EBC
mkdir -p build/intermediate
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
          term          match  confidence
0       Person  GECKO:0000066        0.76
1       Person  GECKO:0000055        0.76
2       Person  GECKO:0000120        0.76
3       Sample  GECKO:0000052        0.98
4  Nationality  GECKO:0000064        1.00
5    Education  GECKO:0000065        1.00
6      Smoking  GECKO:0000068        1.00
7      Alcohol  GECKO:0000069        1.00
8        Sleep  GECKO:0000071        0.98
9       Health  GECKO:0000126        0.98
curl -L -o build/gecko.owl http://purl.obolibrary.org/obo/gecko/views/ihcc-gecko.owl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   349  100   349    0     0   1876      0 --:--:-- --:--:-- --:--:--  1876
100  105k  100  105k    0     0   235k      0 --:--:-- --:--:-- --:--:--  235k
curl -Lk -o build/robot.jar https://build.obolibrary.io/job/ontodev/job/robot/job/master/lastSuccessfulBuild/artifact/bin/robot.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 77.1M  100 77.1M    0     0  14.3M      0  0:00:05  0:00:05 --:--:-- 16.2M
java -jar build/robot.jar --prefixes src/prefixes.json query --input build/gecko.owl --query src/queries/ihcc-mapping-gecko.sparql build/intermediate/gecko-xrefs-sparql.csv
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                  Label
0  EBC:0000001                 Person
1  EBC:0000002         PersonPortrait
2  EBC:0000003  DiagnosisConsolidated
3  EBC:0000004                 Sample
4  EBC:0000005        InformedConsent
NLP matching successful. First twenty results:
                     term          match  confidence
0                  Sample  GECKO:0000052    0.199399
1                  Health  GECKO:0000052    0.105097
2               Education  GECKO:0000065    0.167599
3                  Person  GECKO:0000066    0.249490
4                 Smoking  GECKO:0000068    0.869039
5                 Alcohol  GECKO:0000069    0.900000
6                   Sleep  GECKO:0000071    0.253313
7                    Work  GECKO:0000131    0.115807
8          PersonPortrait  MONDO:0005084    0.108782
9   DiagnosisConsolidated  MONDO:0005084    0.108782
10        InformedConsent  MONDO:0005084    0.108782
11              Answerset  MONDO:0005084    0.108782
12            Nationality  MONDO:0005084    0.108782
13     PhysicalActivities  MONDO:0005084    0.108782
14           EatingHabits  MONDO:0005084    0.108782
15             OtherDrugs  MONDO:0005084    0.108782
16                  Sleep  MONDO:0005084    0.112482
17           FemaleHealth  MONDO:0005084    0.108782
18           MotherHealth  MONDO:0005084    0.108782
19   ObjectiveInformation  MONDO:0005084    0.108782
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                  term          match  confidence
0               Person  GECKO:0000066        0.76
1               Person  GECKO:0000055        0.76
2               Person  GECKO:0000120        0.76
3               Person  GECKO:0000066        0.76
4               Person  GECKO:0000055        0.76
5               Person  GECKO:0000120        0.76
6               Sample  GECKO:0000052        0.98
7               Sample  GECKO:0000052        0.98
8          Nationality  GECKO:0000064        1.00
9          Nationality  GECKO:0000064        1.00
10           Education  GECKO:0000065        1.00
11           Education  GECKO:0000065        1.00
12  PhysicalActivities  GECKO:0000104        1.00
13        EatingHabits  GECKO:0000072        0.98
14             Smoking  GECKO:0000068        1.00
15             Smoking  GECKO:0000068        1.00
16             Alcohol  GECKO:0000069        1.00
17             Alcohol  GECKO:0000069        1.00
18               Sleep  GECKO:0000071        0.98
19               Sleep  GECKO:0000071        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                  Label
0  EBC:0000001                 Person
1  EBC:0000002         PersonPortrait
2  EBC:0000003  DiagnosisConsolidated
3  EBC:0000004                 Sample
4  EBC:0000005        InformedConsent
NLP matching successful. First twenty results:
                     term          match  confidence
0                  Sample  GECKO:0000052    0.198846
1                  Health  GECKO:0000052    0.104726
2               Education  GECKO:0000065    0.168262
3                  Person  GECKO:0000066    0.247261
4                 Smoking  GECKO:0000068    0.867914
5                 Alcohol  GECKO:0000069    0.900000
6                   Sleep  GECKO:0000071    0.252240
7                    Work  GECKO:0000131    0.115833
8          PersonPortrait  MONDO:0005084    0.109894
9   DiagnosisConsolidated  MONDO:0005084    0.109894
10        InformedConsent  MONDO:0005084    0.109894
11              Answerset  MONDO:0005084    0.109894
12            Nationality  MONDO:0005084    0.109894
13     PhysicalActivities  MONDO:0005084    0.109894
14           EatingHabits  MONDO:0005084    0.109894
15             OtherDrugs  MONDO:0005084    0.109894
16                  Sleep  MONDO:0005084    0.113986
17           FemaleHealth  MONDO:0005084    0.109894
18           MotherHealth  MONDO:0005084    0.109894
19   ObjectiveInformation  MONDO:0005084    0.109894
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
          term          match  confidence
0       Person  GECKO:0000066        0.76
1       Person  GECKO:0000055        0.76
2       Person  GECKO:0000120        0.76
3       Sample  GECKO:0000052        0.98
4  Nationality  GECKO:0000064        1.00
5    Education  GECKO:0000065        1.00
6      Smoking  GECKO:0000068        1.00
7      Alcohol  GECKO:0000069        1.00
8        Sleep  GECKO:0000071        0.98
9       Health  GECKO:0000126        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                  Label
0  EBC:0000001                 Person
1  EBC:0000002         PersonPortrait
2  EBC:0000003  DiagnosisConsolidated
3  EBC:0000004                 Sample
4  EBC:0000005        InformedConsent
NLP matching successful. First twenty results:
                     term          match  confidence
0                  Sample  GECKO:0000052    0.199879
1                  Health  GECKO:0000052    0.104178
2               Education  GECKO:0000065    0.168348
3                  Person  GECKO:0000066    0.249268
4                 Smoking  GECKO:0000068    0.868788
5                 Alcohol  GECKO:0000069    0.900000
6                   Sleep  GECKO:0000071    0.252244
7                    Work  GECKO:0000131    0.114595
8          PersonPortrait  MONDO:0005084    0.109368
9   DiagnosisConsolidated  MONDO:0005084    0.109368
10        InformedConsent  MONDO:0005084    0.109368
11              Answerset  MONDO:0005084    0.109368
12            Nationality  MONDO:0005084    0.109368
13     PhysicalActivities  MONDO:0005084    0.109368
14           EatingHabits  MONDO:0005084    0.109368
15             OtherDrugs  MONDO:0005084    0.109368
16                  Sleep  MONDO:0005084    0.113353
17           FemaleHealth  MONDO:0005084    0.109368
18           MotherHealth  MONDO:0005084    0.109368
19   ObjectiveInformation  MONDO:0005084    0.109368
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
Mapping suggestions files concat:
                    term  ... confidence
0                 Person  ...   0.760000
1                 Person  ...   0.760000
2                 Person  ...   0.760000
3                 Sample  ...   0.980000
4            Nationality  ...   1.000000
5              Education  ...   1.000000
6                Smoking  ...   1.000000
7                Alcohol  ...   1.000000
8                  Sleep  ...   0.980000
9                 Health  ...   0.980000
0                 Sample  ...   0.199399
1                 Health  ...   0.105097
2              Education  ...   0.167599
3                 Person  ...   0.249490
4                Smoking  ...   0.869039
5                Alcohol  ...   0.900000
6                  Sleep  ...   0.253313
7                   Work  ...   0.115807
8         PersonPortrait  ...   0.108782
9  DiagnosisConsolidated  ...   0.108782

[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
        Term ID  ... Comment
0   EBC:0000001  ...     NaN
1   EBC:0000002  ...     NaN
2   EBC:0000003  ...     NaN
3   EBC:0000004  ...     NaN
4   EBC:0000005  ...     NaN
5   EBC:0000006  ...     NaN
6   EBC:0000007  ...     NaN
7   EBC:0000008  ...     NaN
8   EBC:0000009  ...     NaN
9   EBC:0000010  ...     NaN
10  EBC:0000011  ...     NaN
11  EBC:0000012  ...     NaN
12  EBC:0000013  ...     NaN
13  EBC:0000014  ...     NaN
14  EBC:0000015  ...     NaN
15  EBC:0000016  ...     NaN
16  EBC:0000017  ...     NaN
17  EBC:0000018  ...     NaN
18  EBC:0000019  ...     NaN
19  EBC:0000020  ...     NaN

[20 rows x 8 columns]
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
java -jar build/robot.jar --prefixes src/prefixes.json export \
--input build/gecko.owl \
--header "LABEL" \
--export build/gecko_labels.tsv
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push
