DROID / IHCC-cohorts-data-harmonization-test / EBB

Action automated_mapping started at 2022-11-18T16:02:29.724Z

Success

$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EBB
mkdir -p build/intermediate
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                       term           match  confidence
0                    gender   GECKO:0000060        1.00
1                    height     CMO:0000106        1.00
2                       age    PATO:0000011        1.00
3                death date   STATO:0000093        0.76
4                death date   STATO:0000093        0.76
5                death date  UBERON:0000105        0.76
6               Nationality   GECKO:0000064        1.00
7   Nationality nationality   GECKO:0000064        0.98
8                   Smoking   GECKO:0000068        1.00
9                   Alcohol   GECKO:0000069        1.00
10                   Tumors   MONDO:0004992        0.76
11                   Tumors   MONDO:0005039        0.76
12                 Diabetes   MONDO:0005151        1.00
curl -L -o build/gecko.owl http://purl.obolibrary.org/obo/gecko/views/ihcc-gecko.owl
curl -Lk -o build/robot.jar https://build.obolibrary.io/job/ontodev/job/robot/job/master/lastSuccessfulBuild/artifact/bin/robot.jar
java -jar build/robot.jar --prefixes src/prefixes.json query --input build/gecko.owl --query src/queries/ihcc-mapping-gecko.sparql build/intermediate/gecko-xrefs-sparql.csv
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                 Label
0  EBB:0000001          Person.skood
1  EBB:0000002         Person.gender
2  EBB:0000003      Person.birthDate
3  EBB:0000004      Person.birthYear
4  EBB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                    term          match  confidence
0            ObjectiveInformation.weight    CMO:0000012    0.837672
1               ObjectiveInformation.bmi    CMO:0000021    0.670602
2             ObjectiveInformation.waist    CMO:0000021    0.266035
3               ObjectiveInformation.hip    CMO:0000021    0.273149
4                                 height    CMO:0000106    0.683658
5            ObjectiveInformation.height    CMO:0000106    0.683658
6                  PhysicalExercise.code  GECKO:0000052    0.399442
7             ProfessionalSportPast.code  GECKO:0000052    0.399442
8                 ProfessionalSport.code  GECKO:0000052    0.399442
9         HormonalContraceptiveUsed.code  GECKO:0000052    0.399442
10      HormonalContraceptiveUsedV1.code  GECKO:0000052    0.399442
11      HormonalMedicationMenopause.code  GECKO:0000052    0.399442
12    HormonalMedicationMenopauseV1.code  GECKO:0000052    0.399442
13                       Health.movement  GECKO:0000052    0.104909
14                       Health.selfcare  GECKO:0000052    0.104909
15               Health.commonActivities  GECKO:0000052    0.104909
16                 Health.painDiscomfort  GECKO:0000052    0.104909
17              Health.anxietyDepression  GECKO:0000052    0.104909
18  MedicationsForTroubledBreathing.code  GECKO:0000052    0.399442
19                DiseasesDiagnosed.code  GECKO:0000052    0.399442
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                       term           match  confidence
0             Person.gender   GECKO:0000066        0.98
1                    gender   GECKO:0000060        1.00
2                    gender   GECKO:0000060        1.00
3                    height     CMO:0000106        1.00
4                    height     CMO:0000106        1.00
5                       age    PATO:0000011        1.00
6                       age    PATO:0000011        1.00
7                death date   STATO:0000093        0.76
8                death date   STATO:0000093        0.76
9                death date  UBERON:0000105        0.76
10               death date   STATO:0000093        0.76
11               death date   STATO:0000093        0.76
12               death date  UBERON:0000105        0.76
13              Nationality   GECKO:0000064        1.00
14              Nationality   GECKO:0000064        1.00
15  Nationality.nationality   GECKO:0000064        0.98
16  Nationality nationality   GECKO:0000064        0.98
17  Nationality nationality   GECKO:0000064        0.98
18                  Smoking   GECKO:0000068        1.00
19                  Smoking   GECKO:0000068        1.00
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                 Label
0  EBB:0000001          Person.skood
1  EBB:0000002         Person.gender
2  EBB:0000003      Person.birthDate
3  EBB:0000004      Person.birthYear
4  EBB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                    term          match  confidence
0            ObjectiveInformation.weight    CMO:0000012    0.836668
1               ObjectiveInformation.bmi    CMO:0000021    0.669178
2             ObjectiveInformation.waist    CMO:0000021    0.268965
3               ObjectiveInformation.hip    CMO:0000021    0.274443
4                                 height    CMO:0000106    0.685696
5            ObjectiveInformation.height    CMO:0000106    0.685696
6                  PhysicalExercise.code  GECKO:0000052    0.401939
7             ProfessionalSportPast.code  GECKO:0000052    0.401939
8                 ProfessionalSport.code  GECKO:0000052    0.401939
9         HormonalContraceptiveUsed.code  GECKO:0000052    0.401939
10      HormonalContraceptiveUsedV1.code  GECKO:0000052    0.401939
11      HormonalMedicationMenopause.code  GECKO:0000052    0.401939
12    HormonalMedicationMenopauseV1.code  GECKO:0000052    0.401939
13                       Health.movement  GECKO:0000052    0.104880
14                       Health.selfcare  GECKO:0000052    0.104880
15               Health.commonActivities  GECKO:0000052    0.104880
16                 Health.painDiscomfort  GECKO:0000052    0.104880
17              Health.anxietyDepression  GECKO:0000052    0.104880
18  MedicationsForTroubledBreathing.code  GECKO:0000052    0.401939
19                DiseasesDiagnosed.code  GECKO:0000052    0.401939
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                                           term          match  confidence
0                                 Person.gender  GECKO:0000060        1.00
1                              Person.birthDate   PATO:0000011        0.98
2                              Person.birthYear  GECKO:0000066        0.98
3                              Person.deathDate  STATO:0000093        1.00
4                       Nationality.nationality  GECKO:0000064        1.00
5                            Smoking has Smoked  GECKO:0000064        1.00
6                           Alcohol_Consumption  GECKO:0000064        1.00
7                                        Tumors  MONDO:0004992        0.76
8                                        Tumors  MONDO:0005039        0.76
9                                      Diabetes  MONDO:0005151        1.00
10                      Nationality.nationality  GECKO:0000064        1.00
11          SpareTimeActivities.shoppingPerWeek  GECKO:0000131        0.76
12          SpareTimeActivities.shoppingPerWeek  GECKO:0000104        0.76
13          SpareTimeActivities.cleaningPerWeek  GECKO:0000052        0.76
14          SpareTimeActivities.cleaningPerWeek  MONDO:0004992        0.76
15          SpareTimeActivities.cleaningPerWeek  GECKO:0000060        0.76
16  SpareTimeActivities.physicalExercisePerWeek   OGMS:0000020        0.98
17           SpareTimeActivities.readingPerWeek    CMO:0000294        0.76
18           SpareTimeActivities.readingPerWeek    CMO:0000003        0.76
19                TobaccoLast12Months.smokeProd  GECKO:0000068        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
       Term ID                 Label
0  EBB:0000001          Person.skood
1  EBB:0000002         Person.gender
2  EBB:0000003      Person.birthDate
3  EBB:0000004      Person.birthYear
4  EBB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                      term          match  confidence
0                PersonPortrait.lastWeight    CMO:0000012    0.240447
1              ObjectiveInformation.weight    CMO:0000012    0.777004
2                   PersonPortrait.lastBmi    CMO:0000021    0.158019
3                                   height    CMO:0000021    0.110494
4                 ObjectiveInformation.bmi    CMO:0000021    0.622844
5               ObjectiveInformation.waist    CMO:0000021    0.246794
6                 ObjectiveInformation.hip    CMO:0000021    0.251361
7                PersonPortrait.lastHeight    CMO:0000106    0.136501
8              ObjectiveInformation.height    CMO:0000106    0.629629
9                                   height  GECKO:0000052    0.102359
10                       Answerset.isFirst  GECKO:0000052    0.119913
11                   Answerset.visitNumber  GECKO:0000052    0.122284
12                   PhysicalExercise.code  GECKO:0000052    0.375427
13              ProfessionalSportPast.code  GECKO:0000052    0.375427
14                  ProfessionalSport.code  GECKO:0000052    0.375427
15    SpareTimeActivities.childcarePerWeek  GECKO:0000052    0.155838
16  SpareTimeActivities.elderlyCarePerWeek  GECKO:0000052    0.134239
17    FemaleHealth.menstruationsStopReason  GECKO:0000052    0.104425
18          HormonalContraceptiveUsed.code  GECKO:0000052    0.375427
19        HormonalContraceptiveUsedV1.code  GECKO:0000052    0.375427
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
Mapping suggestions files concat:
                           term  ... confidence
0                        gender  ...   1.000000
1                        height  ...   1.000000
2                           age  ...   1.000000
3                    death date  ...   0.760000
4                    death date  ...   0.760000
5                    death date  ...   0.760000
6                   Nationality  ...   1.000000
7       Nationality nationality  ...   0.980000
8                       Smoking  ...   1.000000
9                       Alcohol  ...   1.000000
10                       Tumors  ...   0.760000
11                       Tumors  ...   0.760000
12                     Diabetes  ...   1.000000
0   ObjectiveInformation.weight  ...   0.837672
1      ObjectiveInformation.bmi  ...   0.670602
2    ObjectiveInformation.waist  ...   0.266035
3      ObjectiveInformation.hip  ...   0.273149
4                        height  ...   0.683658
5   ObjectiveInformation.height  ...   0.683658
6         PhysicalExercise.code  ...   0.399442

[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
        Term ID  ...                                            Comment
0   EBB:0000001  ...                                                NaN
1   EBB:0000002  ...                                                NaN
2   EBB:0000003  ...                                                NaN
3   EBB:0000004  ...                                                NaN
4   EBB:0000005  ...                                                NaN
5   EBB:0000006  ...                                                NaN
6   EBB:0000007  ...                                                NaN
7   EBB:0000008  ...                                                NaN
8   EBB:0000009  ...                                                NaN
9   EBB:0000010  ...                                                NaN
10  EBB:0000011  ...  Person's last measurements, smoking status and...
11  EBB:0000012  ...  Person's last measurements, smoking status and...
12  EBB:0000013  ...  Person's last measurements, smoking status and...
13  EBB:0000014  ...  Person's last measurements, smoking status and...
14  EBB:0000015  ...  Person's last measurements, smoking status and...
15  EBB:0000016  ...  Person's last measurements, smoking status and...
16  EBB:0000017  ...  Person's last measurements, smoking status and...
17  EBB:0000018  ...  Person's last measurements, smoking status and...
18  EBB:0000019  ...  Person's last measurements, smoking status and...
19  EBB:0000020  ...  Person's last measurements, smoking status and...

[20 rows x 8 columns]
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
java -jar build/robot.jar --prefixes src/prefixes.json export \
--input build/gecko.owl \
--header "LABEL" \
--export build/gecko_labels.tsv
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push
