Action automated_mapping started at 2022-11-23T12:56:50.116Z

Success

$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EB
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
          term          match  confidence
0       Person  GECKO:0000066        0.76
1       Person  GECKO:0000055        0.76
2       Person  GECKO:0000120        0.76
3       Sample  GECKO:0000052        0.98
4  Nationality  GECKO:0000064        1.00
5    Education  GECKO:0000065        1.00
6      Smoking  GECKO:0000068        1.00
7      Alcohol  GECKO:0000069        1.00
8        Sleep  GECKO:0000071        0.98
9       Health  GECKO:0000126        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                    term          match  confidence
0            ObjectiveInformation.weight    CMO:0000012    0.839915
1               ObjectiveInformation.bmi    CMO:0000021    0.672958
2             ObjectiveInformation.waist    CMO:0000021    0.267822
3               ObjectiveInformation.hip    CMO:0000021    0.271177
4            ObjectiveInformation.height    CMO:0000106    0.687485
5                           Sample.vkood  GECKO:0000052    0.196984
6                     Sample.visitNumber  GECKO:0000052    0.196984
7                  PhysicalExercise.code  GECKO:0000052    0.404677
8             ProfessionalSportPast.code  GECKO:0000052    0.404677
9                 ProfessionalSport.code  GECKO:0000052    0.404677
10        HormonalContraceptiveUsed.code  GECKO:0000052    0.404677
11      HormonalContraceptiveUsedV1.code  GECKO:0000052    0.404677
12      HormonalMedicationMenopause.code  GECKO:0000052    0.404677
13    HormonalMedicationMenopauseV1.code  GECKO:0000052    0.404677
14                       Health.movement  GECKO:0000052    0.104620
15                       Health.selfcare  GECKO:0000052    0.104620
16               Health.commonActivities  GECKO:0000052    0.104620
17                 Health.painDiscomfort  GECKO:0000052    0.104620
18              Health.anxietyDepression  GECKO:0000052    0.104620
19  MedicationsForTroubledBreathing.code  GECKO:0000052    0.404677
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                               term           match  confidence
0                     Person.gender   GECKO:0000066        0.98
1           Nationality.nationality   GECKO:0000064        0.98
2   Education.highestEducationLevel   GECKO:0000065        0.76
3   Education.highestEducationLevel  UBERON:0000105        0.76
4   Education.highestEducationLevel   GECKO:0000065        0.76
5          EatingHabits.eatingHabit   GECKO:0000072        0.98
6             Smoking.smokingStatus   GECKO:0000068        0.98
7        OtherDrugs.usingOtherDrugs   GECKO:0000094        0.98
8             OtherDrugs.otherDrugs   GECKO:0000094        0.98
9       ObjectiveInformation.weight   GECKO:0000114        0.98
10                           Person   GECKO:0000066        0.76
11                           Person   GECKO:0000055        0.76
12                           Person   GECKO:0000120        0.76
13                           Person   GECKO:0000066        0.76
14                           Person   GECKO:0000055        0.76
15                           Person   GECKO:0000120        0.76
16                           Sample   GECKO:0000052        0.98
17                           Sample   GECKO:0000052        0.98
18                      Nationality   GECKO:0000064        1.00
19                      Nationality   GECKO:0000064        1.00
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                                 term  ... confidence
0                      VisualDecadence.leftEyeDioptry  ...   0.113719
1   CardiovascularDiseasesAdditional.leftVentricle...  ...   0.375941
2                           PersonPortrait.lastWeight  ...   0.286969
3                         ObjectiveInformation.weight  ...   0.166849
4              ObjectiveInformation.weightMeasurement  ...   0.128142
5                              PersonPortrait.lastBmi  ...   0.180271
6                              PersonPortrait.bmiDate  ...   0.179665
7                            PersonPortrait.bmiSource  ...   0.163099
8                            ObjectiveInformation.bmi  ...   0.132098
9               ObjectiveInformation.armCircumference  ...   0.129445
10           ObjectiveInformation.waistHipMeasurement  ...   0.188727
11                          PersonPortrait.lastHeight  ...   0.172090
12                        ObjectiveInformation.height  ...   0.101991
13                           PersonPortrait.bmiSource  ...   0.122198
14                       PersonPortrait.smokingSource  ...   0.100377
15                     PersonPortrait.educationSource  ...   0.132969
16                PersonPortrait.settlementRegionType  ...   0.171467
17                                       Sample.vkood  ...   0.175970
18                                 Sample.visitNumber  ...   0.177679
19                             InformedConsent.icDate  ...   0.132688

[20 rows x 3 columns]
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p HIERARCHY -o build/intermediate/cogs_mapping_suggestions_zooma_hierarchy.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                           term          match  confidence
0                 Person.gender  GECKO:0000060        1.00
1              Person.birthDate   PATO:0000011        0.98
2    PersonPortrait.nationality  GECKO:0000064        1.00
3   DiagnosisConsolidated.icd10  MONDO:0004992        0.76
4   DiagnosisConsolidated.icd10  MONDO:0005084        0.76
5   DiagnosisConsolidated.icd10  MONDO:0000001        0.76
6   DiagnosisConsolidated.icd10  MONDO:0004995        0.76
7   DiagnosisConsolidated.icd10  GECKO:0000052        0.76
8       Nationality.nationality  GECKO:0000064        1.00
9         PhysicalExercise.code  GECKO:0000073        0.76
10        PhysicalExercise.code  GECKO:0000064        0.76
11        PhysicalExercise.code  GECKO:0000052        0.76
12        PhysicalExercise.code  MONDO:0000001        0.76
13        PhysicalExercise.code  GECKO:0000060        0.76
14   ProfessionalSportPast.code  GECKO:0000073        0.76
15   ProfessionalSportPast.code  GECKO:0000064        0.76
16   ProfessionalSportPast.code  GECKO:0000052        0.76
17   ProfessionalSportPast.code  MONDO:0000001        0.76
18   ProfessionalSportPast.code  GECKO:0000060        0.76
19       ProfessionalSport.code  GECKO:0000073        0.76
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p HIERARCHY -o build/intermediate/cogs_mapping_suggestions_nlp_hierarchy.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                     term          match  confidence
0             ObjectiveInformation.weight    CMO:0000012    0.838501
1                ObjectiveInformation.bmi    CMO:0000021    0.673675
2              ObjectiveInformation.waist    CMO:0000021    0.266746
3                ObjectiveInformation.hip    CMO:0000021    0.276154
4             ObjectiveInformation.height    CMO:0000106    0.689157
5                   PhysicalExercise.code  GECKO:0000052    0.403723
6              ProfessionalSportPast.code  GECKO:0000052    0.403723
7                  ProfessionalSport.code  GECKO:0000052    0.403723
8          HormonalContraceptiveUsed.code  GECKO:0000052    0.403723
9        HormonalContraceptiveUsedV1.code  GECKO:0000052    0.403723
10       HormonalMedicationMenopause.code  GECKO:0000052    0.403723
11     HormonalMedicationMenopauseV1.code  GECKO:0000052    0.403723
12   MedicationsForTroubledBreathing.code  GECKO:0000052    0.403723
13                 DiseasesDiagnosed.code  GECKO:0000052    0.403723
14         MedicationsUsedForDisease.code  GECKO:0000052    0.403723
15  MedicationPackagesUsedForDisease.code  GECKO:0000052    0.403723
16               ConcurrentDiagnoses.code  GECKO:0000052    0.403723
17    RespiratoryDiseasesMedications.code  GECKO:0000052    0.403723
18             DiabetesComplications.code  GECKO:0000052    0.403723
19               DiabetesMedications.code  GECKO:0000052    0.403723
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                                           term          match  confidence
0                                 Person.gender  GECKO:0000060        1.00
1                              Person.birthDate   PATO:0000011        0.98
2                              Person.birthYear  GECKO:0000066        0.98
3                              Person.deathDate  STATO:0000093        1.00
4                    PersonPortrait.nationality  GECKO:0000064        1.00
5                  PersonPortrait.lastEducation  GECKO:0000065        0.98
6                PersonPortrait.residencyRegion  GECKO:0000064        1.00
7                       Nationality.nationality  GECKO:0000064        1.00
8           SpareTimeActivities.shoppingPerWeek  GECKO:0000131        0.76
9           SpareTimeActivities.shoppingPerWeek  GECKO:0000104        0.76
10          SpareTimeActivities.cleaningPerWeek  GECKO:0000052        0.76
11          SpareTimeActivities.cleaningPerWeek  MONDO:0004992        0.76
12          SpareTimeActivities.cleaningPerWeek  GECKO:0000060        0.76
13  SpareTimeActivities.physicalExercisePerWeek   OGMS:0000020        0.98
14           SpareTimeActivities.readingPerWeek    CMO:0000294        0.76
15           SpareTimeActivities.readingPerWeek    CMO:0000003        0.76
16                TobaccoLast12Months.smokeProd  GECKO:0000068        0.98
17                     TobaccoUsually.smokeProd  GECKO:0000068        0.98
18                   TobaccoLastMonth.smokeProd  GECKO:0000068        0.98
19          TobaccoYearBeforeQuitting.smokeProd  GECKO:0000068        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                      term          match  confidence
0                PersonPortrait.lastWeight    CMO:0000012    0.237716
1              ObjectiveInformation.weight    CMO:0000012    0.776403
2                   PersonPortrait.lastBmi    CMO:0000021    0.156668
3                 PersonPortrait.bmiSource    CMO:0000021    0.109631
4                 ObjectiveInformation.bmi    CMO:0000021    0.619491
5               ObjectiveInformation.waist    CMO:0000021    0.245984
6                 ObjectiveInformation.hip    CMO:0000021    0.249478
7                PersonPortrait.lastHeight    CMO:0000106    0.137774
8              ObjectiveInformation.height    CMO:0000106    0.632686
9                 PersonPortrait.bmiSource  GECKO:0000052    0.101925
10     PersonPortrait.settlementRegionType  GECKO:0000052    0.208364
11                            Sample.vkood  GECKO:0000052    0.182295
12                      Sample.visitNumber  GECKO:0000052    0.182295
13                       Answerset.isFirst  GECKO:0000052    0.118335
14                   Answerset.visitNumber  GECKO:0000052    0.120449
15                   PhysicalExercise.code  GECKO:0000052    0.370965
16              ProfessionalSportPast.code  GECKO:0000052    0.370965
17                  ProfessionalSport.code  GECKO:0000052    0.370965
18    SpareTimeActivities.childcarePerWeek  GECKO:0000052    0.153579
19  SpareTimeActivities.elderlyCarePerWeek  GECKO:0000052    0.132282
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_hierarchy.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_hierarchy.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_hierarchy.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_hierarchy.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
Mapping suggestions files concat:
                          term  ... confidence
0                       Person  ...   0.760000
1                       Person  ...   0.760000
2                       Person  ...   0.760000
3                       Sample  ...   0.980000
4                  Nationality  ...   1.000000
5                    Education  ...   1.000000
6                      Smoking  ...   1.000000
7                      Alcohol  ...   1.000000
8                        Sleep  ...   0.980000
9                       Health  ...   0.980000
0  ObjectiveInformation.weight  ...   0.839915
1     ObjectiveInformation.bmi  ...   0.672958
2   ObjectiveInformation.waist  ...   0.267822
3     ObjectiveInformation.hip  ...   0.271177
4  ObjectiveInformation.height  ...   0.687485
5                 Sample.vkood  ...   0.196984
6           Sample.visitNumber  ...   0.196984
7        PhysicalExercise.code  ...   0.404677
8   ProfessionalSportPast.code  ...   0.404677
9       ProfessionalSport.code  ...   0.404677

[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
       Term ID  ...                                            Comment
0   EB:0000001  ...                                                NaN
1   EB:0000002  ...                                                NaN
2   EB:0000003  ...                                                NaN
3   EB:0000004  ...                                                NaN
4   EB:0000005  ...                                                NaN
5   EB:0000006  ...                                                NaN
6   EB:0000007  ...                                                NaN
7   EB:0000008  ...                                                NaN
8   EB:0000009  ...                                                NaN
9   EB:0000010  ...                                                NaN
10  EB:0000011  ...                                                NaN
11  EB:0000012  ...  Person's last measurements, smoking status and...
12  EB:0000013  ...  Person's last measurements, smoking status and...
13  EB:0000014  ...  Person's last measurements, smoking status and...
14  EB:0000015  ...  Person's last measurements, smoking status and...
15  EB:0000016  ...  Person's last measurements, smoking status and...
16  EB:0000017  ...  Person's last measurements, smoking status and...
17  EB:0000018  ...  Person's last measurements, smoking status and...
18  EB:0000019  ...  Person's last measurements, smoking status and...
19  EB:0000020  ...  Person's last measurements, smoking status and...

[20 rows x 8 columns]
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
ERROR:root:'disease or disorder' suggested on row 32 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 85 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 92 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 94 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 265 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 266 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 267 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 268 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 332 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 335 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 339 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 341 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 342 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 359 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 371 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 373 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 374 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 397 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 405 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 465 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 466 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 467 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 468 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 469 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 470 is not a GECKO term
ERROR:root:'disease or disorder' suggested on row 475 is not a GECKO term
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push