DROID / IHCC-cohorts-data-harmonization-test / EB

No remote found




Exit status of last command unknown. The server may have restarted before it could complete.

$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EB
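`id-generator-templates.py` assigns term IDs for the EB data dictionary. The merged output further down shows that these IDs use an `EB:` prefix followed by a seven-digit, zero-padded counter. The sketch below shows that numbering pattern. It assumes IDs are handed out sequentially in row order. The script's real logic is not shown in the log, including whether it reads anything from `build/metadata.tsv` to choose the prefix. `generate_ids` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def generate_ids(prefix: str, labels: Iterable[str],
                 start: int = 1, width: int = 7) -> List[Tuple[str, str]]:
    # Hypothetical helper: pair each label with "<prefix>:<zero-padded counter>",
    # mirroring the EB:0000001, EB:0000002, ... pattern seen in the log.
    return [(f"{prefix}:{n:0{width}d}", label)
            for n, label in enumerate(labels, start)]


rows = generate_ids("EB", ["Person.skood", "Person.gender", "Person.birthDate"])
for term_id, label in rows:
    print(term_id, label)
```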
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
          term          match  confidence
0       Person  GECKO:0000066        0.76
1       Person  GECKO:0000055        0.76
2       Person  GECKO:0000120        0.76
3       Sample  GECKO:0000052        0.98
4  Nationality  GECKO:0000064        1.00
5    Education  GECKO:0000065        1.00
6      Smoking  GECKO:0000068        1.00
7      Alcohol  GECKO:0000069        1.00
8        Sleep  GECKO:0000071        0.98
9       Health  GECKO:0000126        0.98
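The printed Zooma confidences are exactly 0.76, 0.98 and 1.00. These values match the `MEDIUM`, `GOOD` and `HIGH` entries in `zooma_confidence_mappings`. The sketch below shows how a term could be turned into an annotate query and how a confidence label could be mapped to a score. Two things are assumed here. First, each Zooma annotation is taken to carry one of the labels HIGH/GOOD/MEDIUM/LOW, as the config keys suggest. Second, scores below `min_match_probability` are taken to be dropped. `annotate_url` and `score` are hypothetical names, and nothing is fetched over the network.

```python
from typing import Optional
from urllib.parse import quote

# Values copied from the "Zooma config" line printed above.
CONFIG = {
    "zooma_annotate": "https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=",
    "min_match_probability": 0.1,
    "zooma_confidence_mappings": {"LOW": 0.51, "MEDIUM": 0.76, "GOOD": 0.98, "HIGH": 1},
}


def annotate_url(term: str) -> str:
    # Percent-encode the term so characters such as spaces are safe in the query string.
    return CONFIG["zooma_annotate"] + quote(term)


def score(confidence_label: str) -> Optional[float]:
    # Assumption: map a Zooma confidence label (e.g. "GOOD") to a number and
    # return None when the label is unknown or below min_match_probability.
    value = CONFIG["zooma_confidence_mappings"].get(confidence_label.upper())
    if value is None or value < CONFIG["min_match_probability"]:
        return None
    return float(value)


print(annotate_url("Person gender"))
print(score("GOOD"))
```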
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                    term          match  confidence
0            ObjectiveInformation.weight    CMO:0000012    0.837803
1               ObjectiveInformation.bmi    CMO:0000021    0.672504
2             ObjectiveInformation.waist    CMO:0000021    0.267665
3               ObjectiveInformation.hip    CMO:0000021    0.272348
4            ObjectiveInformation.height    CMO:0000106    0.689633
5                           Sample.vkood  GECKO:0000052    0.199489
6                     Sample.visitNumber  GECKO:0000052    0.199489
7                  PhysicalExercise.code  GECKO:0000052    0.404730
8             ProfessionalSportPast.code  GECKO:0000052    0.404730
9                 ProfessionalSport.code  GECKO:0000052    0.404730
10        HormonalContraceptiveUsed.code  GECKO:0000052    0.404730
11      HormonalContraceptiveUsedV1.code  GECKO:0000052    0.404730
12      HormonalMedicationMenopause.code  GECKO:0000052    0.404730
13    HormonalMedicationMenopauseV1.code  GECKO:0000052    0.404730
14                       Health.movement  GECKO:0000052    0.105289
15                       Health.selfcare  GECKO:0000052    0.105289
16               Health.commonActivities  GECKO:0000052    0.105289
17                 Health.painDiscomfort  GECKO:0000052    0.105289
18              Health.anxietyDepression  GECKO:0000052    0.105289
19  MedicationsForTroubledBreathing.code  GECKO:0000052    0.404730
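The config entry `rescale_nlp_matches: {'low': 0, 'high': 0.9}` suggests that NLP scores are squeezed into a band that stays below Zooma's top scores. None of the twenty NLP confidences printed in this run exceeds 0.9. The sketch below assumes a simple linear mapping of a classifier probability in [0, 1] onto [low, high]. The script's actual formula is not shown, and `rescale` is a hypothetical name.

```python
def rescale(probability: float, low: float = 0.0, high: float = 0.9) -> float:
    # Assumed linear mapping of a probability in [0, 1] onto [low, high];
    # the defaults mirror rescale_nlp_matches from the config.
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return low + probability * (high - low)


print(rescale(1.0))  # 0.9
print(rescale(0.5))
```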
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                               term           match  confidence
0                     Person.gender   GECKO:0000066        0.98
1           Nationality.nationality   GECKO:0000064        0.98
2   Education.highestEducationLevel   GECKO:0000065        0.76
3   Education.highestEducationLevel  UBERON:0000105        0.76
4   Education.highestEducationLevel   GECKO:0000065        0.76
5          EatingHabits.eatingHabit   GECKO:0000072        0.98
6             Smoking.smokingStatus   GECKO:0000068        0.98
7        OtherDrugs.usingOtherDrugs   GECKO:0000094        0.98
8             OtherDrugs.otherDrugs   GECKO:0000094        0.98
9       ObjectiveInformation.weight   GECKO:0000114        0.98
10                           Person   GECKO:0000066        0.76
11                           Person   GECKO:0000055        0.76
12                           Person   GECKO:0000120        0.76
13                           Person   GECKO:0000066        0.76
14                           Person   GECKO:0000055        0.76
15                           Person   GECKO:0000120        0.76
16                           Sample   GECKO:0000052        0.98
17                           Sample   GECKO:0000052        0.98
18                      Nationality   GECKO:0000064        1.00
19                      Nationality   GECKO:0000064        1.00
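In the `-p WORD_BOUNDARY` pass, the printed matches include dotted field labels such as `Person.gender` and `Education.highestEducationLevel`. The log does not show what the flag does. One plausible reading, assumed here, is that labels are split into separate words before they are sent to Zooma. The sketch below shows that kind of preprocessing. `word_boundary` is a hypothetical helper, not the script's code.

```python
import re


def word_boundary(label: str) -> str:
    # Hypothetical preprocessing: treat dots and underscores as separators,
    # then split camelCase transitions ("highestEducationLevel" ->
    # "highest Education Level") and normalise to lower case.
    text = re.sub(r"[._]+", " ", label)
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return " ".join(text.lower().split())


print(word_boundary("Education.highestEducationLevel"))
# education highest education level
```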
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p WORD_BOUNDARY -o build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                    term          match  confidence
0            ObjectiveInformation.weight    CMO:0000012    0.837917
1               ObjectiveInformation.bmi    CMO:0000021    0.672600
2             ObjectiveInformation.waist    CMO:0000021    0.263734
3               ObjectiveInformation.hip    CMO:0000021    0.269210
4            ObjectiveInformation.height    CMO:0000106    0.689339
5                           Sample.vkood  GECKO:0000052    0.199805
6                     Sample.visitNumber  GECKO:0000052    0.199805
7                  PhysicalExercise.code  GECKO:0000052    0.400454
8             ProfessionalSportPast.code  GECKO:0000052    0.400454
9                 ProfessionalSport.code  GECKO:0000052    0.400454
10        HormonalContraceptiveUsed.code  GECKO:0000052    0.400454
11      HormonalContraceptiveUsedV1.code  GECKO:0000052    0.400454
12      HormonalMedicationMenopause.code  GECKO:0000052    0.400454
13    HormonalMedicationMenopauseV1.code  GECKO:0000052    0.400454
14                       Health.movement  GECKO:0000052    0.105161
15                       Health.selfcare  GECKO:0000052    0.105161
16               Health.commonActivities  GECKO:0000052    0.105161
17                 Health.painDiscomfort  GECKO:0000052    0.105161
18              Health.anxietyDepression  GECKO:0000052    0.105161
19  MedicationsForTroubledBreathing.code  GECKO:0000052    0.400454
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv
Zooma config: {'zooma_annotate': 'https://test.mapping.ihccglobal.app/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://test.mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://test.registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://test.registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First twenty results:
                                           term          match  confidence
0                                 Person.gender  GECKO:0000060        1.00
1                              Person.birthDate   PATO:0000011        0.98
2                              Person.birthYear  GECKO:0000066        0.98
3                              Person.deathDate  STATO:0000093        1.00
4                    PersonPortrait.nationality  GECKO:0000064        1.00
5                  PersonPortrait.lastEducation  GECKO:0000065        0.98
6                PersonPortrait.residencyRegion  GECKO:0000064        1.00
7                       Nationality.nationality  GECKO:0000064        1.00
8           SpareTimeActivities.shoppingPerWeek  GECKO:0000131        0.76
9           SpareTimeActivities.shoppingPerWeek  GECKO:0000104        0.76
10          SpareTimeActivities.cleaningPerWeek  GECKO:0000052        0.76
11          SpareTimeActivities.cleaningPerWeek  MONDO:0004992        0.76
12          SpareTimeActivities.cleaningPerWeek  GECKO:0000060        0.76
13  SpareTimeActivities.physicalExercisePerWeek   OGMS:0000020        0.98
14           SpareTimeActivities.readingPerWeek    CMO:0000294        0.76
15           SpareTimeActivities.readingPerWeek    CMO:0000003        0.76
16                TobaccoLast12Months.smokeProd  GECKO:0000068        0.98
17                     TobaccoUsually.smokeProd  GECKO:0000068        0.98
18                   TobaccoLastMonth.smokeProd  GECKO:0000068        0.98
19          TobaccoYearBeforeQuitting.smokeProd  GECKO:0000068        0.98
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -p DEFINITION -o build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
  warnings.warn(
      Term ID                 Label
0  EB:0000001          Person.skood
1  EB:0000002         Person.gender
2  EB:0000003      Person.birthDate
3  EB:0000004      Person.birthYear
4  EB:0000005  Person.agreementDate
NLP matching successful. First twenty results:
                                      term          match  confidence
0                PersonPortrait.lastWeight    CMO:0000012    0.236058
1              ObjectiveInformation.weight    CMO:0000012    0.776128
2                   PersonPortrait.lastBmi    CMO:0000021    0.159147
3                 PersonPortrait.bmiSource    CMO:0000021    0.111170
4                 ObjectiveInformation.bmi    CMO:0000021    0.624578
5               ObjectiveInformation.waist    CMO:0000021    0.246661
6                 ObjectiveInformation.hip    CMO:0000021    0.251470
7                PersonPortrait.lastHeight    CMO:0000106    0.139162
8              ObjectiveInformation.height    CMO:0000106    0.635535
9                 PersonPortrait.bmiSource  GECKO:0000052    0.102550
10     PersonPortrait.settlementRegionType  GECKO:0000052    0.210465
11                            Sample.vkood  GECKO:0000052    0.184984
12                      Sample.visitNumber  GECKO:0000052    0.184984
13                       Answerset.isFirst  GECKO:0000052    0.118963
14                   Answerset.visitNumber  GECKO:0000052    0.121422
15                   PhysicalExercise.code  GECKO:0000052    0.375404
16              ProfessionalSportPast.code  GECKO:0000052    0.375404
17                  ProfessionalSport.code  GECKO:0000052    0.375404
18    SpareTimeActivities.childcarePerWeek  GECKO:0000052    0.155255
19  SpareTimeActivities.elderlyCarePerWeek  GECKO:0000052    0.133835
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv  -s build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv  -s build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_clean.tsv', 'build/intermediate/cogs_mapping_suggestions_zooma_definition.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp_definition.tsv']
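The merge command passes `-s` six times, and the script echoes all six paths back as a single list. This is the behaviour of Python's `argparse` with `action="append"`. The sketch below shows such a parser. The script's real parser is not shown, and the long option names are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(description="Merge mapping suggestion files (sketch).")
parser.add_argument("-t", "--template", required=True)
# action="append" collects every repeated -s into one list.
parser.add_argument("-s", "--suggestions", action="append", required=True)
parser.add_argument("-o", "--output", required=True)

args = parser.parse_args([
    "-t", "templates/cogs.tsv",
    "-s", "build/intermediate/cogs_mapping_suggestions_zooma.tsv",
    "-s", "build/intermediate/cogs_mapping_suggestions_nlp.tsv",
    "-o", "build/suggestions_cogs.tsv",
])
print(args.suggestions)
```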
Mapping suggestions files concat:
                          term  ... confidence
0                       Person  ...   0.760000
1                       Person  ...   0.760000
2                       Person  ...   0.760000
3                       Sample  ...   0.980000
4                  Nationality  ...   1.000000
5                    Education  ...   1.000000
6                      Smoking  ...   1.000000
7                      Alcohol  ...   1.000000
8                        Sleep  ...   0.980000
9                       Health  ...   0.980000
0  ObjectiveInformation.weight  ...   0.837803
1     ObjectiveInformation.bmi  ...   0.672504
2   ObjectiveInformation.waist  ...   0.267665
3     ObjectiveInformation.hip  ...   0.272348
4  ObjectiveInformation.height  ...   0.689633
5                 Sample.vkood  ...   0.199489
6           Sample.visitNumber  ...   0.199489
7        PhysicalExercise.code  ...   0.404730
8   ProfessionalSportPast.code  ...   0.404730
9       ProfessionalSport.code  ...   0.404730

[20 rows x 4 columns]
Merging suggestions successful. First twenty results:
       Term ID  ...                                            Comment
0   EB:0000001  ...                                                NaN
1   EB:0000002  ...                                                NaN
2   EB:0000003  ...                                                NaN
3   EB:0000004  ...                                                NaN
4   EB:0000005  ...                                                NaN
5   EB:0000006  ...                                                NaN
6   EB:0000007  ...                                                NaN
7   EB:0000008  ...                                                NaN
8   EB:0000009  ...                                                NaN
9   EB:0000010  ...                                                NaN
10  EB:0000011  ...  Person's last measurements, smoking status and...
11  EB:0000012  ...  Person's last measurements, smoking status and...
12  EB:0000013  ...  Person's last measurements, smoking status and...
13  EB:0000014  ...  Person's last measurements, smoking status and...
14  EB:0000015  ...  Person's last measurements, smoking status and...
15  EB:0000016  ...  Person's last measurements, smoking status and...
16  EB:0000017  ...  Person's last measurements, smoking status and...
17  EB:0000018  ...  Person's last measurements, smoking status and...
18  EB:0000019  ...  Person's last measurements, smoking status and...
19  EB:0000020  ...  Person's last measurements, smoking status and...

[20 rows x 8 columns]
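The same term/match pair often appears in several suggestion files with slightly different scores. For example, `ObjectiveInformation.weight` to `CMO:0000012` scores 0.837803 in the NLP pass and 0.837917 in the WORD_BOUNDARY NLP pass. The log does not show how the merge step reconciles such duplicates. The sketch below assumes it keeps the highest confidence per (term, match) pair and ranks the candidates for each term. The sample rows are taken from the output above, and `merge_suggestions` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_suggestions(rows: List[Tuple[str, str, float]]) -> Dict[str, List[Tuple[str, float]]]:
    # Keep the highest confidence seen for each (term, match) pair across files.
    best: Dict[Tuple[str, str], float] = {}
    for term, match, confidence in rows:
        key = (term, match)
        if confidence > best.get(key, float("-inf")):
            best[key] = confidence
    # Group by term and rank candidate matches by descending confidence.
    per_term: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (term, match), confidence in best.items():
        per_term[term].append((match, confidence))
    for candidates in per_term.values():
        candidates.sort(key=lambda mc: mc[1], reverse=True)
    return dict(per_term)


rows = [
    ("ObjectiveInformation.weight", "CMO:0000012", 0.837803),  # nlp
    ("ObjectiveInformation.weight", "CMO:0000012", 0.837917),  # nlp_clean
    ("ObjectiveInformation.weight", "GECKO:0000114", 0.98),    # zooma_clean
]
print(merge_suggestions(rows))
# {'ObjectiveInformation.weight': [('GECKO:0000114', 0.98), ('CMO:0000012', 0.837917)]}
```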
make[1]: Leaving directory '/workspace'
cp build/suggestions_cogs.tsv build/terminology.tsv
rm -f templates/cogs.tsv
make cogs-apply-data-validation
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/create-data-validation.py build/terminology.tsv build/gecko_labels.tsv build/cogs-data-validation.tsv build/cogs-info-table.tsv
cogs apply build/cogs-data-validation.tsv
make[1]: Leaving directory '/workspace'
make cogs-apply-info-table
make[1]: Entering directory '/workspace'
cogs apply build/cogs-info-table.tsv
make[1]: Leaving directory '/workspace'
cogs push