Accurate occupation classification is essential in various fields, including policy development and epidemiological studies. This study aims to develop an occupation classification model based on DistilKoBERT.
This study used data from the 5th and 6th Korean Working Conditions Surveys conducted in 2017 and 2020, respectively. A total of 99,665 survey participants, who were nationally representative of Korean workers, were included. We used natural language responses regarding their job responsibilities and occupational codes based on the Korean Standard Classification of Occupations (7th version, 3-digit codes). The dataset was randomly split into training and test datasets in a ratio of 7:3. The occupation classification model based on DistilKoBERT was fine-tuned using the training dataset, and the model was evaluated using the test dataset. The accuracy, precision, recall, and F1 score were calculated as evaluation metrics.
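The random 7:3 split described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `split_train_test`, the fixed seed, and the rounding rule at the cut point are all assumptions, since the source does not state how the split was implemented.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and split it into (train, test).

    `seed` and the rounding at the cut point are illustrative assumptions;
    the study only reports a random split in a ratio of 7:3.
    """
    shuffled = list(records)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical usage with ten placeholder records.
train, test = split_train_test(range(10))
```

Fixing the seed makes the partition repeatable, which matters when the held-out portion is reused to compare models.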
The final model, which classified 28,996 survey participants in the test dataset into 142 occupational codes, exhibited an accuracy of 84.44%. The precision, recall, and F1 score of the model, weighted by the sample size of each occupational code, were 0.83, 0.84, and 0.83, respectively. The model demonstrated high precision in the classification of service and sales workers yet exhibited low precision in the classification of managers. In addition, it displayed high precision in classifying occupations prominently represented in the training dataset.
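One common way to obtain sample-size-weighted metrics is to compute precision, recall, and F1 per class and average them with weights proportional to each class's count among the true labels. The sketch below assumes that definition; the source does not specify the exact formula, and the function name `weighted_metrics` and the labels `"a"` and `"b"` are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumes each class is weighted by its share of the true labels;
    this is an illustrative definition, not taken from the study.
    """
    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # Precision is set to 0 for a class that is never predicted.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only.
scores = weighted_metrics(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

Under this definition the weighted recall is algebraically equal to the overall accuracy, which is consistent with the reported recall of 0.84 alongside an accuracy of 84.44%.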
This study developed an occupation classification system based on DistilKoBERT, which demonstrated reasonable performance. Although further efforts are needed to enhance the classification accuracy, this automated occupation classification model holds promise for advancing epidemiological studies in the fields of occupational safety and health.
Some epidemiological studies have estimated exposure among flight attendants with and without breast cancer. However, quantitative evaluations of the occupational exposure factors related to cancer development in individual flight attendants with breast cancer are difficult to find. That is, most, if not all, epidemiological studies of breast cancer in flight attendants that report quantitative exposure estimates have estimated exposure in the absence of individual flight history data.
A 41-year-old woman visited the hospital due to a left breast mass found after a regular check-up. Breast cancer was suspected on ultrasonography. Following core biopsy, she underwent various imaging modalities. She was diagnosed with invasive ductal carcinoma of no special type (estrogen receptor positive in 90%, progesterone receptor positive in 3%, human epidermal growth factor receptor 2/neu equivocal) with histologic grade 3 and nuclear grade 3 in the left breast. Neoadjuvant chemotherapy was administered to reduce the tumor size before surgery. However, due to serious chemotherapy side effects, the patient opted for alternative and integrative therapies. She had joined the airline in January 1996. Of all her flights, international flights and night flights accounted for 94.9% and 26.2%, respectively. Night flights were conducted at least four times per month. Moreover, based on the computer program CARI-6M, the estimated dose of cosmic radiation exposure was 78.81 mSv. There were no other personal triggers or family history of breast cancer.
This case report suggests that the potentially causal relationship between harmful occupational factors and the incidence of breast cancer may become more pronounced when workers on continuous night shifts are also exposed to cosmic ionizing radiation. Therefore, close attention and efforts are needed to adjust night shift work schedules and regulate cosmic ionizing radiation exposure.