
Table 3 Summary of challenges and related solutions

From: Infodemic: Challenges and solutions in topic discovery and data process

| Challenges | Solutions | Methods |
| --- | --- | --- |
| Dynamic data | Collect data samples quickly | Use different data collection methods or develop models, such as digital tracking, survey data, and LITMUS. |
| Small data volume | Obtain the required data samples | Choose a dataset that is publicly available on the platform itself, or use software tools to capture representative data. |
| Data complexity | Filter data | Use methods such as TF-IDF and bag-of-words (BOW) to obtain word frequencies and remove redundant information. |
| Overfitting during experiments | Augment data | Use EDA, ENDA, AEDA, and other methods to generate new data or learn augmentation strategies. |
| Misclassification | Choose the right classifier | Choose a classifier that matches the features of the dataset: for example, NB if the dataset is small, SGD if high sensitivity is needed, LR if the categories are clear, Inception-v3 for image processing, and ResNet for computer vision tasks. Optimize the existing models as well. |
| Topic drift and diversity | Use appropriate identification methods | Find the source of topic drift, choose a suitable method for recognizing it, or use a suitable topic extraction model. |
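
As an illustration of the "Data complexity" entry, the following is a minimal sketch of how BOW counts and TF-IDF weights might be used to obtain word frequencies and flag redundant terms. It is not taken from the article: the function names (`bag_of_words`, `tf_idf`, `informative_terms`), the whitespace tokenizer, the unsmoothed IDF formula, and the zero threshold are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Return raw word counts for one document (naive whitespace tokenizer)."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / document frequency), with no smoothing.
    """
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return weights


def informative_terms(weights, threshold=0.0):
    """Keep only terms whose weight exceeds the (hypothetical) threshold."""
    return [{t for t, w in doc.items() if w > threshold} for doc in weights]


if __name__ == "__main__":
    # Toy corpus invented for this example.
    docs = [
        "the vaccine is safe",
        "the vaccine rumor spreads",
        "the mask rumor spreads",
    ]
    for kept in informative_terms(tf_idf(docs)):
        print(sorted(kept))
```

Under these assumptions, a term that occurs in every document (here "the") receives a weight of zero and is filtered out, which is one simple reading of "delete redundant information". Production pipelines typically rely on a library implementation with smoothing and a proper tokenizer instead.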