International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)
To automate the retrieval of online job opportunities in a specific domain, text classification is the only viable method. In this paper, we investigate eight classifiers, define a new preprocessing method for job offer classification, and study the classifiers' accuracy and generalization behavior. Job offer data were collected from several job offer websites. We organized the data into six dataset groups by applying different existing preprocessing methods and the new one. The classifiers were regularized to avoid high variance, and their accuracy and generalization errors were evaluated. All classifiers achieved >90% accuracy, but their generalization error varied. Ridge Regression and Stochastic Gradient Descent generalized well on new data for all groups; in contrast, Random Forest and Perceptron tended toward high variance. The remaining classifiers exhibited both behaviors, depending on the dataset group. In total, two of the eight classifiers generalized well on new data across all groups. Our findings highlight the potential of classifiers to automate job offer retrieval from the World Wide Web.
Information and Communication Technology