ResJobFit - end-to-end artificial neural networks based technology for job-resume matching
DOI: https://doi.org/10.15276/aait.07.2024.27
Keywords: Artificial neural networks, IT systems, machine learning, NLP, transformers, text embedding, information retrieval
Abstract
With the ever-growing expansion of online recruitment, reliable person-job matching has become increasingly crucial. A job advertisement
may specify requirements for experience, education and specialization, as well as location, so many aspects must be taken into account
to match and rank candidates reliably. It has been shown that matching resumes to vacancies can be approached either as pair
classification or as semantic similarity search over embeddings. Classification approaches process every vacancy-resume pair
individually, which results in quadratic time complexity. Embedding each text independently and then ranking is much more efficient
and scalable, since its time complexity is linear. In this article, semantic similarity search is used to rank how suitable candidates
are for vacancies. ResJobFit, an end-to-end technology for job-resume matching based on artificial neural networks, is proposed.
ResJobFit consists of Segmentation, Parsing, Summarization and HR Embedding Module models, their outputs (a vector and attributes
describing each resume or job advertisement), and a Vector Database in which these records are stored. An unsupervised text-embedding
training strategy for the HR domain is introduced that encapsulates two novel training objectives: intra-section and cross-section
contrastive alignment. A pretrained BERT-base model is adapted by teaching it to match the summary and last employment sections of a
resume with each other, and parts of the same vacancy or employment section with each other. TF-IDF, BERT, E5 and GTE are used as
baselines, and the proposed unsupervised training strategy is compared against the SimCSE, DeCLUTR and ConFit approaches. NDCG, MAP
and MRR are used to measure the ranking accuracy of the designed algorithm. The novel training objectives are shown to yield a
significant improvement over other unsupervised training approaches. On the task of ranking summaries of vacancies and resumes
generated by large language models, adapting the DeCLUTR training strategy to the HR domain by exploiting the structure of resumes
improves NDCG by 11% over the classical DeCLUTR training strategy. On the task of ranking full-text vacancies and resumes, ResJobFit
and ResJobFit with requirements matching achieve improvements of 2% and 6%, respectively, over the state-of-the-art ConFit model.
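The efficiency argument for embeddings over pair classification can be illustrated with a small sketch. This is not the ResJobFit implementation. The function `embed` is a hypothetical stand-in for the HR Embedding Module (a hashed bag of words instead of a fine-tuned BERT encoder), and a plain dictionary plays the role of the Vector Database. The point it shows is that each of the M vacancies and N resumes passes through the expensive encoder once (M + N calls). Only cheap dot products remain per pair, whereas a pair classifier would run the full model M * N times.

```python
import math
import zlib
from collections import Counter

DIM = 256  # size of the toy embedding space (illustrative choice, not from the article)
encoder_calls = 0  # counts how often the (expensive) encoder is invoked


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for the HR Embedding Module: an L2-normalised hashed bag of words."""
    global encoder_calls
    encoder_calls += 1
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def rank(query_vec: list[float], index: dict[str, list[float]]) -> list[str]:
    """Sort indexed ids by cosine similarity; vectors are unit length, so this is a dot product."""
    scores = {doc_id: sum(q * d for q, d in zip(query_vec, vec)) for doc_id, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented for illustration).
vacancies = {
    "v1": "python backend developer django postgresql",
    "v2": "frontend developer react typescript",
}
resumes = {
    "r1": "python backend developer django postgresql",
    "r2": "frontend developer react typescript",
    "r3": "accountant excel financial reporting",
}

# Each text is encoded exactly once: M + N encoder calls rather than M * N pair-model calls.
resume_index = {rid: embed(text) for rid, text in resumes.items()}  # in-memory "vector database"
rankings = {vid: rank(embed(text), resume_index) for vid, text in vacancies.items()}
```

In a real deployment the resume vectors would be computed once and stored, so a new vacancy costs a single encoder call plus a nearest-neighbour lookup.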
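The two training objectives can be pictured with a minimal sketch. It assumes a standard in-batch InfoNCE contrastive loss of the kind used by SimCSE and DeCLUTR. The abstract does not state ResJobFit's exact loss, temperature or span-sampling scheme, so these are assumptions. Cross-section positives pair a resume's summary with its last employment section, and intra-section positives pair two parts of the same vacancy or employment section. Here the parts are simply taken as halves. The field names `summary` and `last_employment` and the helper functions are hypothetical. The code only computes the loss value on given vectors. Actual training would compute it on encoder outputs inside an autodiff framework.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives: anchor i must pick positives[i] among all positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def cross_section_pairs(resumes):
    """Cross-section positives (assumed layout): a resume's summary with its last employment section."""
    return [(r["summary"], r["last_employment"]) for r in resumes]


def intra_section_pairs(sections):
    """Intra-section positives (assumed scheme): the two halves of one vacancy or employment section."""
    pairs = []
    for text in sections:
        words = text.split()
        mid = len(words) // 2
        if mid:  # skip sections too short to split
            pairs.append((" ".join(words[:mid]), " ".join(words[mid:])))
    return pairs
```

Both pair generators need only the resume structure, not human relevance labels, which is what makes this kind of objective unsupervised.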
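For reference, the three evaluation metrics named above can be computed as in the sketch below, using their common textbook definitions. The abstract does not specify the cutoff, the relevance grading or the exact averaging used in the article, so binary relevance and no cutoff are assumptions here. Each list gives the relevance of the ranked resumes for one vacancy, in ranked order.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1) for 1-based rank i."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the system ranking divided by DCG of the ideal (sorted) ranking, optionally cut at k."""
    idcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / idcg if idcg > 0 else 0.0


def average_precision(relevances):
    """Mean of precision@i over relevant positions (assumes every relevant item appears in the list)."""
    hits, precision_sum = 0, 0.0
    for pos, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / pos
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(relevances):
    """1 / rank of the first relevant item, or 0 if none is relevant."""
    for pos, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / pos
    return 0.0


def mean_over_queries(metric, rankings):
    """Average a per-query metric over all queries (e.g. one query per vacancy)."""
    return sum(metric(r) for r in rankings) / len(rankings)


# Toy relevance judgements for two vacancies (1 = suitable resume, 0 = not suitable).
rankings = [[0, 1, 0, 1], [1, 0, 0, 0]]
mrr = mean_over_queries(reciprocal_rank, rankings)
map_score = mean_over_queries(average_precision, rankings)
mean_ndcg = mean_over_queries(ndcg, rankings)
```

MRR rewards only the position of the first suitable candidate. MAP and NDCG also account for where the remaining suitable candidates appear, and NDCG can additionally use graded relevance.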