Evaluating the Effectiveness of the LexRank and LSA Algorithm in Automatic Text Summarization for Indonesian Language

Galih Wiratmoko

doi:10.46799/jsa.v5i9.1483

Universitas Muhammadiyah Surakarta, Indonesia

This study evaluates how effective the LexRank algorithm and Latent Semantic Analysis (LSA) are at automatic text summarization for the Indonesian language. The research focuses on natural language processing and the handling of excessive volumes of data. We applied both algorithms to generate summaries on the INDOSUM dataset, which contains about 20,000 Indonesian news articles with manually written summaries. Performance was assessed with the ROUGE metric, reported as precision, recall, and F1 score. LSA outperformed LexRank on all three measures: LSA achieved a precision of 0.57, a recall of 0.67, and an F1 score of 0.59, whereas LexRank achieved a precision of 0.46, a recall of 0.52, and an F1 score of 0.48. These results indicate that LSA is better than LexRank at capturing important information from the original text.

Keywords: Automatic Text Summarization, Latent Semantic Analysis, LexRank
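
As an illustration of the evaluation described in the abstract, the following is a minimal sketch of how ROUGE-1 precision, recall, and F1 can be computed for a single generated summary against its manual reference. It is not the study's evaluation code: the function name `rouge_1`, the lowercase whitespace tokenization, and the two example sentences are assumptions made for this example, and the study may have used a different ROUGE variant or preprocessing.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 precision, recall, and F1 for one summary pair."""
    # Assumption: lowercase whitespace tokenization; real evaluations
    # may also strip punctuation or apply stemming.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    # Precision: share of candidate words that also appear in the reference.
    precision = overlap / max(sum(cand_counts.values()), 1)
    # Recall: share of reference words that the candidate recovered.
    recall = overlap / max(sum(ref_counts.values()), 1)
    # F1: harmonic mean of precision and recall (0 when nothing overlaps).
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example pair (not taken from INDOSUM).
scores = rouge_1(
    "pemerintah menaikkan harga bahan bakar",
    "pemerintah resmi menaikkan harga bahan bakar minyak",
)
print(scores)
```

If scores are averaged per article, as is common in ROUGE evaluations, the reported mean F1 need not equal the harmonic mean of the reported mean precision and mean recall.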
