Building a corpus-based academic vocabulary list of four languages


  • Mohammad Ahsanuddin, Universitas Negeri Malang
  • Yusuf Hanafi
  • Yazid Basthomi
  • Febri Taufiqurrahman
  • Herri Akhmad Bukhori
  • Joko Samodra
  • Utami Widiati
  • Primardiana Hermilia Wijayati



Corpus linguistics, vocabulary, language learning, data-driven learning


This study aims to build a corpus and to explore students' perceptions of its use in vocabulary learning. The corpus was developed following the ADDIE instructional design model. The research began with a problem analysis, which revealed that the students' main obstacle in language learning was a limited vocabulary in the languages they were studying. Corpus construction began with writing a script in PHP. The resulting corpus contains 377,880 tokens in five sub-corpora: Indonesian, English, German, Arabic, and art and design. The vocabulary is presented in order of frequency within the field of language and language teacher education. Evaluation by experts in materials, language, and media found the corpus feasible for integration into learning. In addition, assessments from students who took part in the implementation of the corpus with a data-driven learning (DDL) approach show that the corpus helps them broaden their vocabulary knowledge, including word meaning, form, and usage, by observing concordance and collocation lines.
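The three views named in the abstract (a frequency-ranked word list, concordance lines, and collocates) can be illustrated with a minimal sketch. This is not the authors' PHP script: it is written in Python, and the tokenizer, the function names, the window sizes, and the sample text are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(tokens, top_n=10):
    """Return the top_n (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, keyword, window=3):
    """Return keyword-in-context lines: up to `window` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=2, top_n=5):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts.most_common(top_n)


# Hypothetical sample text, not taken from the study's corpus.
sample = ("The corpus helps students learn vocabulary. "
          "Students observe the corpus to learn word usage.")
tokens = tokenize(sample)

print(len(tokens), "tokens")
print(frequency_list(tokens, top_n=4))
for line in concordance(tokens, "corpus", window=2):
    print(line)
print(collocates(tokens, "corpus", window=2, top_n=1))
```

In a DDL activity of the kind the abstract describes, learners would read the concordance lines to infer a word's meaning and usage, and consult the collocate counts to see which words typically accompany it.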








How to Cite

Ahsanuddin, M., Hanafi, Y., Basthomi, Y., Taufiqurrahman, F., Bukhori, H. A., Samodra, J., Widiati, U., & Wijayati, P. H. (2022). Building a corpus-based academic vocabulary list of four languages. Pegem Journal of Education and Instruction, 12(1), 159–167.