Comparison of Data Mining Techniques in the Cloud for Software Engineering

Mining software engineering data has recently become an important research topic, with the goal of improving software engineering processes, software productivity, and quality. However, mining software engineering data poses several challenges, such as high computational cost, hardware limitations, and data management issues (i.e., the availability, reliability, and security of data). To address these problems, this chapter proposes applying data mining techniques to software engineering data in the cloud environment, motivated by cloud computing benefits such as increased computing speed, scalability, flexibility, availability, and cost efficiency. It compares the performance of five classification algorithms (decision forest, neural network, support vector machine, logistic regression, and Bayes point machine) in the cloud in terms of both accuracy and runtime efficiency. It presents experimental studies conducted on five real-world software engineering datasets related to various software engineering tasks, including software defect prediction, software quality evaluation, vulnerability analysis, issue lifetime estimation, and code readability prediction. Experimental results show that the cloud is a powerful platform for building data mining applications for software engineering.
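
The comparison described above, scoring each classifier on both accuracy and runtime, can be illustrated with a minimal sketch. It is not the chapter's experimental setup: the two toy classifiers (`MajorityClass`, `NearestCentroid`) are hypothetical stand-ins for the five algorithms named in the abstract, the `evaluate` helper is an assumed name, and the tiny defect-prediction dataset is invented purely for illustration.

```python
import time
from collections import Counter
from statistics import mean


class MajorityClass:
    """Toy baseline (assumption): always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroid:
    """Toy classifier (assumption): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            # Column-wise mean of all training rows that carry this label.
            self.centroids_[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(x, self.centroids_[c]))
            for x in X
        ]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (accuracy, seconds spent on fit plus predict) for one model."""
    start = time.perf_counter()
    predictions = model.fit(X_train, y_train).predict(X_test)
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return correct / len(y_test), elapsed


# Hypothetical toy data: two software metrics per module, label 1 = defective.
X_train = [[10, 1], [12, 2], [11, 1], [50, 9], [55, 8], [60, 10], [9, 2]]
y_train = [0, 0, 0, 1, 1, 1, 0]
X_test = [[13, 1], [52, 9], [8, 2], [58, 7]]
y_test = [0, 1, 0, 1]

results = {
    name: evaluate(model, X_train, y_train, X_test, y_test)
    for name, model in [("majority", MajorityClass()), ("centroid", NearestCentroid())]
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, runtime={seconds:.6f}s")
```

Any model exposing `fit` and `predict` can be passed to `evaluate`, so the same harness would apply to real implementations of the five algorithms; measured runtimes depend on the machine and are only meaningful when all models run in the same environment.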

Author information

Authors and Affiliations

Kokten Ulas Birant & Derya Birant
Department of Computer Engineering, Dokuz Eylul University, Izmir, Turkey