Comparison of Data Mining Techniques in the Cloud for Software Engineering
Mining software engineering data has recently become an important research topic, with the goal of improving software engineering processes, software productivity, and software quality. However, mining such data poses several challenges, including high computational cost, hardware limitations, and data management issues (i.e., the availability, reliability, and security of data). To address these problems, this chapter proposes applying data mining techniques to software engineering data in the cloud environment, taking advantage of cloud computing benefits such as increased computing speed, scalability, flexibility, availability, and cost efficiency. It compares the performance of five classification algorithms (decision forest, neural network, support vector machine, logistic regression, and Bayes point machine) in the cloud in terms of both accuracy and runtime efficiency. It presents experimental studies conducted on five real-world software engineering datasets related to various software engineering tasks: software defect prediction, software quality evaluation, vulnerability analysis, issue lifetime estimation, and code readability prediction. The experimental results show that the cloud is a powerful platform for building data mining applications for software engineering.
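The comparison described in the abstract, scoring several classifiers on both accuracy and runtime, can be illustrated with a minimal local sketch. Everything in it is an assumption made for illustration: the data is synthetic, the two toy classifiers (`MajorityClass`, `NearestCentroid`) are stand-ins rather than the five algorithms evaluated in the chapter, the helper names are hypothetical, and nothing here runs in a cloud environment.

```python
import random
import time
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Synthetic two-class data: label 1 when the two features sum to a positive value."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] + x[1] > 0 else 0
        rows.append((x, y))
    return rows


class MajorityClass:
    """Baseline stand-in: always predicts the most frequent training label."""

    def fit(self, rows):
        self.label = Counter(y for _, y in rows).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Stand-in classifier: predicts the class whose mean feature vector is closest."""

    def fit(self, rows):
        sums, counts = {}, Counter()
        for x, y in rows:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda y: dist2(self.centroids[y]))


def evaluate(model, train, test):
    """Return (accuracy, seconds) for fitting on train and predicting on test."""
    start = time.perf_counter()
    model.fit(train)
    preds = [model.predict(x) for x, _ in test]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == y for p, (_, y) in zip(preds, test)) / len(test)
    return accuracy, elapsed


def compare(models, train, test):
    """Evaluate every named model on the same split so results are comparable."""
    return {name: evaluate(m, train, test) for name, m in models.items()}


data = make_toy_data()
train, test = data[:150], data[150:]  # fixed hold-out split
results = compare(
    {"majority": MajorityClass(), "nearest_centroid": NearestCentroid()},
    train,
    test,
)
for name, (acc, secs) in results.items():
    print(f"{name}: accuracy={acc:.3f}, runtime={secs:.6f}s")
```

Using one shared train/test split and one timing routine for every model keeps the accuracy and runtime figures comparable across algorithms; swapping in real classifiers would only require objects exposing the same `fit` and `predict` methods.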
Author information
Authors and Affiliations
- Department of Computer Engineering, Dokuz Eylul University, Izmir, Turkey: Kokten Ulas Birant & Derya Birant