Artificial Intelligence

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPAI01 | A Fine-Grained Object Detection Model for Aerial Images Based on YOLOv10 Deep Neural Network | |
| 2 | VTPAI02 | A Comprehensive Dataset and Evaluation of Deep Learning Techniques for Pedestrian Head Detection in Crowds | |
| 3 | VTPAI03 | Dual Detection of License Plates and Helmets Using an Optimized YOLO and Neural Networks | |
| 4 | VTPAI04 | Enhanced Framework for Real-Time Vehicle Detection and Tracking | |
| 5 | VTPAI05 | Short-Term Air Temperature Forecasting Using LSTM and XGBRegressor | |
| 6 | VTPAI06 | A Comprehensive Benchmark Dataset for Traffic Accident Detection Using Video Surveillance | |
| 7 | VTPAI07 | YOLOv10-Driven Enhanced Vehicle Detection in Low-Light On-Board Environments | |
| 8 | VTPAI08 | A Computationally Efficient Deep Learning Approach for Localization and Classification of Diseases and Pests in Coffee Leaves | |
| 9 | VTPAI09 | Enhancing Fire Detection with YOLOv10: Advanced Techniques for Flame and Smoke Recognition | |
| 10 | VTPAI10 | Music Emotion Classification with Neural Network Architecture and Librosa | |
| 11 | VTPAI11 | Fake Face Detection Based on Videos Using OpenCV and Neural Network Architecture | |
| 12 | VTPAI12 | Small Object Detection in Drone Images Using YOLOv10 | |
| 13 | VTPAI13 | Real-Time Infant Emotion Recognition via Optimized YOLOv10 Model | |
| 14 | VTPAI14 | Lightweight YOLO-Based Model with Hybrid Attention for Surgical Instrument Recognition | |
| 15 | VTPAI15 | Automated Detection and Classification of Tooth Types and Dental Anomalies in Panoramic Radiographs | |
| 16 | VTPAI16 | UnderWaterNet: Efficient Visual Detection of Marine Garbage for Eco Monitoring | |
| 17 | VTPAI17 | Edge-Ready Road Damage Detection Using an Enhanced YOLO with Hyperparameter Tuning | |
| 18 | VTPAI18 | Enhanced Helmet Detection in Complex Industrial Environments Using an Improved YOLO-Based Model | |
| 19 | VTPAI19 | Efficient Railway Foreign Object Detection Using Enhanced YOLO with EfficientNet Backbone and Attention Mechanisms | |
| 20 | VTPAI20 | MF-YOLO: Mask Wearing Detection Algorithm for Dense Environments | |
| 21 | VTPAI21 | Advanced Surveillance with YOLOv10: Fusion-Based Detection of Threatening Objects | |
| 22 | VTPAI22 | Smart Surveillance for Fall Detection with YOLOv10 in Unstructured Outdoor Settings | |
| 23 | VTPAI23 | Enhancing Precision Agriculture Pest Control: A YOLOv10-Based Deep Learning Approach for Insect Detection | |
| 24 | VTPAI24 | Enhanced YOLO for Real-Time Multi-Scale Traffic Detection under Haze Conditions | |
Natural Language Processing

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPNLP01 | Assessing the Psychological Impact of Internet Blackouts | |
| 2 | VTPNLP02 | Sentiment Analysis for Cyberbullying Detection Using NLP and LSTM | |
| 3 | VTPNLP03 | Optimizing Mobile App Recommendations Using Crowdsourced Educational Data | |
| 4 | VTPNLP04 | Combining Sentiment Analysis and High-Dimensional Indicators for Bitcoin Price Range Prediction with LSTM | |
| 5 | VTPNLP05 | Enhancing User Feedback Analysis with Review Text Granularity for Better Sentiment and Rating Prediction | |
| 6 | VTPNLP06 | Integrated Emotion and Sentiment Analysis Using Multi-Modal Data | |
| 7 | VTPNLP07 | Leveraging Emotional Traces for Automatic Identification of Suicidal Ideation in Text | |
| 8 | VTPNLP08 | Intelligent Paraphrase Recognition Using Advanced NLP Techniques | |
| 9 | VTPNLP09 | Enhanced Neural Text Summarization with Syntactic and Headline Insights | |
| 10 | VTPNLP10 | Enhancing Agricultural Decision-Making with Combined Language Models | |
| 11 | VTPNLP11 | Streamlining News Topic Classification: A Deep Learning Approach with a Global News Dataset | |
| 12 | VTPNLP12 | A Deep Neural Network Approach for Classifying Pulmonary Diseases from Respiratory Sounds | |
| 13 | VTPNLP13 | English Visual Question Answering: Building a Culturally Relevant Dataset from Image Captions | |
| 14 | VTPNLP14 | Deep NLP Techniques for Tweets in Fake News Detection Systems | |
| 15 | VTPNLP15 | Enhanced Stock Price Prediction Across Global Markets Through Data-Driven Modeling | |
| 16 | VTPNLP16 | A Deep Transfer Learning Framework for Multi-Platform Sentiment Prediction | |
| 17 | VTPNLP17 | Explainable Detection of Depression in Social Media Contents Using Natural Language Processing | |
| 18 | VTPNLP18 | An Enhanced RNN-LSTM Model for Accurate and Real-Time Click Fraud Detection in Online Advertising | |
| 19 | VTPNLP19 | Contextual Multi-Modal Deep Learning for Bangla Sarcasm and Humor Detection | |
| 20 | VTPNLP20 | KEGAT: A Knowledge-Enhanced Graph-Aware Transformer for Detecting AI-Generated Fake News | |
| 21 | VTPNLP21 | Multiclass Mental Illness Prediction Using LSTM and Natural Language Processing Techniques | |
| 22 | VTPNLP22 | A Semantic Weight Adaptive Model Based on Visual Question Answering | |
| 23 | VTPNLP23 | Deep Learning Advancements in Video Summarization Innovations | |
Image Processing

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPIP01 | Advanced Feature Extraction and Transformation Method for Pneumonia Detection in Chest X-Ray Images | |
| 2 | VTPIP02 | A Novel Neural Network Architecture for Facial Emotion Recognition | |
| 3 | VTPIP03 | A Local-Global Adapter-Based Method for Sketch Face Recognition | |
| 4 | VTPIP04 | Deep Learning-Based Automated Defect Detection in Solar Cell Images | |
| 5 | VTPIP05 | Gender Identification from Pashto Handwritten Text Using Neural Network Architecture Designs | |
| 6 | VTPIP06 | Acoustic Field Reconstruction Using Iterative Unsupervised Learning in Acoustic Holography | |
| 7 | VTPIP07 | Hierarchical Deep Learning for Enhanced Parkinson's Disease Detection via Handwriting Analysis | |
| 8 | VTPIP08 | Multimodal Ensemble Fusion Deep Learning for Brain Tumor Subtype Classification Using MRI Images and Clinical Data | |
| 9 | VTPIP09 | CrackVision: Sophisticated Concrete Crack Identification Through Transfer Learning and Deep Learning | |
| 10 | VTPIP10 | Towards Efficient Solar Panel Fault Detection Through Neural Network Modeling | |
| 11 | VTPIP11 | Automated Chromosome Identification in Metaphase Cells Using Deep Neural Networks | |
| 12 | VTPIP12 | Enhancing Tuberculosis Detection in Chest X-Rays Using Deep Learning and Image Preprocessing | |
| 13 | VTPIP13 | Self-Segmentation Guided Diffusion for Thermal to Pseudo-Color Image Translation | |
| 14 | VTPIP14 | Image Enhancement and Target-Aware Fusion of Infrared and Visible Images | |
| 15 | VTPIP15 | A Deep Learning Approach to Lung Nodule Analysis Using Attention-Infused Xception | |
| 16 | VTPIP16 | Few-Shot Semantic Segmentation of Aerial Images Using U-Net and EfficientNet | |
| 17 | VTPIP17 | Dehazing Aerial Drone Images via Regional Saturation-Value Mapping and U-Net-Driven Soft Segmentation | |
| 18 | VTPIP18 | Guided Image Channel Selection in Medical Image Processing Using SegNet for Skin Lesions | |
| 19 | VTPIP19 | Comparative Analysis of CNN Architectures for Malaria Detection in Blood Cell Images | |
| 20 | VTPIP20 | Smart Human Action Monitoring Using RGB and Motion Signals | |
| 21 | VTPIP21 | A Generalized Approach for FOV Mask Segmentation in Fundus Retinal Imaging | |
| 22 | VTPIP22 | An Optimized Wheat Disease Detection Framework Using YOLOv10 with C2f-DCN and SCNet | |
| 23 | VTPIP23 | A Dual-Stage Framework for Cavity Detection in Nuclear Materials Using ESRGAN and Swin-UNet | |
| 24 | VTPIP24 | Robust Zero-Watermarking of Medical Images Using Deep CNN-Based Feature Extraction for Secure Copyright Protection | |
Machine Learning

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPML01 | Explainable Machine Learning for Obesity Risk Classification Using a Stacked Ensemble with LIME Interpretability | |
| 2 | VTPML02 | Using Optimal Machine Learning Algorithms to Predict Heart Failure Patient Classification | |
| 3 | VTPML03 | Predictive Modeling for Early Lung Cancer Detection Using CTGAN-Augmented Features and Tree-Based Learning Techniques | |
| 4 | VTPML04 | A Machine Learning Framework for Monthly Crude Oil Price Prediction with CatBoost | |
| 5 | VTPML05 | Intelligent Sports Team Management Powered by Machine Learning | |
| 6 | VTPML06 | Dynamic Ransomware Detection Using Time-Based API Calling | |
| 7 | VTPML07 | Machine Learning-Based Fault Diagnosis of Rolling Bearings Using Spectrogram Zeros Under Variable Rotating Speeds | |
| 8 | VTPML08 | Enhancing Crop Recommendations Using Advanced Deep Belief Networks: A Multimodal Strategy | |
| 9 | VTPML09 | Intelligent Network Traffic Anomaly Detection Using ML Algorithms | |
| 10 | VTPML10 | Intelligent Psychological Support System for Student Entrepreneurship | |
| 11 | VTPML11 | Machine Learning Models for Regional Life Expectancy Forecasting | |
| 12 | VTPML12 | Machine Learning Approach for Predicting Parkinson's Disease at Early Stages | |
| 13 | VTPML13 | Enhancing Hospitality Management Through ML-Based Cancellation Prediction | |
| 14 | VTPML14 | Improving Port Throughput via ML-Based Ship Waiting Time Prediction | |
| 15 | VTPML15 | Machine Learning-Based Fault Detection in Photovoltaic Systems | |
| 16 | VTPML16 | Pain Level Classification Using Discrete Wavelet Transform-Based Feature Extraction and Machine Learning Approaches | |
| 17 | VTPML17 | Enhanced Credit Risk Prediction Using Ensemble Learning with Data Resampling Techniques | |
| 18 | VTPML18 | Machine Learning-Based Method for Insurance Fraud Detection on Class-Imbalanced Datasets with Missing Values | |
| 19 | VTPML19 | Trustworthy Predictions: An Explainable AI Approach to Breast Cancer Diagnosis | |
| 20 | VTPML20 | Machine Learning-Driven Real-Time Battery Health Estimation for EV Battery Swapping | |
| 21 | VTPML21 | Improving Fetal Health Classification Accuracy Using Machine Learning and Active Sampling | |
| 22 | VTPML22 | Power Load Forecasting Using Deep MLP Model with Multivariate Meteorological Features | |
| 23 | VTPML23 | Enhancing Medicare Fraud Detection Through Machine Learning | |
| 24 | VTPML24 | Sleep Apnea Detection Using Extreme Gradient Boosting on Engineered Physiological Signal Features | |
Deep Learning

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPDL01 | A Scalable Image-Based Framework for Detecting and Monitoring Rice Leaf Diseases | |
| 2 | VTPDL02 | A Deep Learning Approach to the Recognition of Handwriting | |
| 3 | VTPDL03 | Advancing Kidney Tumor Detection in CT Scans with a Hybrid Computational Framework | |
| 4 | VTPDL04 | Enhancing Credit Card Fraud Detection in Banking Using Graph Neural Networks and Autoencoders | |
| 5 | VTPDL05 | Anomaly Detection in Industrial Machine Sounds Using High-Frequency Feature Analysis and Gated Recurrent Unit Networks | |
| 6 | VTPDL06 | Enhanced Melanoma Diagnosis Using ConvNeXt-Based Deep Learning Framework | |
| 7 | VTPDL07 | Attention-Driven Lightweight Network for Colorectal Cancer Classification | |
| 8 | VTPDL08 | Multi-Class Classification of Normal WBCs Using Convolutional Neural Networks | |
| 9 | VTPDL09 | An Interpretable Deep Learning Approach for Classifying Bean Leaf Diseases | |
| 10 | VTPDL10 | Early Identification of Severe Arrhythmias Using Deep Active Learning Techniques | |
| 11 | VTPDL11 | Echocardiographic Image Analysis for Heart Disease Detection via Deep Neural Networks | |
| 12 | VTPDL12 | Deep Learning Approaches for Accurate Lithological Mapping from Remote Sensing Imagery | |
| 13 | VTPDL13 | Ship Classification in Remote Sensing Images Using Deep Neural Networks | |
| 14 | VTPDL14 | Deep Neural Networks for Early Monkeypox Detection from Medical Images | |
| 15 | VTPDL15 | A Transparent Deep Learning Approach for Mango Leaf Disease Classification | |
| 16 | VTPDL16 | Automated Osteoporosis Identification in Bone X-ray Images Using Deep Feature Learning | |
| 17 | VTPDL17 | Robust Detection of Rotten Fruits Filtering for Food Waste Reduction | |
| 18 | VTPDL18 | Integrating Quantum Vision Theory with Deep Learning for Enhanced Object Recognition | |
| 19 | VTPDL19 | A Dual Approach: Machine vs Deep Learning for Predicting Ovarian Cancer in Early Stages | |
| 20 | VTPDL20 | Neuroimaging Meets AI: Deep Learning for Forecasting MCI in Cognitively Normal Subjects | |
| 21 | VTPDL21 | Optimizing Thyroid Nodule Diagnosis Through Deep Learning Algorithms | |
| 22 | VTPDL22 | Fingerprint Liveness Detection via Global Feature Encoding with Vision Transformers | |
| 23 | VTPDL23 | Predicting Adolescent Concern Toward Unhealthy Food Advertisements Using Deep Neural Networks with Feature Embeddings and Explainable AI | |
| 24 | VTPDL24 | Optimized Diabetic Foot Ulcer Classification Using NASNetLarge with Advanced Transfer Learning and Data Augmentation Techniques | |
This paper presents the implementation of a real-time visual tracking system equipped with an active pan-tilt camera, designed for indoor human motion detection. The system leverages the state-of-the-art YOLOv10 object detection model for accurate and efficient human detection in each video frame. YOLOv10’s high-speed inference and improved detection accuracy, even under challenging lighting and occlusion conditions, make it ideal for real-time applications. To ensure robustness, the system incorporates a multiple object tracking (MOT) framework that maintains a dynamic graph structure. This graph handles multiple hypotheses regarding the number and trajectories of detected individuals over time. Unlike conventional frame differencing methods, YOLOv10 enables frame-wise object detection with fewer false positives and missed detections. The MOT module performs temporal data association, validating and confirming YOLOv10’s frame-wise predictions, thereby achieving consistent tracking over time. This tight integration allows the tracker to provide feedback to the detection module, predicting object positions and improving overall tracking reliability. The process continuously extends and prunes tracking hypotheses, selecting the most likely explanation of the observed video. Experimental results demonstrate the system’s effectiveness in real-time human motion tracking scenarios, with significant improvements in detection precision and temporal consistency due to the integration of YOLOv10.
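A minimal sketch of the frame-wise detection stage, assuming the open-source `ultralytics` package with pretrained `yolov10n.pt` weights and a generic camera index; the MOT hypothesis graph itself is out of scope here.

```python
# Hypothetical per-frame person detection with the `ultralytics` package,
# assuming COCO-pretrained YOLOv10 weights and a camera at index 0.
import cv2
from ultralytics import YOLO

model = YOLO("yolov10n.pt")  # assumed pretrained checkpoint name
cap = cv2.VideoCapture(0)    # pan-tilt camera stream (index assumed)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Run detection; class 0 is "person" in COCO-pretrained models.
    results = model(frame, classes=[0], verbose=False)
    for box in results[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

In a full system, the per-frame boxes above would be passed to the MOT module for temporal data association rather than drawn directly.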
The automatic detection of pedestrian heads in crowded environments is essential for enhancing public safety and enabling efficient crowd management, particularly in sensitive areas such as railway platforms and event entrances. These environments, often characterized by high density, occlusion, and complex visual conditions, remain underrepresented in existing public datasets. To address this challenge, we introduce the Railway Platforms and Event Entrances-Heads (RPEE-Heads) dataset, a diverse collection of high-resolution images with carefully annotated head regions. Each annotation provides precise bounding boxes around visible pedestrian heads, facilitating the training and evaluation of advanced detection models. In this study, we employ the YOLOv8 architecture to demonstrate the effectiveness of the dataset for real-time head detection. The RPEE-Heads dataset serves as a valuable benchmark for advancing research in computer vision and surveillance, particularly in applications focused on safety-critical crowd analysis and monitoring.
This project presents an advanced computer vision system for real-time Safety Helmet Detection and License Plate Recognition using the proposed YOLOv10 object detection architecture. The primary objective is to enhance workplace safety and vehicle monitoring by automatically identifying individuals without safety helmets in industrial zones and capturing vehicle license plates for surveillance and regulatory enforcement. Building on the strengths of YOLOv8, the proposed YOLOv10 integrates Focal Loss and Hard Negative Mining to address the challenges of class imbalance and small object detection in complex industrial environments. Focal Loss focuses training on hard-to-classify samples, reducing the impact of easily classified negatives, while Hard Negative Mining improves detection accuracy by selectively backpropagating difficult negative examples. The system is trained on diverse annotated datasets containing various helmet types and vehicle plates under varying lighting and occlusion conditions to ensure robust performance. Because wearing safety helmets significantly reduces the risk of head injuries for construction workers, especially in high-altitude environments, the framework is further optimized to detect small targets and operate effectively in cluttered backgrounds, balancing speed and precision for real-world deployment.
Urban traffic congestion is a major challenge in modern cities, leading to increased travel times, fuel consumption, and air pollution. To address this issue, DeepTraffic-VTS presents an intelligent hybrid system that integrates Long Short-Term Memory (LSTM) networks for time series forecasting with YOLO (You Only Look Once) for vehicle detection. The LSTM model analyzes historical traffic data—such as vehicle count, average speed, and congestion levels—to predict traffic conditions over upcoming time intervals. Simultaneously, YOLO processes live video feeds to detect and classify vehicle types on the road, including two-wheelers, cars, buses, and emergency vehicles. Based on both predicted and real-time traffic conditions, the system provides adaptive suggestions on the most suitable vehicle types for efficient navigation—for example, recommending two-wheelers in high-congestion zones due to their maneuverability, while advising larger vehicles to reroute or delay travel. This combined approach enables more effective traffic management, emergency response optimization, and smart urban mobility planning.
Air temperature plays a vital role in agriculture, influencing processes from planting to post-harvest management. Accurate temperature prediction can help prevent crop damage, improve production quality, and optimize resource use. In this study, we propose a novel air temperature prediction framework using Long Short-Term Memory (LSTM) networks and XGBoost Regressor, leveraging their complementary strengths in temporal sequence modeling and gradient-boosted regression. The LSTM network is employed to capture long-term dependencies and temporal patterns from historical air temperature data, making it suitable for sequential forecasting. By learning from hourly temperature sequences, the model can generate multi-step predictions for up to 24 hours ahead, effectively modeling dynamic trends and fluctuations in air temperature. The XGBoost Regressor is integrated to refine predictions by handling nonlinear relationships and interactions in the data, improving the overall robustness of the forecast. XGBoost’s gradient boosting architecture allows efficient feature weighting and error minimization, complementing the sequence learning capabilities of LSTM. This combined LSTM-XGBoost approach provides an effective and scalable solution for accurate air temperature prediction, requiring only historical temperature data. The framework can be applied in agricultural planning and management to support decision-making, optimize resource use, and mitigate potential crop losses due to adverse weather conditions.
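One plausible way to wire the two models together is sketched below: the LSTM forecasts from a 24-hour window and XGBoost regresses its residuals. The window length, layer sizes, and synthetic stand-in data are illustrative assumptions, not the study's tuned configuration.

```python
# Minimal sketch of an LSTM + XGBRegressor combination, assuming an hourly
# temperature series; hyperparameters are illustrative, not tuned values.
import numpy as np
from tensorflow import keras
from xgboost import XGBRegressor

def make_windows(series, window=24):
    # Slide a 24-hour window over the series to build (input, target) pairs.
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

temps = np.sin(np.linspace(0, 60, 2000)) * 10 + 20  # stand-in for real data
X, y = make_windows(temps)

# LSTM learns temporal patterns from each 24-hour input window.
lstm = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(X.shape[1], 1)),
    keras.layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X[..., None], y, epochs=5, verbose=0)

# XGBoost refines the forecast by regressing the LSTM's residual errors.
resid = y - lstm.predict(X[..., None], verbose=0).ravel()
booster = XGBRegressor(n_estimators=200, max_depth=4)
booster.fit(X, resid)

pred = lstm.predict(X[-1:][..., None], verbose=0).ravel() + booster.predict(X[-1:])
print("next-hour forecast:", pred[0])
```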
The automatic detection of traffic accidents has become an essential focus area in computer vision, propelled by the rapid advancements in autonomous and intelligent transportation systems (ITS). To achieve reliable and real-time detection of accident scenarios, YOLOv10, the latest evolution in the YOLO family, offers a powerful and efficient framework for object detection in complex and dynamic traffic environments. Unlike traditional approaches that struggle with occlusions, varying lighting conditions, and high-speed vehicle motion, YOLOv10 provides superior detection accuracy through its optimized architecture, featuring re-parameterized convolutional layers, decoupled detection heads, and advanced feature fusion mechanisms. These enhancements enable the model to precisely identify accident-related events, such as collisions, overturned vehicles, or lane departures, from surveillance video streams.
Accurate vehicle detection in low-light environments remains a critical challenge in the field of intelligent transportation systems and autonomous driving. On-board vision-based detection systems often suffer from degraded performance due to insufficient illumination, motion blur, glare from headlights, and high levels of visual noise. To address these issues, this work presents a YOLOv10-driven enhanced vehicle detection framework tailored for real-time deployment in low-light on-board environments. The proposed system leverages the advanced feature extraction and optimized architecture of YOLOv10, which provides significant improvements in detection accuracy, computational efficiency, and robustness compared to its predecessors. In the framework, pre-processing techniques such as adaptive histogram equalization, noise reduction, and contrast enhancement are integrated to improve image quality before feeding data into the detection pipeline. YOLOv10 is then fine-tuned with a diverse dataset of nighttime and low-illumination driving scenarios to enhance generalization under challenging conditions. The lightweight yet powerful design of YOLOv10 enables real-time inference on embedded systems, making it suitable for in-vehicle applications where processing speed and resource efficiency are crucial. Experimental evaluations demonstrate that the YOLOv10-based model achieves superior performance in terms of mean Average Precision (mAP), detection speed, and false-positive reduction when compared to baseline models such as YOLOv8 and Faster R-CNN. Additionally, the system exhibits resilience to common low-light issues such as shadow occlusion, glare from artificial lighting, and environmental noise, ensuring reliable detection across diverse nighttime driving conditions.
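A short OpenCV sketch of the pre-processing chain named above (denoising plus CLAHE-based contrast enhancement); the clip limit, tile size, and file names are illustrative assumptions.

```python
# Illustrative OpenCV pre-processing for low-light frames: denoising followed
# by CLAHE contrast enhancement on the luminance channel.
import cv2

frame = cv2.imread("night_frame.jpg")  # placeholder path

# Reduce the sensor noise typical of low-light capture.
den = cv2.fastNlMeansDenoisingColored(frame, None, 10, 10, 7, 21)

# Apply CLAHE on the L channel of LAB space to boost local contrast
# without amplifying headlight glare across the whole image.
lab = cv2.cvtColor(den, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = cv2.merge((clahe.apply(l), a, b))
out = cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)
cv2.imwrite("night_frame_enhanced.jpg", out)
```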
Coffee cultivation plays a vital role in the economy of many regions worldwide, yet its productivity is frequently threatened by leaf diseases and pests that negatively impact both yield and quality. To address this challenge, automated identification of coffee leaf diseases using deep learning offers an efficient alternative to manual inspection, enabling timely intervention and improved crop management. In this study, we propose a MobileNetV2-based framework that leverages lightweight yet powerful feature extraction for real-time disease recognition. The model is trained on a newly curated dataset designed to ensure class balance and robustness, thereby improving detection accuracy across diverse disease categories. This approach provides a practical and resource-efficient solution for monitoring coffee leaf health, supporting decision-making in plantation care and contributing to sustainable disease management.
This project introduces an advanced method for detecting flames and smoke using the YOLOv10 algorithm, aimed at improving fire detection systems for enhanced safety and prevention. The proposed system incorporates significant advancements to address challenges such as cluttered backgrounds, low visibility, varying fire intensities, and overlapping objects. By leveraging an enhanced feature extraction mechanism, the system captures finer details of flames and smoke. It also includes a sophisticated attention mechanism to prioritize critical fire-related areas in an image while suppressing irrelevant background information. To further enhance detection accuracy, the system uses an improved bounding box regression method, ensuring precise localization of flames and smoke, even in dense or challenging environments. These improvements make the system more robust, adaptable, and capable of handling high-resolution images and videos efficiently. Experimental results demonstrate that the YOLOv10-based system significantly outperforms previous methods, such as YOLOv5s, in terms of precision, recall, and mean average precision (mAP). Additionally, the system processes data in real-time, making it suitable for large-scale applications in industries, public spaces, and residential areas. By ensuring early and reliable detection of flames and smoke, the proposed system offers a practical solution for improving fire safety and prevention measures.
The classification of musical emotions is essential for organizing, searching, and recommending music on modern platforms. Traditional models often rely on raw audio or textual features, which may not fully capture the rich emotional content embedded in music. To address this, we propose a Convolutional Neural Network (CNN)-based model combined with Librosa for feature extraction to classify musical emotions effectively. In the proposed approach, Librosa is used to extract meaningful audio features from music signals, including Mel-frequency cepstral coefficients (MFCCs), chroma features, spectral contrast, and tonnetz representations. These features provide a compact and informative representation of the audio, capturing timbral, harmonic, and rhythmic characteristics relevant to emotion recognition. The CNN model is then applied to learn hierarchical patterns from these extracted features. Convolutional layers automatically capture local correlations in the audio features, while pooling layers reduce dimensionality and highlight dominant emotional patterns. This deep learning framework eliminates the need for handcrafted feature combinations, allowing the model to generalize effectively across diverse music samples. By combining Librosa feature extraction with the pattern learning capability of CNNs, the proposed system is able to capture complex emotional relationships in music. This approach offers a robust and scalable solution for automated music emotion classification, supporting applications such as music recommendation, playlist generation, and music analytics in real-world platforms.
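The feature extraction described above maps directly onto Librosa calls; the snippet below is a minimal sketch, with the audio path and the time-averaging strategy as assumptions.

```python
# Sketch of the Librosa feature extraction named in the abstract:
# MFCC, chroma, spectral contrast, and tonnetz features for one track.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", duration=30)  # placeholder audio path

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)

# Stack time-averaged features into one vector per track for the classifier.
features = np.hstack([f.mean(axis=1) for f in (mfcc, chroma, contrast, tonnetz)])
print(features.shape)
```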
The rapid development of the Internet has enabled the widespread distribution of manipulated facial images, particularly Deepfakes, which are increasingly difficult to detect using conventional methods. While current approaches focus on spatial domain features or complex network architectures, they often lack robustness against sophisticated forgery techniques. To address this, we propose a MobileNetV2-based Deepfake detection framework that leverages efficient convolutional feature extraction for accurate classification of real and fake facial images. The framework begins with OpenCV-based preprocessing, including face detection, alignment, and normalization, to ensure consistent input quality and enhance the discriminative features for detection. MobileNetV2, a lightweight yet powerful convolutional neural network, is employed to automatically learn hierarchical spatial features from the preprocessed facial images, eliminating the need for handcrafted features. By combining OpenCV preprocessing with MobileNetV2, the proposed system effectively captures subtle visual artifacts and texture inconsistencies introduced by Deepfake manipulation. This approach enables robust and scalable detection, generalizing well across diverse datasets and real-world scenarios, providing a practical solution for automated Deepfake detection in security, media verification, and social media monitoring applications.
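A hedged sketch of the OpenCV pre-processing stage: Haar-cascade face detection followed by cropping, resizing, and MobileNetV2-style scaling. The cascade choice and input size are assumptions, not the project's confirmed pipeline.

```python
# Hypothetical OpenCV face pre-processing: detect, crop, resize, and
# normalize faces to MobileNetV2's expected 224x224 input range [-1, 1].
import cv2
import numpy as np

img = cv2.imread("frame.jpg")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

crops = []
for (x, y, w, h) in faces:
    face = cv2.resize(img[y:y + h, x:x + w], (224, 224))
    crops.append(face.astype(np.float32) / 127.5 - 1.0)  # MobileNetV2 scaling
batch = np.stack(crops) if crops else np.empty((0, 224, 224, 3))
print(batch.shape)
```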
UAV imagery has become an essential tool in applications such as traffic monitoring, disaster response, and airspace management, owing to its flexibility, portability, and low operational cost. However, object detection in UAV images poses significant challenges due to factors like small object sizes, complex and cluttered backgrounds, and high levels of noise. To overcome these challenges, this study proposes an advanced object detection approach based on YOLOv10, a state-of-the-art model known for its enhanced architectural efficiency and detection capabilities. The model is optimized for UAV aerial scenarios, with a particular focus on improving small object detection through refined feature extraction and enhanced spatial understanding. The proposed YOLOv10-based framework integrates adaptive feature enhancement and deep semantic learning to improve detection performance under challenging UAV imaging conditions. By leveraging modern advancements in convolutional attention mechanisms, multi-scale detection heads, and optimized backbone architectures, the system effectively captures fine-grained details while maintaining real-time processing capabilities. This approach enables robust object detection in complex UAV environments and demonstrates the potential of YOLOv10 as a powerful solution for aerial imagery analysis.
Infant facial expression recognition plays a critical role in early childhood care, enabling timely responses to an infant's emotional and physical needs. This study presents an enhanced facial expression recognition system based on the latest YOLOv10 architecture to classify subtle and fine-grained infant emotions. Unlike previous models, YOLOv10 offers improved efficiency, precision, and real-time detection capabilities due to its lightweight structure and advanced feature extraction strategies. The proposed system leverages YOLOv10’s enhanced backbone and head modules to more effectively capture and process multi-scale features and subtle facial variations. A curated dataset of labeled infant images—categorized as Cry, Happy, Neutral, and Back of Head—was used to train and validate the model. Experimental results demonstrate that the YOLOv10-based model significantly outperforms traditional YOLOv8 and other conventional facial expression recognition approaches in terms of both accuracy and real-time performance. These results highlight the model’s suitability for practical deployment in intelligent care systems, offering high precision and recall in detecting complex infant facial expressions.
Minimally invasive surgeries such as laparoscopy require precise and reliable detection of surgical instruments to support computer-aided surgery, surgical navigation, and post-operative analysis. Traditional object detection algorithms, including earlier YOLO versions, often struggle with challenges such as occlusion, low contrast, and the thin structures of laparoscopic tools. To overcome these issues, this project proposes an enhanced real-time surgical instrument detection system using YOLOv10, the latest advancement in the YOLO family of object detection models. The proposed system is trained on laparoscopic surgical datasets containing both images and videos, enabling accurate recognition of instruments under complex surgical environments. YOLOv10 integrates anchor-free detection, hybrid task balancing, and efficient decoupled heads, providing superior accuracy while maintaining lightweight computation suitable for real-time applications. The system achieves improved detection of small and overlapping surgical instruments compared to existing YOLO models, making it highly effective for live surgical assistance. This project demonstrates how YOLOv10 can be applied in the medical field to enhance surgical safety, assist surgeons in real time, and provide a robust foundation for intelligent computer-aided surgical systems. The outcome includes a trained model capable of processing both images and video streams, integrated into a user-friendly Flask application for practical deployment.
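A minimal Flask sketch of the deployment described above; the route name, weights file, and response format are illustrative assumptions rather than the project's actual code.

```python
# Hypothetical Flask endpoint serving YOLOv10 surgical-instrument detections.
from flask import Flask, request, jsonify
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("surgical_yolov10.pt")  # assumed fine-tuned weights file

@app.route("/predict", methods=["POST"])
def predict():
    # Accept an uploaded image and run detection on it.
    file = request.files["image"]
    file.save("upload.jpg")
    results = model("upload.jpg", verbose=False)
    dets = [
        {"cls": int(b.cls[0]), "conf": float(b.conf[0]),
         "box": [float(v) for v in b.xyxy[0]]}
        for b in results[0].boxes
    ]
    return jsonify(dets)

if __name__ == "__main__":
    app.run(debug=True)
```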
Marine pollution poses a severe threat to the sustainability of aquatic ecosystems and the blue economy. Effective detection and classification of underwater debris are crucial for enabling timely interventions and supporting marine conservation efforts. In this project, we present an advanced underwater garbage detection system based on YOLOv10n, a cutting-edge, lightweight object detection model optimized for resource-constrained IoT and underwater robotic platforms. Building on the challenges identified in traditional detection models—such as high computational costs and deployment complexity—we replace older backbones like CSPDarknet with the more efficient YOLOv10n architecture. YOLOv10n is designed with an emphasis on speed, low parameter count, and high accuracy, making it ideal for real-time underwater applications. Our system achieves robust debris detection with high precision, while significantly reducing memory and processing requirements, thereby facilitating deployment on embedded and mobile devices. This project demonstrates the feasibility and effectiveness of using YOLOv10n for scalable and eco-friendly marine monitoring solutions, providing a practical approach to combat marine pollution through intelligent automation.
Efficient and accurate road damage detection is critical for the development of smart cities and maintaining safe transportation infrastructure. Manual inspection methods are slow, labor-intensive, and prone to errors, making automated detection necessary. In this project, we propose a YOLOv10n-based framework for real-time road damage detection optimized for edge devices. The system leverages advanced hyperparameter tuning to improve model performance while maintaining low computational requirements. The framework achieves high accuracy, with a precision of 0.986, recall of 0.973, mean average precision (mAP@0.5) of 0.988, and F1-score of 0.978. Deployment on NVIDIA Jetson Nano demonstrates an inference time of 0.13 seconds per frame at 7.5 FPS, while NVIDIA AGX Orin achieves 0.014 seconds per frame at 67 FPS, highlighting scalability and efficiency. This study demonstrates that YOLOv10n is highly effective for real-time, edge-based road damage recognition, enabling faster maintenance decisions and improved road safety.
Wearing safety helmets can effectively reduce the risk of head injuries for construction workers in high-altitude falls. To address the low detection accuracy of existing safety helmet detection algorithms for small targets and complex environments across various scenes, this study proposes an improved safety helmet detection algorithm based on YOLOv8n, named YOLOv8n-SLIM-CA. For data augmentation, the mosaic method is employed, which generates many tiny targets. In the backbone network, a coordinate attention (CA) mechanism is added to enhance the focus on safety helmet regions in complex backgrounds, suppress irrelevant feature interference, and improve detection accuracy. In the neck network, a slim-neck structure fuses features of different sizes extracted by the backbone network, reducing model complexity while maintaining accuracy. In the detection layer, a small target detection layer is added to enhance the algorithm’s learning ability for crowded small targets. Experimental results indicate that these improvements enhance detection performance not only in general real-world scenarios but also in complex backgrounds and for small targets at long distances. Compared to the baseline YOLOv8n algorithm, YOLOv8n-SLIM-CA achieves gains in precision, recall, mAP50, and mAP50-95, while reducing model parameters by 6.98% and computational load by 9.76%. It is capable of real-time and accurate detection of safety helmet wear, and comparison with other mainstream object detection algorithms validates the effectiveness and superiority of this method.
Ensuring the timely and accurate detection of foreign objects on railway tracks is vital for maintaining the safety and reliability of rail transport systems. This study presents an enhanced foreign object intrusion detection framework that addresses the limitations of existing methods—namely low efficiency and suboptimal accuracy—by integrating a two-stage architecture based on YOLOv8 and Overhaul Knowledge Distillation (OKD). In the first stage, a lightweight image classification model rapidly filters railway images to identify those potentially containing foreign objects, reducing the computational burden on detection models. Images flagged as suspicious are then passed to the second stage, where the YOLOv8 object detector precisely localizes and identifies the foreign objects. The use of YOLOv8 offers significant gains in both detection accuracy and inference speed over its predecessors. To further boost the performance of the classification stage, the Overhaul Knowledge Distillation technique is employed, allowing the lightweight classifier to learn from a more complex teacher network and achieve competitive accuracy with improved efficiency. Experimental evaluations confirm that the proposed approach outperforms existing solutions in both speed and robustness, establishing a new state-of-the-art in railway foreign object detection.
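For intuition, the sketch below shows classic logit-based knowledge distillation, where the lightweight student matches the teacher's softened outputs; note that the actual Overhaul method distills intermediate features, so this is a simplified stand-in with assumed temperature and weighting.

```python
# Simplified logit distillation (not the feature-based OKD from the paper):
# the student is trained on hard labels plus the teacher's softened logits.
import tensorflow as tf

def distillation_loss(y_true, student_logits, teacher_logits, T=4.0, alpha=0.5):
    # Hard-label term: standard cross-entropy on the ground-truth classes.
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    # Soft term: KL divergence between temperature-softened distributions.
    soft = tf.keras.losses.KLDivergence()(
        tf.nn.softmax(teacher_logits / T),
        tf.nn.softmax(student_logits / T)) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Example with dummy logits for a 2-class "foreign object / clear" classifier.
y = tf.constant([1, 0])
s = tf.random.normal((2, 2))
t = tf.random.normal((2, 2))
print(distillation_loss(y, s, t))
```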
The widespread use of face masks plays a vital role in minimizing the spread of infectious diseases, making reliable mask-wearing compliance detection a crucial task. This project proposes An Enhanced YOLOv9 Framework for Detecting Mask-Wearing Compliance, designed to achieve high accuracy and robustness in diverse environments. Leveraging the powerful YOLOv9 architecture, the framework integrates optimization strategies to improve small-object detection, reduce false positives, and enhance feature learning in crowded or complex scenarios. The model is trained on a benchmark Mask-PPE dataset, enabling it to classify individuals into categories such as mask, no mask, and improper mask usage. Experimental evaluations demonstrate that the enhanced YOLOv9 framework achieves superior detection performance compared to conventional approaches, ensuring efficient monitoring in real-world applications such as healthcare facilities, workplaces, and public spaces. This system provides a scalable solution for supporting public safety initiatives through intelligent surveillance technologies.
The rapid evolution of intelligent surveillance systems has led to the integration of deep learning-based object detection models for enhanced situational awareness and security monitoring. This study presents an Advanced Surveillance Framework utilizing YOLOv10, a next-generation real-time object detection algorithm, for the fusion-based detection of threatening objects such as weapons, explosives, and suspicious items in complex environments. The proposed system combines multi-sensor data fusion, integrating visual and infrared modalities to improve detection accuracy under varying lighting and occlusion conditions. YOLOv10’s optimized architecture offers superior speed–accuracy trade-offs through improved feature aggregation, adaptive anchor mechanisms, and transformer-based attention modules. Experimental results demonstrate that the fusion-based YOLOv10 model significantly outperforms traditional single-sensor approaches in precision, recall, and real-time responsiveness, making it a powerful solution for modern surveillance applications in public safety, border control, and smart city security networks.
Falls are one of the most frequent and hazardous incidents occurring in industrial and open environments, posing serious threats to individual safety. Prompt and accurate fall detection remains a critical challenge, especially when deploying models on edge or low-power devices. To address these issues, this paper proposes YOLOv10-Fall, a robust and efficient deep learning-based framework for real-time fall detection in open spaces. Leveraging the cutting-edge YOLOv10 architecture, the proposed method enhances detection performance through improved spatial and contextual feature representation while significantly reducing computational overhead. YOLOv10-Fall integrates a lightweight yet powerful backbone and a streamlined detection head to facilitate faster inference and higher precision in complex scenes. Experimental evaluations on benchmark fall datasets demonstrate that YOLOv10-Fall achieves superior detection accuracy and mAP compared to previous models like YOLOv7-tiny, while also offering improved inference speed and reduced parameter complexity. These advancements make YOLOv10-Fall a practical and scalable solution for deployment in real-world surveillance and safety monitoring systems.
Precision Agriculture (PA) leverages advanced technologies to optimize resource use while preserving crop quality and yield. However, pest infestations remain a critical challenge that can undermine these benefits. Recent deep learning frameworks like YOLOv8 have shown promise in real-time insect detection, yet often remain limited to specific insect types or crops. To address this limitation and improve detection accuracy, this work explores an enhanced, generalized approach using the latest YOLOv10 object detection model. We develop and test a YOLOv10-based tool designed to detect any insect category across diverse crops, enabling broader and faster pest monitoring in the field. A comprehensive performance evaluation was conducted on a benchmark insect dataset, demonstrating notable improvements over YOLOv8, including higher mean Average Precision (mAP) scores and faster inference speeds. The findings suggest that YOLOv10's architectural advancements contribute to more robust, scalable, and real-time pest detection, offering significant potential to strengthen pest management strategies within precision agriculture.
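A hedged fine-tuning sketch using the `ultralytics` training API; the dataset YAML, model size, and hyperparameters are placeholders, not the evaluated configuration.

```python
# Hypothetical fine-tuning of YOLOv10 on an insect detection dataset.
from ultralytics import YOLO

model = YOLO("yolov10s.pt")       # assumed pretrained starting point
model.train(data="insects.yaml",  # hypothetical dataset config file
            epochs=100, imgsz=640, batch=16)
metrics = model.val()             # reports mAP50 / mAP50-95 on the val split
print(metrics.box.map50)
```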
In road traffic safety systems, computer vision-based object detection faces significant challenges under hazy weather conditions, including severe scale variations, high background noise, and complex viewing angles. To address these issues, we propose an improved YOLO-based detection algorithm named Proposed v11m, building upon the existing YOLOv11n framework. Our approach integrates a novel attention-gate convolution (AGConv) module into the backbone, replacing the original bottleneck to strengthen contextual feature extraction and reduce redundant computations. Furthermore, we introduce a multi-dilation sharing convolution (MDSC) module to mitigate feature loss during pooling and enhance sensitivity to objects of varying scales. To further improve detection accuracy and efficiency, we design a lightweight cross-channel feature fusion module (CCFM) within the neck network, which dynamically recalibrates feature weights for better multi-scale representation. Experimental evaluation shows that the Proposed v11m model achieves a 1.1% increase in mAP@0.5 and a 2.7% boost in mAP@0.5:0.95 over YOLOv11n, while maintaining real-time performance of 376 FPS with only 2.6 million parameters. The results demonstrate that Proposed v11m offers high-precision, efficient traffic object detection suitable for deployment on resource-limited devices, even in adverse weather conditions.
Pneumonia is a severe inflammation of the lungs caused by pathogens or autoimmune diseases, affecting approximately 450 million individuals worldwide each year. Chest X-ray analysis remains the primary diagnostic method, but manual interpretation can be time-consuming and prone to error. With the advancement of deep learning, automated systems are increasingly being adopted to assist in medical image analysis. This paper investigates the use of MobileNetV2, a lightweight yet powerful convolutional neural network, for the detection and classification of pneumonia from chest X-ray images. MobileNetV2 leverages depthwise separable convolutions and an inverted residual structure, enabling efficient computation with reduced memory requirements while maintaining high accuracy. Our framework processes chest X-rays to distinguish between pneumonia and normal cases, ensuring suitability for real-time and resource-constrained clinical environments. Experimental results demonstrate that MobileNetV2 achieves competitive accuracy, precision, recall, and F1-score compared to heavier architectures, making it a reliable and scalable solution for pneumonia diagnosis. The findings highlight the potential of MobileNetV2 in providing rapid, accurate, and cost-effective diagnostic support for healthcare professionals.
Facial emotion recognition (FER) plays a vital role across diverse domains such as e-learning, marketing, humanoid robot interaction, HMI/HCI systems, and medical diagnostics. With the growing advancement of intelligent systems, there is a continuous effort to enhance the performance and accuracy of FER techniques. Traditional machine learning methods, including Random Forest (RF) and its variants, have been employed for emotion classification, but they often struggle with generalization, especially on diverse or complex facial datasets. To address these limitations, this study proposes a deep learning-based approach using the MobileNetV2 architecture, a lightweight yet efficient convolutional neural network widely adopted for mobile and real-time applications. The proposed MobileNetV2 model is fine-tuned for FER tasks to classify six basic emotions—sadness, anger, fear, surprise, disgust, and happiness—using facial images. Unlike conventional ML methods, MobileNetV2 automatically extracts and learns hierarchical features from facial data, eliminating the need for handcrafted features or manual partitioning strategies. This model demonstrates superior adaptability to varied image complexities while maintaining computational efficiency, making it well-suited for real-time emotion recognition on resource-constrained devices. By leveraging transfer learning and regularization techniques, the proposed MobileNetV2-based framework significantly improves emotion classification performance, providing a robust and accurate solution for real-world FER challenges.
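A minimal transfer-learning sketch matching the description: a frozen ImageNet-pretrained MobileNetV2 backbone with a six-way emotion head and dropout regularization. Input size and optimizer settings are illustrative.

```python
# Transfer-learning sketch: frozen MobileNetV2 backbone + 6-class FER head.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # fine-tune the head first; unfreeze later if needed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                    # regularization
    tf.keras.layers.Dense(6, activation="softmax"),  # six basic emotions
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```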
The objective of sketch-based face identification is to match a target individual’s facial features from a collection of photographs using a sketched portrait as the search query. Existing methods face challenges due to limited dataset sizes, which can lead to overfitting when training large models. To overcome this, we propose a novel sketch face recognition approach that effectively captures both local facial details and global contextual information. The method segments facial features into localized regions to enhance detailed feature extraction, and further refines these representations using a specialized network module designed to enrich the feature space. Additionally, a two-stage training strategy is employed to optimize different components of the model separately, improving the overall quality of visual features. Experimental results demonstrate that this approach achieves superior performance compared to existing methods on several benchmark sketch face datasets, highlighting its potential for reliable and accurate sketch-based face recognition.
This research presents an automated deep learning-based approach for detecting defects in solar cell images. The study focuses on evaluating a wide range of deep learning models to classify solar cells as either defective or non-defective. It emphasizes both high-performance and lightweight architectures suitable for deployment in resource-constrained environments. Using a balanced and well-curated dataset of solar cell images, the proposed approach aims to enhance the efficiency and reliability of quality control processes in solar energy production. By comparing various deep learning models, the study highlights the potential of AI-powered visual inspection systems to streamline defect detection, reduce manual effort, and improve overall production quality in the solar energy industry.
Computer Vision (CV) is a branch of computer science that empowers machines to analyze and interpret visual data. It integrates image processing, analysis, and machine learning techniques to derive meaningful insights from images and videos. A significant challenge in this domain is the classification of gender from handwritten text images, particularly in low-resource languages like Pashto, which feature complex scripts. This research introduces a novel method for gender classification using Pashto Handwritten Text Images (PHTI), leveraging the deep learning architecture MobileNetV2. The PHTI dataset comprises 36,086 handwritten text line images contributed by 200 male and 200 female native Pashto writers. MobileNetV2, known for its lightweight design and efficient feature extraction, is particularly suitable for tasks involving large-scale image data while maintaining computational efficiency. Its use of depthwise separable convolutions and inverted residuals enables effective learning of intricate handwriting patterns, even in complex scripts like Pashto. This study also explores the performance of a traditional Support Vector Machine (SVM) in comparison to MobileNetV2 for gender classification. While SVM relies on handcrafted features and classical machine learning approaches, MobileNetV2 leverages hierarchical feature representations automatically learned from data, making it more adept at handling the variability and complexity inherent in handwritten text. The findings highlight the potential of MobileNetV2 in addressing challenges of handwritten recognition in underrepresented languages and pave the way for intelligent systems capable of processing diverse scripts in real-world applications.
The ability to reconstruct acoustic fields is essential for applications such as particle manipulation, medical therapy, and ultrasonic imaging. Traditional acoustic field reconstruction methods often struggle with limitations in both speed and accuracy, which restricts their practical deployment in generating acoustic holograms. To address these challenges, this study presents an advanced reconstruction approach that combines physical modeling with a deep learning framework. The system incorporates an acoustic lens integrated with a transducer, leveraging variations in lens thickness and acoustic velocity to create precise phase distributions. To minimize dependence on labeled data, an unsupervised learning strategy is employed through a neural network architecture designed to learn representations directly from raw input. Additionally, a specialized loss function is introduced to promote energy efficiency during the reconstruction process. Comparative evaluations demonstrate the effectiveness of the proposed method in improving both the quality of acoustic holograms and the efficiency of the reconstruction process, making it suitable for a wide range of practical applications.
Parkinson’s disease (PD) is a progressive neurodegenerative disorder that affects motor skills, leading to difficulties in handwriting, drawing, and movement. Early detection of PD is crucial for timely intervention and effective treatment planning. Traditional clinical diagnosis relies on manual observation of drawing patterns, such as spiral and wave tests, which can be subjective and prone to human error. To address this challenge, we propose an automated deep learning framework for Parkinson’s disease detection using convolutional neural networks. In this work, spiral and wave drawings are processed using MobileNetV2, a lightweight yet powerful deep learning architecture optimized for image classification tasks. The model is fine-tuned on a dataset of healthy and Parkinson’s patients’ drawings, achieving high classification accuracy. Experimental results demonstrate that the spiral classifier (F1) and the wave classifier (F2) both provide reliable predictions, with final test accuracies exceeding 90%. The system further supports real-time predictions through a Flask-based web application, enabling users to upload spiral or wave drawings and receive instant diagnostic feedback with confidence scores. This research highlights the effectiveness of deep learning in medical diagnostics and offers a scalable, efficient, and non-invasive tool for supporting Parkinson’s Disease screening.
Brain tumor detection and classification play a critical role in early diagnosis and treatment planning. In this project, we implemented a deep learning-based Convolutional Neural Network (CNN) model to classify MRI brain images into tumor and non-tumor categories. The model was trained on a pre-processed MRI dataset using image normalization and augmentation techniques to improve generalization. Our CNN architecture, consisting of convolutional, pooling, and fully connected layers, achieved a training accuracy of approximately 97–98% and a test accuracy of 92–94%. This work was inspired by the base paper “Multimodal Ensemble Fusion Deep Learning Using Histopathological Images and Clinical Data for Glioma Subtype Classification”, which employed an advanced ensemble fusion approach combining CNNs, Transformers, and clinical data to achieve a classification accuracy of 93.6% with an AUC of 0.967. While our implementation focuses solely on MRI image classification with a single CNN model, the results demonstrate comparable accuracy levels. Unlike the base paper, our model does not incorporate multimodal data fusion or ensemble strategies, making it computationally simpler and more lightweight, while still achieving reliable performance for binary brain tumor detection. The findings indicate that deep learning models can effectively support medical imaging tasks, with scope for future enhancement through multimodal integration, ensemble learning, and advanced evaluation metrics such as precision, recall, and F1-score.
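A sketch of the kind of conv/pool/dense stack described above, in Keras; the layer sizes and input resolution are assumptions, not the exact trained architecture.

```python
# Illustrative binary MRI classifier: convolutional, pooling, and dense layers.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # tumor vs. non-tumor
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```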
CrackVision is a deep learning-powered application designed to detect and classify cracks in concrete surfaces with exceptional accuracy. Unlike traditional manual inspection methods, which are time-consuming and prone to human error, CrackVision leverages state-of-the-art convolutional neural networks and transfer learning techniques—enhanced with alternative architectures such as EfficientNetB3—to deliver reliable, real-time predictions. The system has been trained on the METU concrete crack dataset, containing balanced sets of cracked and non-cracked surface images, enabling robust binary classification. Integrated into a user-friendly Flask web interface, CrackVision allows users to easily upload images, preview them, and instantly receive classification results along with confidence scores. Additional visualization tools, including accuracy/loss charts and confusion matrices, provide transparency into model performance. This scalable solution offers a practical tool for engineers, inspectors, and infrastructure maintenance teams, enabling faster, more consistent assessments and contributing to improved structural safety and long-term durability in civil engineering projects.
Solar power is a clean, renewable energy source with minimal greenhouse gas emissions, playing a vital role in combating climate change and enhancing energy self-sufficiency. Early fault detection, such as shading, cracking, or electrical malfunctions, is crucial for maintaining maximum efficiency and preventing system failures. This work presents a deep learning model based on the ResNet50 architecture, designed for the identification of faults caused by contaminants on the surface of solar panels. ResNet50, with its residual learning framework and deep hierarchical feature extraction, enables effective handling of vanishing gradient problems while capturing fine-grained patterns in solar panel images. Its skip connections improve learning efficiency and robustness, making it suitable for detecting subtle surface abnormalities under diverse environmental conditions. The proposed model is trained on a comprehensive dataset of clean and faulty solar panels under various weather scenarios, ensuring robustness against variations in lighting, dust accumulation, and physical damage. Unlike traditional machine learning methods that require handcrafted features or additional classifiers, the ResNet50-based approach directly performs end-to-end learning, enhancing detection efficiency. This study highlights the potential of ResNet50 to provide a robust, efficient, and automatic solution for solar panel fault detection. By enabling reliable identification of surface defects, the proposed system contributes to improved maintenance strategies and long-term reliability of solar energy systems.
Chromosomes are intracellular aggregates that carry essential genetic information. Abnormalities in the number or structure of chromosomes can lead to chromosomal disorders, making chromosome screening a critical component of prenatal care. However, manual analysis of chromosomes is labor-intensive and time-consuming. With the growing demand for prenatal diagnosis, there is an increasing strain on human resources. To address this challenge, an automated approach for chromosome detection and recognition is essential. In this work, we propose a deep learning-based system using ResNet50 for the automatic detection and classification of chromosomes in metaphase cell images. ResNet50 is a deep convolutional neural network that incorporates residual learning to enable efficient training of very deep networks while capturing hierarchical and discriminative features. The proposed system processes raw metaphase cell images directly without requiring extensive preprocessing, simplifying deployment in clinical settings. A large and diverse dataset, including both simple and challenging chromosome images annotated by medical professionals, was used to train and evaluate the system. By leveraging the deep residual connections of ResNet50, the model can effectively learn complex features that distinguish between various chromosome types, even under difficult imaging conditions. The proposed method demonstrates robustness and practical applicability, offering a scalable and accurate solution for automated chromosome recognition and aiding clinical diagnosis.
Tuberculosis (TB) remains one of the most widespread and life-threatening infectious diseases worldwide, particularly in developing countries where healthcare resources are limited. Early detection and diagnosis play a crucial role in preventing the spread of TB and improving patient outcomes. In this project, we propose a deep learning–based system for the automated detection of tuberculosis using chest X-ray images. The system leverages transfer learning with EfficientNetB0 and DenseNet architectures to classify chest X-rays as either TB-positive or Healthy, achieving significant accuracy despite the challenges of limited dataset availability. Pre-processing techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE) and advanced data augmentation strategies are applied to enhance image quality and improve model generalization. The trained model is further deployed as a Flask web application, providing an easy-to-use interface with functionalities including secure login, image upload and preview, prediction results with probability scores, and visualization of performance metrics such as accuracy curves, confusion matrix, and ROC curve. The proposed framework demonstrates the potential of deep learning in developing cost-effective, reliable, and scalable diagnostic tools to support radiologists and healthcare professionals in TB screening and diagnosis.
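A minimal sketch of the CLAHE preprocessing step named above, assuming grayscale X-rays (the file names, clip limit, and tile grid are illustrative choices, not the project's exact settings):

```python
import cv2

# CLAHE: Contrast Limited Adaptive Histogram Equalization on a chest X-ray
img = cv2.imread("xray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)
cv2.imwrite("xray_clahe.png", enhanced)
```

Applying equalization per tile rather than globally preserves local lung detail while boosting contrast in dark regions.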
This work presents a novel framework, Self-Segmentation Guided Diffusion for Thermal to Pseudo-Color Image Translation, that enhances the interpretability and usability of thermal imagery in real-world applications. The proposed approach leverages a diffusion-based generative model guided by self-segmentation cues to effectively translate thermal images into realistic pseudo-colored representations. To further refine object-level understanding, YOLOv10 is integrated into the pipeline for robust object detection across diverse thermal scenes, ensuring that critical features such as humans, vehicles, and infrastructure are preserved in the translated outputs. Additionally, OpenCV-based color segmentation techniques are employed to optimize region-wise color mapping, enhancing both the semantic consistency and visual quality of the generated pseudo-color images. The combined use of diffusion, YOLOv10 detection, and OpenCV segmentation provides a powerful and efficient solution for thermal image analysis, enabling improved visualization, situational awareness, and downstream tasks such as surveillance, autonomous navigation, and environmental monitoring.
Single Infrared Image Super-Resolution (SISR) aims to enhance the spatial resolution of low-quality infrared images, a task made particularly challenging by the inherent noise and limited information content of infrared imagery. To address these limitations, we propose a novel approach that leverages advanced deep learning techniques to restore high-resolution details. Our method captures and exploits the underlying structure of infrared images, and by employing advanced feature extraction and reconstruction techniques it generates outputs with significantly improved image quality. Extensive experiments on various benchmark datasets demonstrate the superior performance of the proposed method in terms of both quantitative and qualitative metrics.
Accurate classification of lung conditions from medical images is critical for timely diagnosis and effective treatment planning. In this study, we leverage the capabilities of the Xception deep learning model to classify chest computed tomography (CT) images into four clinically relevant categories: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal tissue. The Xception architecture, renowned for its use of depthwise separable convolutions, enhances feature extraction by separating spatial and channel-wise processing, thereby improving model efficiency while reducing computational cost. A well-balanced and preprocessed multi-class chest CT dataset is utilized to train, validate, and evaluate the model, ensuring robustness across varying pathological conditions. Data augmentation techniques are applied to mitigate overfitting and improve the model’s generalization. The training pipeline includes the use of modern optimizers, adaptive learning rates, and early stopping criteria to fine-tune performance. Experimental results demonstrate that the proposed framework achieves high classification accuracy, precision, and recall, effectively distinguishing between malignant and non-malignant cases. Furthermore, the model exhibits rapid inference speed, making it suitable for real-time or near-real-time clinical deployment. The performance is validated through confusion matrices, ROC curves, and Grad-CAM visualizations, offering interpretability and transparency for medical professionals. This research underscores the practical applicability of the Xception model in enhancing diagnostic workflows, reducing manual interpretation time, and supporting radiologists in early lung cancer detection. The integration of such AI-based systems into clinical settings holds significant promise for improving patient outcomes and streamlining diagnostic decision-making.
To address the challenges of few-shot aerial image semantic segmentation, where unseen-category objects in query aerial images must be identified with only a few annotated support images, we propose a novel approach that integrates a U-Net architecture with YOLOv10. In typical few-shot segmentation, category prototypes are extracted from support samples to guide pixel-wise segmentation of query images. However, aerial images often contain objects with arbitrary orientations and uneven spatial distributions, causing significant variations in their features. Conventional methods that do not account for orientation changes frequently fail under these conditions, resulting in low confidence scores and misclassification of same-category objects with different orientations. To overcome these limitations, our approach leverages U-Net for precise semantic segmentation while integrating YOLOv10 for fast and accurate object detection, enabling robust localization of objects even in complex aerial scenes. YOLOv10’s advanced detection capabilities allow the system to identify objects across varying scales and orientations, providing reliable bounding boxes that guide the U-Net segmentation process. By combining these two architectures, the network can accurately segment same-category objects regardless of rotation or size, reducing oscillation in confidence scores and improving detection of rotated and scattered aerial objects. This U-Net + YOLOv10 framework offers a scalable, rotation-robust solution for few-shot aerial image segmentation, effectively bridging the gap between efficient object detection and high-fidelity semantic segmentation in challenging aerial imagery.
Aerial drone imagery plays a critical role in a wide range of applications such as surveillance, environmental monitoring, and mapping. However, the presence of haze in aerial images, particularly under adverse atmospheric conditions, significantly degrades image quality by reducing contrast and color fidelity. To address this challenge, we propose a novel dehazing framework that combines Regional Saturation-Value (SV) Mapping with U-Net-driven Soft Segmentation to enhance visual clarity and preserve essential features in drone-captured scenes. The first stage of our approach involves analyzing regional variations in saturation and value components in the HSV color space to identify and enhance haze-affected regions. This adaptive enhancement process improves contrast locally, accounting for varying haze density across the image. In the second stage, a modified U-Net architecture is employed to perform soft semantic segmentation, which helps in selectively refining object boundaries and background textures without introducing artifacts or over-sharpening. Unlike traditional global dehazing methods, our framework leverages both local enhancement and deep learning-based segmentation to achieve superior restoration in heterogeneous environments. Experimental results on benchmark aerial image datasets demonstrate that our method outperforms existing dehazing techniques in terms of PSNR, SSIM, and perceptual quality. The proposed model not only improves visual interpretability but also enhances the effectiveness of downstream computer vision tasks such as object detection and scene understanding in aerial imagery.
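The first stage can be pictured with a short OpenCV sketch. The soft haze mask below (low saturation plus high value flags haze) and the 0.8/0.3 gain factors are illustrative assumptions, not the paper's exact regional SV mapping:

```python
import cv2
import numpy as np

# Work in HSV: haze tends to show as low saturation combined with high value
img = cv2.imread("drone_frame.jpg")  # hypothetical file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
h, s, v = cv2.split(hsv)

# Soft per-pixel haze mask in [0, 1]
haze_mask = np.clip((v / 255.0) - (s / 255.0), 0.0, 1.0)

# Locally stretch saturation and compress value where the mask is strong
s = np.clip(s * (1.0 + 0.8 * haze_mask), 0, 255)
v = np.clip(v * (1.0 - 0.3 * haze_mask), 0, 255)

dehazed = cv2.cvtColor(cv2.merge([h, s, v]).astype(np.uint8), cv2.COLOR_HSV2BGR)
```

The U-Net soft-segmentation stage would then refine object boundaries on top of this locally enhanced image.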
Skin lesion segmentation plays a critical role in the early detection of skin cancer, significantly improving treatment outcomes and patient survival rates. Despite advancements in deep learning, accurate segmentation remains challenging due to variability in lesion size, shape, color, and contrast. This paper presents an enhanced segmentation framework based on the SegNet architecture, known for its encoder-decoder design that effectively retains spatial features while capturing complex lesion characteristics. To address real-world imaging challenges, the proposed method integrates comprehensive preprocessing techniques using OpenCV, including image resizing, noise reduction, contrast enhancement, and data augmentation. Furthermore, the framework fuses multiple input modalities—standard RGB images, grayscale representations robust to illumination changes, and shading-reduced images—to mitigate lighting inconsistencies and improve model robustness. Experimental results demonstrate that the proposed SegNet-based approach achieves accurate and consistent segmentation performance, effectively delineating lesion boundaries even in cases with irregular morphology and low contrast. This study underscores the potential of combining deep learning with advanced preprocessing for reliable and precise skin lesion segmentation.
Malaria, a life-threatening disease transmitted by mosquitoes, remains a major public health challenge, claiming thousands of lives each year. Limited access to reliable detection tools, combined with challenges such as insufficient laboratory resources and inexperienced personnel, contributes to its high mortality rate. Recently, advancements in image analysis of malaria-infected red blood cells (RBCs) have provided promising alternatives for more accessible detection methods. By leveraging digital microscopy and innovative machine learning approaches, researchers aim to develop practical solutions that can improve diagnostic accuracy and accessibility. This approach not only enables a faster response in clinical settings but also highlights the potential for integration with IoT-enabled devices, facilitating wider deployment in resource-constrained regions. Such advancements underscore the potential of image-based malaria detection methods to enhance early diagnosis and treatment, especially in areas with limited medical resources.
Human action recognition is a vital component of smart monitoring systems, with applications in healthcare, surveillance, and intelligent indoor environments. This project presents a smart human action monitoring system that relies exclusively on RGB images to detect and classify human activities in real time. The system processes image frames to extract essential features such as body posture, joint positions, and movement patterns. These features are then analyzed to recognize common human actions, including sitting, walking, running, drinking, and other daily activities. By leveraging advanced image processing and deep learning techniques, the system achieves high accuracy, robustness to varying lighting conditions, and efficiency suitable for real-time deployment. Experimental results demonstrate the system’s ability to monitor human activity reliably, providing a practical solution for indoor action recognition and smart environment applications.
In this approach, we replace the traditional Otsu method for Field of View (FOV) segmentation in retinal images with a deep learning-based model utilizing U-Net architectures. The preprocessing phase begins by converting the retinal image to grayscale, specifically using the red channel, which offers better contrast for the fine vascular structures. A logarithmic transformation is applied to the grayscale image to further enhance the visibility of small features such as microaneurysms and capillaries. This step prepares the image for more accurate segmentation by emphasizing details essential for the detection of diabetic retinopathy and other retinal abnormalities. The core of the segmentation process relies on U-Net, a convolutional neural network designed for medical image segmentation. U-Net consists of a contracting path that captures high-level contextual features through successive convolutional layers and downsampling operations. This is followed by an expanding path that progressively upsamples the feature maps and concatenates them with corresponding layers from the contracting path, enabling precise localization of the FOV region. The final step in the U-Net architecture involves a 1x1 convolution layer that produces the binary mask of the FOV region, followed by a sigmoid activation function to output the probability map of the segmented area.
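A minimal sketch of the described preprocessing, assuming an OpenCV pipeline (the file name is hypothetical): the red channel serves as the grayscale image, and a logarithmic transform stretches the dark vascular detail.

```python
import cv2
import numpy as np

img = cv2.imread("fundus.png")          # OpenCV loads in BGR order
red = img[:, :, 2].astype(np.float32)   # red channel as the "grayscale" image

# Logarithmic transform: s = c * log(1 + r), rescaled back to [0, 255]
c = 255.0 / np.log(1.0 + red.max())
log_enhanced = (c * np.log(1.0 + red)).astype(np.uint8)
```

The enhanced image is then fed to the U-Net, whose final 1x1 convolution and sigmoid produce the FOV probability map.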
Wheat leaf diseases, including rust, powdery mildew, and leaf blight, significantly impact crop yield and quality, making early detection essential for effective disease management. This project presents an advanced wheat disease detection framework using YOLOv10, a cutting-edge object detection model that combines high accuracy, real-time processing, and efficient feature extraction. YOLOv10’s enhanced architecture and anchor-free detection capabilities allow it to accurately identify multiple disease types from wheat leaf images, even under varying lighting conditions, leaf orientations, and field environments. By training on a diverse and augmented dataset, the model achieves strong generalization and robustness, enabling rapid and reliable disease detection through mobile devices, drones, or field cameras. The system not only provides precise classification and localization of diseased regions but also supports timely intervention and precision agriculture, helping farmers make data-driven decisions to protect crops. With its scalability, speed, and adaptability, YOLOv10 represents a powerful tool for automated plant disease management and sustainable farming practices.
Radiation-induced cavities and voids within structural materials present a significant challenge for ensuring the long-term safety and efficiency of nuclear reactors. Accurate detection and quantification of these cavities are critical for understanding irradiation damage mechanisms and predicting material degradation. Traditional object detection models such as YOLOv8 and Faster R-CNN have shown reasonable performance in cavity detection; however, their bounding-box-based outputs often fail to precisely localize small or irregularly shaped cavities, particularly under degraded imaging conditions. To address these limitations, this paper introduces CavityNet, a novel two-stage deep learning framework that combines Enhanced Super-Resolution GAN (ESRGAN) and a Swin Transformer-based UNet (Swin-UNet) architecture for improved cavity detection in irradiated microstructures. The proposed system first applies ESRGAN to enhance underfocused or overfocused scanning electron microscopy (SEM) images by reconstructing fine textures and high-frequency cavity boundaries. These super-resolved images are then passed through a Swin-UNet model, which integrates hierarchical self-attention mechanisms with a UNet-style encoder-decoder pipeline to achieve pixel-level segmentation of cavities. This approach enables both global contextual understanding and local feature refinement, improving detection accuracy in complex microstructural environments. The model is trained and evaluated using publicly available datasets from the Canadian Nuclear Laboratories (CNL) and Nuclear Oriented Materials & Examination (NOME), with significant gains observed in precision, recall, and F1-score over existing methods. The results demonstrate that the combination of super-resolution and transformer-based segmentation is highly effective in detecting small-scale cavities, even in low-quality images. The proposed CavityNet framework represents a step forward in automated microstructural analysis, offering enhanced performance, robustness to imaging conditions, and potential applications in real-time material degradation monitoring in nuclear environments.
As digital healthcare systems increasingly rely on the transmission and storage of medical images, ensuring their copyright protection and integrity has become a critical concern. Traditional moment-based zero-watermarking schemes, while effective, often struggle to withstand complex image processing and geometric attacks. To address these limitations, this study proposes a novel zero-watermarking framework that leverages the deep feature extraction capabilities of Convolutional Neural Networks (CNNs) to generate highly robust and discriminative watermark signatures without altering the original image content. The system utilizes a pretrained CNN model to extract high-level feature maps that capture the intrinsic spatial patterns and semantic structures of medical images. These features are then encoded into a secure zero-watermark, which can be used later for copyright verification and tamper detection. Unlike handcrafted moment-based descriptors, CNN features offer better resilience to noise, compression, rotation, scaling, and other common attacks. The watermark extraction process relies solely on the feature similarity between the input image and the reference watermark, ensuring lossless protection of the original image. Experimental results demonstrate that the proposed CNN-based method significantly outperforms traditional zero-watermarking approaches, including those based on orthogonal moments, in terms of robustness, discrimination, and authentication accuracy. This makes the system highly suitable for securing sensitive medical images in telemedicine, PACS systems, and other digital healthcare infrastructures.
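A hedged sketch of the idea follows. The abstract does not name the pretrained CNN, so VGG16 with average pooling stands in here; the zero-watermark is an L2-normalized feature vector compared by cosine similarity, and the 0.9 threshold is an assumption:

```python
import numpy as np
import tensorflow as tf

# Stand-in backbone: any pretrained CNN feature extractor would do
backbone = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                       pooling="avg")  # 512-D feature vector

def signature(image_batch):
    """L2-normalized deep-feature signature (the zero-watermark)."""
    feats = backbone(tf.keras.applications.vgg16.preprocess_input(image_batch))
    return tf.math.l2_normalize(feats, axis=-1).numpy()

def verify(sig_ref, sig_query, threshold=0.9):
    """Cosine similarity between the stored and recomputed signatures."""
    return float(np.dot(sig_ref, sig_query)) >= threshold

# Usage: sig = signature(batch)[0] at registration time; later, recompute on
# the received image and call verify(sig, sig_received).
```

Because the signature is derived from the image rather than embedded into it, the original pixels remain untouched, which is the defining property of zero-watermarking.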
Access to the internet is crucial for communication, information, and essential services in modern life. Internet blackouts during politically sensitive periods can severely affect mental health, increasing stress, anxiety, and emotional distress. This study examines the psychological impact of the internet shutdown during the Bangladesh Quota Movement in July 2024. A survey of 2,085 participants captured behavioral, emotional, and psychological responses in academic, work, and social contexts. Textual responses were processed using NLP techniques to extract relevant features reflecting stress levels. An MLP (Multi-Layer Perceptron) classifier was applied to predict stress intensity across participants. The model demonstrated high performance in classifying stress, achieving significant accuracy and reliability. Findings highlight the widespread mental distress caused by internet disruptions. The study emphasizes the importance of timely mental health interventions during crises. These insights align with SDG 3, promoting mental well-being and social resilience in low- and middle-income countries.
The phenomenon of cyberbullying has emerged as a critical challenge in the digital landscape, posing detrimental effects on individuals and broader societal well-being. A practical solution to this widespread issue involves the accurate identification of cyberbullying within social media platforms, which constitute a significant share of digital communication. While traditional approaches have primarily utilized machine learning algorithms and pre-trained language models, these often face challenges such as high computational complexity and limited adaptability to nuanced linguistic patterns. This paper proposes an advanced framework that leverages Natural Language Processing (NLP) techniques combined with Long Short-Term Memory (LSTM) networks to improve cyberbullying detection in online text. The framework applies refined text preprocessing steps—such as tokenization, stopword removal, stemming, and lemmatization—to ensure high-quality and noise-free input data. Sentiment features and contextual patterns are extracted using embedding methods to preserve semantic information. These processed inputs are then fed into an LSTM model, which effectively captures the sequential and temporal dependencies in textual data, making it well-suited for understanding the dynamic nature of cyberbullying language. Additionally, to address class imbalance in the multi-class setting, resampling techniques are employed, improving the model's robustness without inducing bias. The proposed system demonstrates that combining deep learning with comprehensive NLP enhances the accuracy and contextual understanding required for effective cyberbullying detection.
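A minimal Keras sketch of the modeling stage, assuming pre-tokenized, padded integer sequences and five cyberbullying categories (the vocabulary size, label count, and layer widths are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, CLASSES = 20000, 5  # assumed vocabulary size and number of classes

model = tf.keras.Sequential([
    layers.Embedding(VOCAB, 128),           # dense word embeddings
    layers.LSTM(64, dropout=0.2),           # sequential/temporal dependencies
    layers.Dense(CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Resampling for class balance would be applied to the training set before `model.fit`, as described above.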
In the rapidly evolving digital era, personalized recommendations play a crucial role in enhancing the educational experience of students. With the increasing use of mobile devices, it has become easier to collect app usage data, which can be leveraged to provide tailored educational app suggestions. This study focuses on recommending suitable applications for university students, specifically targeting Undergraduate (UG), Postgraduate (PG), and Graduate levels, based on their app usage patterns. The dataset is preprocessed using Natural Language Processing (NLP) techniques, including text cleaning, tokenization, and feature extraction, to capture relevant attributes from app descriptions and student interaction data. A Random Forest Classifier is employed as the core model due to its robustness, ability to handle high-dimensional data, and strong performance in classification tasks. The proposed system accurately categorizes students’ preferences and recommends apps aligned with their academic needs. Experimental results highlight the efficiency of Random Forest in producing reliable, scalable, and interpretable recommendations compared to traditional methods. This approach ensures better personalization, reduces data sparsity issues, and enhances overall user satisfaction in educational recommendation systems.
In this study, we propose a method for forecasting the next-day Bitcoin price range by integrating natural language processing (NLP) techniques with a Long Short-Term Memory (LSTM) network. The model leverages high-dimensional technical indicators combined with sentiment features extracted from Twitter data through advanced NLP methods to capture the nuanced market sentiment. By incorporating sequential analysis of both numerical market indicators and textual sentiment data, the LSTM model effectively learns temporal dependencies and complex patterns, enhancing the prediction capability for Bitcoin price movements. The experiments utilize Bitcoin market data spanning six years alongside millions of relevant Twitter posts. The approach demonstrates the value of combining deep learning architectures with sentiment analysis to improve forecasting robustness and interpretability in volatile cryptocurrency markets. Sensitivity analysis is applied to optimize the influence of sentiment features, highlighting the importance of sentiment-driven insights in financial prediction models and offering a novel perspective for more accurate and dynamic cryptocurrency market forecasting.
In the era of digital technology, the vast amount of user-generated content on online platforms poses significant challenges for analyzing large volumes of text to understand user emotions and predict product ratings. The increasing prevalence of online reviews necessitates advanced natural language processing (NLP) techniques to effectively extract meaningful insights from textual data. This paper introduces a novel approach leveraging Long Short-Term Memory (LSTM) networks combined with sophisticated NLP methods to capture the nuanced emotional expressions within review texts. Unlike traditional binary sentiment classification, the proposed model provides a continuous and fine-grained sentiment scoring that reflects the intensity and subtlety of user opinions, thereby enhancing the sentiment analysis process. The use of LSTM enables the model to effectively capture sequential dependencies and contextual relationships in text data, improving the understanding of complex linguistic patterns within reviews. This comprehensive sentiment representation is integrated with predictive modelling techniques to enhance the accuracy of rating predictions in recommendation systems. The proposed framework demonstrates the capability to harness rich textual information and dynamic sentiment variations, making it highly applicable across multiple domains such as entertainment, e-commerce, and social media. This approach not only improves prediction outcomes but also supports more personalized and meaningful user experiences.
In the digital era, the rapid growth of online reviews has significantly influenced consumer behavior and public opinion. These text-based sentiments play a crucial role in shaping decisions and perceptions. However, analyzing sentiments and emotions in written reviews presents unique challenges due to the complexity of human language, contextual variations, and the presence of sarcasm or ambiguity in informal user-generated text. This study introduces a comprehensive framework for text-based sentiment analysis and emotion detection tailored to English reviews. The framework focuses solely on linguistic features—such as lexical patterns, syntactic structures, and semantic relationships—to capture the underlying emotions and opinions expressed in text. By leveraging advanced natural language processing and deep learning techniques, the system enhances the accuracy and depth of sentiment interpretation. A dedicated dataset of English text reviews has been developed to support this research, providing a valuable resource for future studies. The proposed approach is validated through a detailed case study, demonstrating its effectiveness and practical applicability in real-world scenarios.
Suicide is a critical mental health concern and a leading cause of death globally. Emotional dysregulation has been widely recognized as a key factor contributing to suicidal behavior. In the digital age, individuals increasingly express suicidal ideation on social media platforms, often seeking help, empathy, or validation. This study introduces a natural language processing (NLP)-driven framework to classify suicide notes based on emotional content, utilizing deep learning techniques for enhanced detection. By incorporating advanced NLP preprocessing techniques—such as tokenization, lemmatization, stop word removal, and word embeddings (e.g., Word2Vec or GloVe)—the model is able to effectively extract semantic and emotional features from unstructured text. To analyze these emotional patterns, we employ a Long Short-Term Memory (LSTM) neural network, capable of capturing temporal dependencies and sequential sentiment shifts within the text. The LSTM model is trained to recognize latent emotional states that correlate with suicidal ideation, enabling binary classification of notes as either suicidal or non-suicidal. Our approach highlights the linguistic subtleties in suicide-related content, particularly the differences in vocabulary between social media posts and authentic suicide notes. The findings suggest that NLP-enhanced LSTM models can significantly improve the ability to detect indirect or emotionally nuanced indicators of suicidal intent, offering a promising tool for early intervention and mental health monitoring.
The rapid growth of textual data across digital platforms has heightened the need for intelligent systems capable of understanding semantic similarity between sentences. This study presents an Intelligent Paraphrase Recognition System that leverages advanced Natural Language Processing (NLP) techniques to accurately identify whether two sentences convey the same meaning despite differences in structure or vocabulary. The proposed model integrates transformer-based architectures such as BERT and RoBERTa with semantic similarity measures and contextual embeddings to capture deep linguistic and contextual relationships between text pairs. Unlike traditional lexical-based approaches, this system emphasizes contextual understanding, enabling it to recognize paraphrases even in the presence of idiomatic expressions, rephrasing, or syntactic variations. The model undergoes fine-tuning on large-scale benchmark datasets such as Quora Question Pairs and Microsoft Research Paraphrase Corpus (MRPC) to ensure high generalization and reliability. Experimental results demonstrate that the proposed approach achieves superior accuracy, precision, and recall compared to conventional methods, establishing it as a robust and scalable solution for applications in plagiarism detection, question answering, text summarization, and semantic search.
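For flavor, a minimal paraphrase check with contextual sentence embeddings. The abstract fine-tunes BERT/RoBERTa on QQP and MRPC, whereas this sketch uses an off-the-shelf sentence-transformers encoder as a stand-in, with an assumed 0.7 decision threshold:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

a = "How can I learn to code quickly?"
b = "What is the fastest way to pick up programming?"
emb = model.encode([a, b], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()  # cosine similarity in [-1, 1]
print("paraphrase" if score > 0.7 else "not a paraphrase", round(score, 3))
```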
Text summarization plays a vital role in condensing large volumes of information into concise and coherent summaries. Traditional extractive methods often fail to capture semantic richness, while abstractive models face challenges in grammatical correctness and factual accuracy. The base paper introduced a syntax-augmented, headline-aware neural model that leverages syntactic features and headline guidance to improve summary generation. In this work, we extend this idea by evaluating both classical and modern approaches on the CNN/DailyMail dataset. First, we establish a baseline using the unsupervised TextRank algorithm, which provides extractive summaries but achieves limited ROUGE scores. We then explore a state-of-the-art pretrained Transformer, DistilBART, which has been fine-tuned on CNN/DailyMail. Without additional training, DistilBART generates abstractive summaries with significantly higher accuracy, outperforming the base model on ROUGE-1 and ROUGE-2 metrics. Our experimental results show that incorporating headline cues and syntactic awareness (as in the base paper) improves traditional LSTM-based models, but modern pretrained Transformers demonstrate superior performance and efficiency. The findings highlight the evolution of summarization methods from handcrafted features to large-scale pretrained architectures, offering both practical insights for deployment and a strong foundation for future research.
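The zero-shot DistilBART baseline can be reproduced in a few lines with the Hugging Face pipeline; `sshleifer/distilbart-cnn-12-6` is the widely used CNN/DailyMail fine-tune, and the generation lengths and sample article are illustrative:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = ("The city council approved a new transit plan on Tuesday, "
           "promising expanded bus routes and dedicated cycle lanes "
           "after months of public consultation.")
summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```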
Agricultural decision-making requires timely and accurate knowledge support, especially in areas such as pest management, nutrient recommendation, and disease prevention. In many regions, farmers face difficulty accessing expert guidance, leading to lower productivity and crop losses. To address this gap, this research presents a lightweight and scalable Agricultural Question-Answering system that leverages advanced semantic representation through DeBERTa-v3 sentence embeddings. Rather than generating free-form text responses, the proposed approach performs intelligent answer retrieval from a verified agricultural knowledge base using efficient FAISS similarity indexing. This design reduces computational overhead, eliminates hallucinated outputs, and ensures consistent factual accuracy. The system is optimized to run on CPU-only environments, making it affordable and deployable in real-world resource-limited settings. Experimental results show significant improvements in retrieval precision, inference speed, and reliability compared to conventional transformer-based ensemble models. The proposed framework offers a practical AI-driven advisory solution for modern agriculture, enhancing accessibility and supporting sustainable farming practices.
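A minimal sketch of the retrieval core, with an off-the-shelf sentence encoder standing in for the DeBERTa-v3 embeddings named above; normalized vectors in a FAISS inner-product index give cosine-similarity search on CPU:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical verified knowledge-base entries
answers = ["Apply neem-based pesticide for aphid control.",
           "Use nitrogen-rich fertiliser at the tillering stage."]
emb = encoder.encode(answers, normalize_embeddings=True)

index = faiss.IndexFlatIP(emb.shape[1])  # inner product = cosine on unit vectors
index.add(np.asarray(emb, dtype="float32"))

q = encoder.encode(["How do I control aphids?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q, dtype="float32"), 1)
print(answers[ids[0][0]], scores[0][0])
```

Because answers are retrieved rather than generated, every response traces back to a verified knowledge-base entry, which is what rules out hallucinated outputs.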
The rapid growth of digital news content has made automatic text classification an essential task in Natural Language Processing (NLP). This project focuses on news topic classification, where articles are categorized into four distinct topics: World, Sports, Business, and Sci/Tech. The AG News dataset is used, consisting of labeled news headlines and descriptions. A pre-trained Distil BERT transformer model was fine-tuned on this dataset to achieve high classification performance. The model was trained using 120,000 news samples and evaluated on separate test and validation sets. The proposed system achieved an accuracy of 95%, outperforming baseline machine learning models and ensuring reliable classification across all categories. To demonstrate practical applicability, the trained model was deployed using a Flask web application. The application provides a user-friendly interface with essential features including user registration, login, news prediction, and visualization of training results. Users can enter a news article’s title and description, and the system automatically predicts the most relevant topic. This work highlights the effectiveness of deep learning transformer models in real-world NLP tasks and provides a scalable solution for automated news categorization, which can be extended to other domains such as content filtering, recommendation systems, and information retrieval.
This study presents an advanced approach to classifying lung auscultation sounds using Mel-frequency Cepstral Coefficients (MFCC), Chroma features, and neural networks. Lung auscultation, a key diagnostic tool in identifying respiratory conditions, often relies on the expertise of medical professionals to interpret subtle sound patterns. However, automated systems that accurately classify these sounds can greatly assist in early diagnosis and treatment. To achieve this, we employed MFCC, which captures the power spectrum of sounds and effectively models the way humans perceive auditory signals, focusing on the critical frequency ranges for lung sounds. Additionally, Chroma features, which represent the tonal content of audio signals, were used to capture harmonic aspects that could be indicative of specific lung conditions. These features were then fed into a neural network designed to classify lung sounds into various diagnostic categories, such as normal breathing, wheezing, crackles, and other abnormal respiratory sounds. The neural network, trained on a comprehensive dataset of lung sounds, was able to learn complex patterns and correlations within the MFCC and Chroma features, leading to high accuracy in sound classification. This automated approach offers a powerful tool for enhancing the precision of lung sound diagnosis, potentially leading to earlier detection of respiratory conditions and improved patient outcomes.
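A minimal feature-extraction sketch with librosa, assuming a WAV clip and mean-pooling over time (13 MFCCs and 12 chroma bins are conventional defaults, not the paper's exact configuration):

```python
import librosa
import numpy as np

y, sr = librosa.load("lung_sound.wav", sr=22050)          # hypothetical clip
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, frames)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # shape (12, frames)

# Mean-pool over time to get a fixed-length vector for the classifier
features = np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])  # (25,)
```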
In the contemporary digital landscape, the widespread circulation of fake news through social media channels poses a profound threat to public trust, societal harmony, and the integrity of democratic institutions. Conventional fake news detection techniques often exhibit limitations in discerning the nuanced semantics embedded in brief and informal textual formats such as tweets. To overcome these challenges, we introduce an innovative methodology that leverages RoBERTa, an advanced transformer-based language model, to enhance semantic similarity assessment between tweets within a context-aware fake news detection framework. In contrast to prior approaches employing static word representations like FastText, RoBERTa captures rich contextual dependencies and dynamic token relationships across varied linguistic environments. The proposed framework is structured as a three-layered architecture—encompassing topic, social, and contextual components—with tweet similarity playing a central role in the topic layer. By utilizing RoBERTa-derived sentence embeddings in conjunction with cosine similarity measures, our method demonstrates superior performance in identifying semantic relevance. Comprehensive evaluation on the STSBenchmark and a curated dataset of social media content reveals that our approach surpasses state-of-the-art similarity models, achieving higher correlation scores and enhanced detection accuracy. This study underscores the efficacy of RoBERTa in enriching semantic interpretation for misinformation detection and paves the way for future enhancements involving multilingual and multimodal data integration.
The stock market is highly volatile and influenced by numerous factors operating simultaneously, making accurate prediction a challenging task. Over the years, several studies have attempted to forecast stock prices using statistical, machine learning, and deep learning approaches. However, these models often fail to capture the combined effect of external factors such as public sentiment, which has a significant impact on market movement. In this work, we propose a hybrid algorithm that integrates Twitter sentiment analysis with a Long Short-Term Memory (LSTM) network to predict the next day’s closing stock price. Sentiment analysis is performed using part-of-speech (POS) tagging to evaluate public opinion extracted from Twitter data, while LSTM is employed to model the temporal dependencies in historical stock prices. By combining sentiment-driven signals with sequential deep learning, our model provides a more reliable prediction of stock price movements compared to conventional approaches. Experimental results demonstrate that incorporating sentiment into LSTM improves accuracy and offers a clearer insight into the future direction of stock prices.
To address the challenge of sentiment classification in multi-source textual data, this study implements a sentiment analysis framework based on Natural Language Processing (NLP) techniques and a Long Short-Term Memory (LSTM) neural network. The system first performs text preprocessing, including data cleaning, tokenization, stop word removal, and word embedding generation, to prepare diverse datasets for analysis. The LSTM model is trained to capture long-range dependencies and contextual information in text, enabling accurate sentiment prediction. Experimental evaluation shows that the proposed model achieves high accuracy in recognizing sentiment categories, with an overall classification accuracy of 0.928 for five emotion classes and a minimum mean absolute error (MAE) of 0.128. The model demonstrates strong performance in terms of precision, recall, F1-score, and receiver operating characteristic (ROC) metrics, confirming its reliability for sentiment orientation recognition in heterogeneous data sources. This research highlights the effectiveness of combining NLP preprocessing with LSTM networks for robust multi-source sentiment analysis, contributing to the development of advanced text mining and opinion detection systems.
The widespread use of social media platforms provides valuable insights for real-time mental health monitoring, particularly for identifying signs of depression through user-generated content. This project introduces a deep learning-based approach for depression detection that leverages the sequential modeling capabilities of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. To address the challenges posed by informal language, the preprocessing pipeline includes emoji normalization, slang replacement using a custom dictionary, and the extraction of emotion scores to enhance semantic understanding. The hybrid LSTM-GRU model is trained on annotated social media posts, effectively capturing emotional and contextual patterns within the text. Experimental results show strong performance across evaluation metrics such as accuracy, precision, recall, and F1-score. This framework demonstrates the potential of combining LSTM and GRU for lightweight yet effective depression detection in noisy and emotionally complex social media environments.
Click fraud remains a critical threat in online advertising, leading to inflated costs and undermining campaign effectiveness by diverting budgets toward illegitimate activity. Existing solutions leveraging machine learning and deep learning models have shown promise, but many still struggle with identifying subtle behavioral patterns in fraudulent clicks. In this work, we propose a robust LSTM-based Recurrent Neural Network (RNN) framework designed to enhance the detection of fraudulent click activity by modeling sequential patterns and time-dependent features in user interaction data. A comprehensive preprocessing pipeline was developed, including timestamp decomposition, feature scaling, and label encoding to ensure optimal input representation. Our model was trained and evaluated against a carefully engineered dataset enriched with behavioral and contextual click features. Among various deep learning architectures examined, including Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN), the RNN-LSTM model demonstrated superior performance, achieving 99% accuracy with high precision and recall scores. The results validate the effectiveness of temporal modeling in identifying fraudulent click patterns and highlight the LSTM model’s suitability for deployment in real-time fraud detection systems. This study not only advances existing anti-fraud mechanisms but also sets a strong foundation for future work in intelligent online ad verification and fraud prevention.
The detection of sarcasm and humor in natural language, particularly in low-resource languages like Bangla, poses a significant challenge due to the complex interplay of linguistic, contextual, and cultural cues. Traditional text-based models often fail to capture the subtle nuances of humor and irony, which are influenced by tone, visual expressions, and contextual semantics. To address this limitation, Contextual Multi-Modal Deep Learning integrates textual, visual, and contextual features to achieve a more comprehensive understanding of sarcastic and humorous content. This approach leverages deep learning architectures such as transformers, CNNs, and attention mechanisms to process multimodal inputs—combining textual embeddings from pre-trained language models like Bangla BERT with visual features extracted from images or memes. The fusion of these modalities enables the model to capture both linguistic patterns and emotional undertones, improving the accuracy of sarcasm and humor recognition. Additionally, incorporating contextual metadata such as user sentiment, conversation history, or social media context enhances interpretability and real-world applicability. Overall, this advanced framework significantly enhances performance in Bangla sarcasm and humor detection, contributing to sentiment analysis, social media monitoring, and cultural AI research.
With the continuous evolution of advanced large language models like GPT, the proliferation of AI-generated fake news presents growing challenges to information dissemination. Traditional text classification methods struggle to detect such content due to their limited capacity to distinguish between authentic and fabricated news. To address this issue, this study introduces an MLP (Multi-Layer Perceptron) Classifier integrated with Natural Language Processing (NLP) techniques for detecting AI-generated fake news. Textual data is preprocessed through tokenization, stop-word removal, and vectorization to extract meaningful features, which are then used as inputs to the MLP network. The classifier leverages multiple hidden layers and nonlinear activation functions to capture complex linguistic patterns that characterize fabricated news. A new dataset, generated using GPT-4 and covering 42 news categories, was developed to train and evaluate the system. Experimental results demonstrate that the proposed MLP model achieves reliable accuracy and strong F1 scores, surpassing traditional machine learning approaches. These findings highlight the potential of MLP-based architectures in enhancing fake news detection and safeguarding online information integrity.
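A minimal scikit-learn sketch of the described pipeline, with TF-IDF vectorization feeding an MLP; the toy texts, labels, and layer sizes are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

texts = ["Scientists confirm new exoplanet discovery.",
         "Shocking: this one weird trick reverses aging overnight!"]
labels = [0, 1]  # assumed encoding: 0 = authentic, 1 = AI-generated/fake

clf = make_pipeline(
    TfidfVectorizer(stop_words="english", max_features=50000),
    MLPClassifier(hidden_layer_sizes=(256, 64), activation="relu",
                  max_iter=300),
)
clf.fit(texts, labels)
print(clf.predict(["Miracle cure found, doctors hate it!"]))
```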
Mental health disorders represent a growing global concern, with millions of individuals expressing their psychological conditions through digital platforms such as social media. Detecting mental illness from textual data is a complex task due to the emotional depth, informal language, metaphorical expressions, and context-specific cues often embedded within such posts. Traditional machine learning methods and generic language models frequently struggle to capture these subtleties, leading to limited prediction accuracy. To overcome these challenges, this study presents a robust and intelligent framework for multiclass mental illness prediction using advanced deep learning and natural language processing (NLP) techniques. The proposed system integrates domain-adapted transformer models with deep neural networks to improve understanding and classification of mental health-related expressions. Specifically, MentalBERT, a variant of BERT pretrained on mental health corpora, is employed to extract rich contextual features from psychologically sensitive text. Alongside this, MelBERT, a metaphor-aware model, is used to interpret figurative and symbolic language—a common characteristic in users with emotional distress. These transformer-based models are complemented by Convolutional Neural Networks (CNNs) for hierarchical feature extraction and a Bidirectional Long Short-Term Memory (BiLSTM) network to capture long-range semantic dependencies from both past and future contexts of the sequence. This hybrid architecture enables the system to predict multiple classes of mental illness—including depression, anxiety, PTSD, bipolar disorder, and more—with improved accuracy and reliability. The model was trained and evaluated on a labeled dataset of social media posts, with performance metrics indicating superior results over traditional methods. The findings of this research highlight the importance of domain-specific modeling, figurative language interpretation, and sequential deep learning techniques in building an effective and scalable mental health detection system. This work contributes to the broader goal of mental health awareness by providing a tool that can assist professionals in early detection and intervention strategies through digital linguistic analysis.
This project presents a multilingual Visual Question Answering (VQA) web application built with Flask, integrating deep learning and NLP techniques. It utilizes the Salesforce BLIP-VQA model, which combines a Vision Transformer and a text decoder to answer questions about uploaded images or videos (up to 60 seconds, given sufficient GPU and memory). The system supports secure file uploads, validates inputs, and extracts keyframes from videos to generate context-aware answers. A key highlight is its multilingual support: users can ask questions in a variety of Indian languages such as Hindi, Tamil, and Telugu, with an integrated mock translation service converting each question to English (the primary language used by the VQA model) and translating the model-generated English answer back to the user's original language. This simulates an end-to-end multilingual pipeline and demonstrates the feasibility of extending the system with real translation services such as IndicTrans2. Overall, this real-time, multilingual application demonstrates the potential of transformer-based VQA models in accessible and scalable web environments.
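The inference core can be sketched with the public BLIP-VQA checkpoint; the image file is hypothetical, and translation of the question and answer to and from Indian languages is assumed to happen around this snippet:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("frame.jpg").convert("RGB")  # e.g. an extracted keyframe
inputs = processor(image, "How many people are in the picture?",
                   return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```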
With the exponential rise of video content on social media platforms, particularly YouTube, which handles over 500 hours of uploads every minute, efficient video indexing, retrieval, and summarization have become critical challenges. Traditional methods rely heavily on user-provided metadata such as titles, tags, and descriptions, which are often inaccurate or unrelated to the actual content. To overcome these limitations, recent advances in vision-language models, such as BLIP (Bootstrapping Language-Image Pretraining) transformers, have enabled more accurate and automated video understanding by jointly learning from visual and textual modalities. This paper presents a systematic review of deep learning-based video summarization approaches, with a particular emphasis on BLIP-based models and their potential to bridge the gap between raw video content and semantic interpretation. Out of more than 300 research studies, 44 were shortlisted using strict inclusion criteria, and their methodologies, applications, and datasets are critically analyzed. The review highlights how BLIP transformers enhance summarization performance by generating context-aware captions, enabling semantic indexing, and improving retrieval efficiency. The insights provided in this study offer valuable guidance for researchers and practitioners aiming to leverage deep learning and vision-language models for managing large-scale video data in social networking platforms.
Early and accurate detection of rice leaf diseases is essential for maintaining crop health, preventing yield loss, and supporting sustainable agriculture. Traditional identification methods rely heavily on expert inspection, which is time-consuming, subjective, and unsuitable for large-scale implementation. This study proposes a scalable image-based framework for detecting and classifying rice leaf diseases using advanced deep learning techniques. A DenseNet121 transfer learning model was employed, combined with extensive data augmentation and progressive fine-tuning, enabling the system to extract robust features directly from rice leaf images without requiring manual segmentation or handcrafted features. The framework was trained and validated on a dataset comprising six major rice leaf disease categories and achieved a 97.35% test accuracy, significantly outperforming conventional approaches. The results demonstrate that the proposed system is highly effective for automated rice disease detection and offers a reliable foundation for future integration into decision-support tools for precision agriculture.
Handwritten English text often varies greatly in style, shape, and clarity, making it difficult for machines to accurately recognize and convert such text into readable digital form. This project focuses on building an intelligent system capable of recognizing unclear or complex handwritten words and converting them into clear, machine-readable English text. The proposed approach employs a Convolutional Recurrent Neural Network (CRNN) architecture that integrates a pretrained ResNet18 for spatial feature extraction and a Bidirectional Long Short-Term Memory (BiLSTM) network for sequence learning. The use of Connectionist Temporal Classification (CTC) loss enables end-to-end training without the need for explicit character segmentation, allowing the model to handle varying word lengths and handwriting styles. The system effectively learns to identify patterns and structures in handwritten text, achieving high accuracy in both character-level and word-level recognition. Experimental evaluation demonstrates that the proposed model provides robust and efficient performance, outperforming traditional CNN-based classifiers. This work contributes toward the development of a reliable deep learning framework for general-purpose handwritten English word recognition.
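A minimal PyTorch sketch of the CTC training objective described above, with random tensors standing in for the CRNN's per-timestep character logits (the timestep count, batch size, and 80-symbol charset with blank index 0 are assumptions):

```python
import torch
import torch.nn as nn

T, N, C = 32, 4, 80  # timesteps, batch size, charset size incl. blank at 0
logits = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # CRNN stand-in

targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # label index sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(logits, targets, input_lengths, target_lengths)
loss.backward()  # end-to-end training, no per-character segmentation needed
```

CTC marginalizes over all alignments between the T timesteps and the target characters, which is what lets the model handle varying word lengths without segmentation.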
Kidney diseases such as cysts, tumors, and stones are life-threatening conditions that require early and accurate detection for effective treatment. Traditional diagnostic methods using CT scans often depend on manual interpretation by radiologists, which can be time-consuming and prone to error. To address this challenge, we propose a deep learning–based automated system for multi-class kidney disease classification using CT images. The system utilizes EfficientNetV2B0, a state-of-the-art convolutional neural network, to extract deep features from CT scans. A custom classification head with Global Average Pooling, Dropout, and Dense layers is employed to classify images into four categories: Normal, Cyst, Tumor, and Stone. Data augmentation and class weighting are applied to handle dataset imbalance and improve generalization. The model achieves high accuracy and robustness, outperforming conventional CNN approaches. Furthermore, the trained model is integrated into a Flask web application, providing a user-friendly interface with functionality for image upload, real-time prediction with confidence scores, and visualization of training results through charts. This approach demonstrates the potential of advanced deep learning models combined with web deployment to support radiologists in fast, reliable, and scalable kidney disease diagnosis.
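A sketch of the described head in Keras, with a frozen EfficientNetV2B0 backbone feeding Global Average Pooling, Dropout, and a four-way softmax (the input resolution, dropout rate, and one-hot label encoding are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(4, activation="softmax")(x)  # Normal/Cyst/Tumor/Stone
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Class weights for the imbalanced categories would be passed via the `class_weight` argument of `model.fit`.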
Artificial intelligence has significantly transformed the landscape of fraud detection in the banking sector. This study investigates the integration of advanced AI techniques to strengthen credit card fraud prevention systems. By leveraging the structural relationships within transaction data and applying data compression and anomaly detection strategies, the approach enhances the identification of suspicious activities. Case studies from two banking institutions are used to validate the methodology, demonstrating its ability to detect fraudulent behavior effectively. The system emphasizes adaptability to evolving fraud patterns and the capacity to analyze complex financial data efficiently. Overall, the research highlights the importance of intelligent data-driven models in improving the reliability and responsiveness of banking fraud detection frameworks.
Detecting anomalies in industrial machine sounds is essential for maintaining operational efficiency, preventing costly equipment failures, and ensuring workplace safety. However, this task is challenging due to the complex and variable nature of industrial environments, including background noise and changing operating conditions. This study presents a comprehensive approach that utilizes advanced feature extraction methods to capture the important acoustic characteristics of machinery sounds. Various machine learning and deep learning techniques are applied to explore effective strategies for anomaly detection. The research employs multiple datasets to evaluate the proposed methods under different experimental conditions. Results from extensive testing demonstrate the effectiveness of the approach in real-world industrial settings, highlighting its potential for enhancing predictive maintenance through improved sound analysis. This work contributes to advancing the capabilities of industrial monitoring systems by providing a reliable means to detect abnormalities in machine operations.
Melanoma remains one of the deadliest forms of skin cancer, responsible for the majority of skin cancer-related deaths worldwide. Early and accurate detection is crucial for improving patient survival, yet visual diagnosis through dermoscopy is often difficult and prone to variability, even among expert dermatologists. In this study, we propose an advanced deep learning–based framework for melanoma classification using dermoscopic images, aimed at improving diagnostic accuracy and confidence compared to traditional convolutional neural networks (CNNs) explored in prior work. Unlike the base study that employed semi-supervised multi-teacher ensemble learning, our approach leverages a supervised training pipeline with a state-of-the-art ConvNeXt architecture, focal loss to address class imbalance, and progressive fine-tuning to enhance feature extraction. The model was trained and validated on a stratified dataset split into training, validation, and test sets, achieving 92% classification accuracy and an AUC of 0.974 on the independent test set—outperforming baseline CNN ensembles. Comprehensive evaluation using confusion matrices, ROC analysis, and Grad-CAM visualizations further demonstrated the model’s robustness and interpretability. These results highlight the potential of ConvNeXt-based architectures for reliable and explainable melanoma diagnosis, offering clinicians an effective decision-support tool for early detection and management of skin cancer.
Colorectal cancer (CRC) remains one of the leading causes of cancer-related mortality worldwide, emphasizing the need for early and accurate diagnosis. Histopathological examination of colorectal tissue is the clinical gold standard but involves time-consuming manual processes that are susceptible to human error. To address this, the study introduces an advanced dual-track deep learning architecture aimed at automating the classification of CRC histopathology images. The model incorporates a mechanism for capturing both global and local features from tissue samples and enhances focus on diagnostically significant regions through attention refinement. By integrating multiple layers of feature extraction and attention strategies, the proposed system improves the quality and precision of the analysis. This approach demonstrates potential for aiding pathologists, reducing diagnostic workload, and increasing the reliability of colorectal cancer detection.
Clinically, the counting and classification of white blood cells (WBCs) are traditionally performed through manual microscopic analysis, which is subjective and time-consuming. To address this, automated methods using machine learning (ML) and deep learning (DL) have been explored. In this study, we propose an advanced WBC classification system based on the MobileNetV2 architecture, which offers a lightweight yet highly efficient solution for medical image analysis. Open datasets such as Raabin-WBC and private clinical data were used to validate the model. WBC images were preprocessed using U-Net for effective segmentation of the nucleus, cytoplasm, and whole-cell regions. The MobileNetV2 model was then applied for feature extraction and classification, leveraging its inverted residual structure and depthwise separable convolutions to enhance accuracy while reducing computational cost. Experimental results demonstrated that the proposed model achieved a classification accuracy outperforming traditional ML-based approaches and comparable to state-of-the-art deep learning architectures. The system was further integrated into a user-friendly graphical interface for real-time clinical use, completing both the segmentation and classification tasks. The application of MobileNetV2 in this context significantly improves diagnostic efficiency, scalability, and consistency in peripheral blood smear (PBS) testing, making it an effective tool for assisting hematologists in routine diagnostics.
Bean rust and angular leaf spot are major threats to bean cultivation, significantly affecting crop health and reducing yields. Timely and accurate disease detection is essential for maintaining agricultural productivity, yet conventional diagnostic methods often rely on expert intervention and are not scalable for large farming operations. This research introduces an explainable deep learning-based framework tailored for bean leaf disease classification. The model is designed to effectively recognize visual disease patterns, both broad and subtle, across diverse leaf samples. To ensure transparency in decision-making, the framework integrates visual interpretability features that highlight the specific regions of the leaf image influencing the classification outcomes. The system was trained and validated using a curated dataset of bean leaf images covering multiple disease conditions and healthy samples. It demonstrated robust performance in recognizing and distinguishing between disease types, emphasizing its potential as a valuable tool for real-time, automated crop monitoring. The model's lightweight design and interpretability make it suitable for deployment in field environments, contributing to smarter, data-driven agricultural practices.
In contemporary society, many health challenges are being effectively addressed through the application of computer science and artificial intelligence. Cardiac arrhythmia, a critical cardiovascular disorder, poses serious health risks and requires timely detection to prevent fatalities. This study explores the use of advanced AI techniques to improve early diagnosis of high-risk cardiac arrhythmia cases. We propose a novel methodology based on the NASNetMobile architecture, a lightweight and efficient convolutional neural network designed to extract meaningful features from ECG signals for accurate classification. The proposed NASNetMobile-based framework integrates deep learning capabilities to enhance the detection of cardiac arrhythmias by leveraging its optimized network structure for mobile and resource-constrained environments. This approach aims to provide a more accessible and effective solution for continuous cardiac monitoring and early warning, potentially supporting healthcare providers in delivering timely interventions and improving patient outcomes.
Heart disease remains one of the leading causes of death globally. Echocardiography is a widely used technique for diagnosing cardiovascular conditions, yet accurately interpreting echocardiogram images requires specialized medical expertise. To address this challenge, this study introduces a deep learning-based approach utilizing the EfficientNetB0 architecture for the automatic classification of heart diseases from echocardiogram data. EfficientNetB0, known for its compound scaling method that balances network depth, width, and resolution, provides a lightweight yet powerful solution for medical image analysis. The model is trained to automatically extract complex and discriminative features from echocardiographic images, reducing reliance on manual interpretation. By leveraging its efficiency and strong generalization capability, EfficientNetB0 ensures high accuracy while maintaining low computational cost, making it particularly suitable for real-time clinical use. This approach aims to support medical professionals in improving diagnostic speed, consistency, and accessibility. The proposed system holds promise for enhancing early detection and prognosis of cardiovascular diseases, ultimately making advanced diagnostic capabilities more scalable across diverse healthcare settings.
This study introduces a deep learning model for remote sensing lithology classification that leverages the efficiency of EfficientNetB0, a lightweight yet powerful convolutional neural network architecture. The model automates the identification and classification of various rock types in remote sensing images, effectively addressing the challenges of multi-class classification. EfficientNetB0 is employed as the primary feature extractor due to its compound scaling strategy, which uniformly balances network depth, width, and resolution. This design enables accurate recognition of complex geological patterns while maintaining low computational cost and memory usage. The proposed approach preprocesses the input images, extracts hierarchical spatial features using EfficientNetB0, and employs a classification head to predict the corresponding lithology classes. By integrating EfficientNetB0, the framework combines the advantages of automated feature learning with high scalability and practical applicability for large-scale geospatial datasets. Optimization techniques such as backpropagation, dropout, and regularization are applied to improve generalization and minimize overfitting. This enhances the robustness of the model in handling diverse lithological features across varying terrain conditions. The proposed system offers a reliable and efficient solution for automatic rock type classification in remote sensing, contributing to the advancement of geoscientific research and supporting real-time applications in mineral exploration, environmental monitoring, and geological mapping.
We investigate how deep convolutional neural networks (CNNs) can be used to automatically classify a unique collection of naval ship images. We examine the impact of data preprocessing and externally obtained images on model performance and propose the Xception architecture as an enhancement to our existing CNN approach. Additionally, we explore how the models can be made transparent using interpretability techniques. Our findings demonstrate that Xception significantly improves classification performance compared to the traditional CNN approach. The results highlight the importance of appropriate image preprocessing, with soft augmentation contributing notably to model performance. This research is original in several respects, notably the uniqueness of the acquired dataset and the analytical modeling pipeline, which includes comprehensive data preprocessing steps and the use of deep learning techniques. Furthermore, interpretability tooling is applied to the Xception model to enhance its transparency and usability. We believe the proposed methodology offers significant potential for documenting historic image collections.
After the coronavirus disease 2019 (COVID-19) outbreak, the viral infection known as monkeypox gained significant attention, and the World Health Organization (WHO) classified it as a global public health emergency. Due to the similarities between monkeypox and other pox viruses, traditional classification methods face challenges in accurately identifying the disease. Moreover, the sharing of sensitive medical data raises concerns about privacy and security. Integrating deep neural networks with federated learning (FL) offers a promising approach to overcome these challenges in medical data categorization. In this context, we propose an FL-based framework leveraging the Xception deep learning model to securely classify monkeypox and other pox viruses. The proposed framework utilizes the Xception model for classification and a federated learning environment to ensure data security. This approach allows the model to be trained on distributed data sources without transferring sensitive data, thus enhancing privacy protection. The federated learning environment also enables collaboration across institutions while maintaining the confidentiality of patient data. The experiments are conducted using publicly available datasets, demonstrating the effectiveness of the proposed framework in providing secure and accurate classification of monkeypox disease. Additionally, the framework shows promise in other medical classification tasks, highlighting its potential for widespread application in the healthcare sector.
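A minimal sketch of the federated averaging (FedAvg) loop such a framework implies: each institution trains a local copy of an Xception-based classifier on its own data, and only the weights, never the images, are averaged on the server. The class count, single local epoch, and client dataset handling are illustrative assumptions.

```python
# FedAvg sketch: local Keras training per client, then per-layer weight averaging.
import numpy as np
import tensorflow as tf

def build_model(num_classes=4):  # number of pox classes is an assumption
    base = tf.keras.applications.Xception(
        include_top=False, weights="imagenet", input_shape=(299, 299, 3), pooling="avg")
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

def federated_round(global_weights, client_datasets):
    """One round: each client trains locally; the server averages the weights."""
    client_weights = []
    for ds in client_datasets:            # each dataset stays at its own institution
        local = build_model()
        local.set_weights(global_weights)
        local.fit(ds, epochs=1, verbose=0)
        client_weights.append(local.get_weights())
    # Unweighted average of each layer's parameters across clients.
    return [np.mean(layer, axis=0) for layer in zip(*client_weights)]
```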
Mango leaf diseases pose a significant threat to crop yield and quality, directly impacting agricultural productivity and farmer livelihoods. Accurate and timely detection of these diseases is essential for effective disease management and sustainable farming practices. This project proposes a deep learning-based approach for automated mango leaf disease classification using the Xception (Extreme Inception) model, a state-of-the-art convolutional neural network (CNN) architecture. Xception leverages depthwise separable convolutions to efficiently capture fine-grained patterns such as texture, color variations, and lesion shapes in leaf images, enabling high accuracy in disease classification. To ensure data privacy and scalability, the system incorporates federated learning, allowing multiple agricultural institutions or farmer cooperatives to collaboratively train the model without sharing raw data. This decentralized approach prevents unauthorized access to sensitive datasets while continuously improving model performance across diverse environments. The proposed system thus combines advanced deep learning techniques with privacy-preserving collaborative learning, offering a robust, secure, and scalable solution for early detection and classification of mango leaf diseases. Its implementation can contribute significantly to precision agriculture, reducing economic losses and supporting sustainable crop management.
Osteoporosis is a skeletal disease that is difficult to identify before symptoms appear. Existing screening methods, such as dual-energy X-ray absorptiometry, are typically applied only after symptoms develop, owing to cost and safety constraints. Early detection of osteopenia and osteoporosis using other modalities that permit relatively frequent examination is therefore valuable for early treatment and cost reduction. Recently, many studies have proposed deep learning-based osteoporosis diagnosis methods for various modalities and achieved outstanding results. However, these studies have limitations for clinical use because they require tedious processes, such as manually cropping a region of interest, or diagnose only osteoporosis rather than osteopenia. In this study, we present a classification task for diagnosing osteopenia and osteoporosis using computed tomography (CT). Additionally, we propose a multi-view CT network (MVCTNet) that automatically classifies osteopenia and osteoporosis using two images generated from the original CT image. Unlike previous methods that use a single CT image as input, the MVCTNet captures varied features from the images produced by our multi-view setting. The MVCTNet comprises two feature extractors and three task layers. The two feature extractors take the images as separate inputs and learn different features through a dissimilarity loss. The task layers learn the target task from the features of the two extractors and then aggregate them. For the experiments, we use a dataset containing 2,883 patients' CT images labeled as normal, osteopenia, and osteoporosis, and we observe that the proposed method improves performance in all experiments under both quantitative and qualitative evaluation.
The automated classification and quality grading of fruits are critical components in advancing agricultural efficiency, yet remain underutilized in current computer vision applications. This study presents a dual-stage deep learning approach leveraging same-domain transfer learning with the NASNetMobile architecture for simultaneous fruit type recognition and quality assessment. Initially, the model is trained to classify six distinct fruits—banana, apple, orange, pomegranate, lime, and guava—using the FruitNet dataset. The learned parameters from this classification task are then transferred to a secondary grading model to evaluate the quality of the identified fruits. To overcome dataset imbalance and enhance generalization, a fusion of data augmentation strategies including AugMix, CutMix, and MixUp is employed. Experimental results confirm that this methodology improves both classification and grading performance, highlighting the effectiveness of intra-domain transfer learning. The proposed framework offers a scalable and efficient solution for real-time fruit inspection systems, contributing significantly to the development of intelligent agricultural automation technologies.
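Of the three augmentations named, MixUp is the simplest to illustrate. Below is a sketch under the assumption of one-hot labels and in-batch pairing; CutMix and AugMix follow the same pattern, with patch-level mixing and chained augmentations, respectively.

```python
# MixUp: convexly combine pairs of images and their one-hot labels.
import numpy as np

def mixup_batch(images, labels, alpha=0.2, seed=None):
    """images: (B, H, W, C) float array; labels: (B, num_classes) one-hot."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)            # Beta-distributed mixing coefficient
    perm = rng.permutation(len(images))     # random pairing within the batch
    mixed_x = lam * images + (1.0 - lam) * images[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y
```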
In this work, we extend the recently proposed Quantum Vision (QV) theory in deep learning for object recognition by integrating it with the Xception architecture, forming a novel Heavy QV-Xception model. The QV theory, inspired by the particle-wave duality in quantum physics, treats objects as information waves rather than static images, enabling deep neural networks to capture richer representations. Building on this concept, our Heavy QV-Xception model leverages a robust QV block to transform conventional images into wave-function representations and processes them through the depthwise separable convolutional layers of Xception for enhanced feature extraction. This hybrid approach benefits from both the quantum-inspired information representation and the efficient, high-performance architecture of Xception. Extensive experiments on multiple benchmark datasets demonstrate that the Heavy QV-Xception model consistently outperforms standard Xception and other conventional CNNs, highlighting the effectiveness of combining QV theory with advanced deep learning architectures for improved object recognition accuracy.
Early detection of ovarian cancer remains a major challenge due to subtle symptoms and poor survival rates. This study presents a comparative analysis of machine learning (ML) and deep learning (DL) models for predicting ovarian cancer using clinical and biomarker data. The dataset undergoes comprehensive preprocessing, including handling missing values, outlier removal, normalization, and dimensionality reduction via PCA. Feature selection techniques such as Feature Importance, Recursive Feature Elimination (RFE), and autoencoder-based methods are employed to enhance model performance. Various classifiers—including KNN, SVM, Logistic Regression, Random Forest, and deep networks like ANN, FNN, CNN, RNN, and Xception—are evaluated. Our results indicate that the Xception model, combined with autoencoder-based feature selection, achieved the highest accuracy, demonstrating its capability to capture complex feature interactions. This study highlights the significance of integrating optimized preprocessing, feature engineering, and deep learning for effective early diagnosis of ovarian cancer.
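As one example of the feature-selection stage, the following scikit-learn pipeline combines scaling, Recursive Feature Elimination (RFE), and a downstream classifier. Synthetic data stands in for the clinical and biomarker features, and the retained feature count is an assumption.

```python
# Scaling -> RFE feature selection -> classifier, evaluated with cross-validation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the clinical/biomarker feature table.
X, y = make_classification(n_samples=400, n_features=40, n_informative=12, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)),
    ("clf", SVC(kernel="rbf")),
])
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```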
Early and accurate diagnosis of Alzheimer’s disease (AD), particularly the transition from Cognitively Normal (CN) to Mild Cognitive Impairment (MCI), is crucial for timely intervention and improved patient outcomes. Building upon existing deep learning approaches that utilize convolutional neural networks (CNNs) with channel attention mechanisms, this research develops an enhanced model based on the Xception architecture. The proposed system employs advanced feature extraction techniques to capture both local and global neuroimaging features, integrated through a learned fusion mechanism. Our model significantly outperforms previous methods, achieving an accuracy of 99% in binary classification of CN versus MCI subjects. This improvement underscores the potential of the Xception network in capturing subtle imaging biomarkers associated with early cognitive decline. The system aims to provide a reliable, accessible diagnostic tool to support clinicians in early Alzheimer’s detection, facilitating timely and targeted therapeutic strategies.
The increasing prevalence of thyroid cancer underscores the critical need for efficient classification and early detection of thyroid nodules. Automated systems can significantly aid physicians by expediting diagnostic processes. However, achieving this goal remains challenging due to limited medical image datasets and the complexity of feature extraction. This study addresses these challenges by emphasizing the extraction of meaningful features essential for tumor detection. The proposed approach integrates advanced techniques for feature extraction, enhancing the capability to classify thyroid nodules in ultrasound images. The classification framework includes distinguishing between benign and malignant nodules, as well as identifying specific suspicious classifications. The combined classifiers provide a comprehensive characterization of thyroid nodules, demonstrating promising accuracy in preliminary evaluations. These results mark a significant advancement in thyroid nodule classification methodologies. This research represents an innovative approach that could potentially offer valuable support in clinical settings, facilitating more rapid and accurate diagnosis of thyroid cancer.
Fingerprint Liveness Detection (FLD) is a critical component of biometric authentication systems, protecting them from presentation attacks using artificial fingerprints fabricated from materials such as silicone, gelatine, and latex. While existing methods based on Convolutional Neural Networks (CNNs) or multimodal biometric traits provide promising results, they often increase system complexity, computational cost, or hardware requirements. To overcome these limitations, this paper introduces a lightweight deep learning framework for robust fingerprint liveness detection. The proposed system employs an efficient object detection model with an enhanced backbone and decoupled detection head, enabling the extraction of fine ridge-level features such as pore distribution and distortions, as well as global liveness cues like perspiration dynamics and texture irregularities. Unlike multimodal approaches that require auxiliary biometric data, the framework operates solely on fingerprint images, ensuring hardware simplicity while retaining high discriminative power. The model is trained end-to-end on benchmark datasets, incorporating advanced regularization and a cosine-annealed Adam optimizer to improve generalization and reduce overfitting. Experimental evaluations confirm that the proposed framework achieves superior spoof detection accuracy, strong resistance to novel attack materials, and fast inference speed compared to state-of-the-art approaches. With its lightweight design and adaptability, the system offers a practical and scalable solution for enhancing the reliability of biometric authentication in real-world scenarios.
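The "cosine-annealed Adam optimizer" mentioned above can be sketched with Keras' built-in CosineDecay schedule; the initial rate and step budget below are illustrative assumptions, not the paper's settings.

```python
# Adam with a cosine-annealed learning rate.
import tensorflow as tf

steps_per_epoch, epochs = 500, 50
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=steps_per_epoch * epochs)   # LR follows a half-cosine toward zero
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```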
Predicting adolescent concern over unhealthy food advertisements is critical for promoting health awareness and guiding public policy. This study utilizes XGBoost, a gradient boosting machine learning model, to predict concern levels among adolescents based on demographic and behavioral features. Survey data from 1030 adolescents were collected, including age, parental education, and advertisement exposure types, such as celebrity endorsements and free toys. The model is trained with hyperparameter tuning and synthetic oversampling to handle imbalanced classes. Explainable AI techniques (LIME and SHAP) are applied to interpret feature importance, providing insights into which factors most influence adolescent concern. Results demonstrate that XGBoost achieves high predictive accuracy, offering an effective and interpretable solution for understanding and mitigating the impact of unhealthy food advertisements.
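A hedged sketch of the XGBoost-plus-SHAP workflow this abstract describes, with randomly generated stand-ins for the survey features it names; all column names and values here are hypothetical.

```python
# Train XGBoost, then explain predictions with SHAP feature attributions.
import numpy as np
import pandas as pd
import shap
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 1030-respondent survey table.
X = pd.DataFrame({
    "age": rng.integers(13, 18, 1030),
    "parent_education": rng.integers(0, 4, 1030),
    "celebrity_ad_exposure": rng.integers(0, 2, 1030),
    "free_toy_ad_exposure": rng.integers(0, 2, 1030),
})
y = rng.integers(0, 2, 1030)  # concern level: 0 = low, 1 = high

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1).fit(X_tr, y_tr)

explainer = shap.TreeExplainer(model)   # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)    # global view of feature influence
```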
Obesity continues to pose a major global health concern, highlighting the urgent need for effective early risk assessment strategies. This study presents a comprehensive machine learning framework aimed at classifying obesity risk while ensuring transparency and interpretability in decision-making. By utilizing detailed data on individuals' physical attributes and lifestyle behaviors, the proposed system identifies patterns associated with varying obesity levels. A key focus of the project is to enhance the interpretability of predictions, allowing users and healthcare professionals to understand the reasoning behind the classification outcomes. This contributes not only to improved trust in the model's results but also supports the development of targeted and personalized preventive measures. Ultimately, the approach bridges the gap between advanced predictive capabilities and practical, human-understandable insights for better health management.
Heart failure remains a leading cause of mortality worldwide, necessitating robust predictive models to facilitate timely medical interventions. This study presents a machine learning framework for heart failure survival prediction, leveraging an optimized XGBoost model integrated with the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. Utilizing a dataset of 5000 clinical records, including features such as age, ejection fraction, and serum creatinine, we applied SelectKBest with Chi-square for feature selection to identify the most impactful predictors. The XGBoost model, selected after evaluating multiple algorithms (including Logistic Regression, Decision Tree, KNN, SVM, and Random Forest), was fine-tuned to optimize hyperparameters, achieving a test accuracy of 99.70%, with precision, recall, and F1-scores near 1.00, and an AUC-ROC of 0.9998. SMOTE effectively balanced the dataset, enhancing the model’s ability to predict minority class outcomes. Model performance was rigorously assessed using metrics like accuracy, confusion matrices, and learning curves to detect overfitting, with results indicating minimal overfitting in XGBoost. Compared to the baseline Gradient Boosting Machine (GBM) with Adaptive Inertia Weight Particle Swarm Optimization (AIW-PSO) from prior work, which achieved 94% accuracy on a smaller dataset (299 patients), our approach demonstrates superior performance, likely due to the larger dataset and advanced preprocessing. This study highlights the efficacy of XGBoost combined with SMOTE for clinical predictive tasks and offers a scalable, high-accuracy tool for heart failure prognosis, with potential to improve patient outcomes through precise and timely clinical decision-making.
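A sketch of the SMOTE + SelectKBest(chi-square) + XGBoost combination under illustrative settings: placing SMOTE inside an imblearn Pipeline ensures oversampling happens only on training folds, and Min-Max scaling keeps features non-negative as the chi-square test requires. The step order, k, and hyperparameters are assumptions.

```python
# Scale -> chi-square feature selection -> SMOTE -> XGBoost, with leak-free CV.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier

# Synthetic stand-in for the 5000-record clinical table (imbalanced classes).
X, y = make_classification(n_samples=5000, n_features=12, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("scale", MinMaxScaler()),              # keeps features non-negative for chi2
    ("select", SelectKBest(chi2, k=8)),     # Chi-square feature selection
    ("smote", SMOTE(random_state=42)),      # resamples training folds only
    ("clf", XGBClassifier(n_estimators=400, max_depth=4, learning_rate=0.05)),
])
print("CV ROC-AUC:", cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```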
Artificial intelligence (AI) continues to drive transformative changes across various domains, with medical science emerging as one of its most impactful beneficiaries. In particular, lung cancer—being one of the most fatal forms of cancer—demands advanced tools for early and reliable detection. This study introduces a comprehensive approach to lung cancer classification by incorporating synthetic data generation to address class imbalances and enhance model performance. The methodology involves augmenting medical datasets to improve data diversity and applying advanced machine learning techniques for predictive analysis. Extensive evaluations were carried out using multiple data preprocessing strategies and comparative models to ensure robustness and reliability. This framework demonstrates the potential of combining synthetic data augmentation with predictive modeling in aiding early diagnosis, supporting clinicians in making informed decisions, and ultimately contributing to more effective and personalized treatment strategies.
Crude oil is a globally significant energy resource whose price fluctuations have far-reaching economic and industrial impacts. Accurate forecasting of crude oil prices is crucial for strategic decision-making in sectors such as finance, energy, and transportation. This project presents a machine learning-based approach to predict monthly crude oil prices using historical market data and engineered time-series features. The model is developed using the CatBoost Regressor, a high-performance gradient boosting algorithm known for its efficiency, accuracy, and ability to handle complex non-linear data. The predictive features include lagged prices from previous months, rolling statistical indicators (mean and standard deviation), temporal features such as month and year (encoded using sine and cosine transformations to preserve seasonality), and both percentage and absolute monthly price changes. The dataset spans over four decades (1983–2025), ensuring that the model captures long-term patterns and short-term fluctuations. The performance of the model is evaluated using standard regression metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), demonstrating strong predictive accuracy and generalization capability. This project showcases the effectiveness of feature engineering combined with gradient boosting techniques for time-series forecasting and provides a reliable, interpretable, and scalable solution for crude oil price prediction.
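The feature engineering described (lagged prices, rolling statistics, sine/cosine month encoding) can be sketched as follows; the synthetic series and hyperparameters are stand-ins for the real 1983–2025 market data.

```python
# Lag/rolling/cyclical feature engineering feeding a CatBoost regressor.
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor

# Synthetic monthly series standing in for the historical price data.
dates = pd.date_range("1983-01-01", "2025-06-01", freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": dates,
                   "price": 30 + np.cumsum(rng.normal(0, 1.5, len(dates)))})

df["lag_1"] = df["price"].shift(1)                          # last month's price
df["lag_3"] = df["price"].shift(3)
df["roll_mean_6"] = df["price"].shift(1).rolling(6).mean()  # rolling statistics
df["roll_std_6"] = df["price"].shift(1).rolling(6).std()
df["month_sin"] = np.sin(2 * np.pi * df["date"].dt.month / 12)  # cyclical month
df["month_cos"] = np.cos(2 * np.pi * df["date"].dt.month / 12)
df = df.dropna()

features = ["lag_1", "lag_3", "roll_mean_6", "roll_std_6", "month_sin", "month_cos"]
train, test = df.iloc[:-24], df.iloc[-24:]   # hold out the last 24 months

model = CatBoostRegressor(iterations=500, learning_rate=0.05, verbose=0)
model.fit(train[features], train["price"])
preds = model.predict(test[features])
```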
Sports like basketball and baseball have seen significant advancements through the effective use of sports analytics. In contrast, machine learning applications in football have largely concentrated on outcome prediction rather than player evaluation. This study aims to bridge that gap by presenting a descriptive analysis of football player performance using a football-specific dataset. Traditionally, player performance assessments rely on expert panels, though the criteria they use remain undisclosed. In this research, the Support Vector Classifier (SVC) algorithm is employed to analyze and classify player performance data, identifying key functional attributes relevant to different playing positions. By tuning kernel functions and hyperparameters, the model effectively highlights the most impactful performance metrics, offering objective insights that align with expert evaluations. The dataset used comprises detailed performance data from football matches, making the analysis specific and relevant to the sport. The application of SVC allowed the development of highly accurate classifications with minimal error, thus validating the algorithm’s effectiveness in rating prediction tasks. The results indicate that SVC can serve as a powerful tool in football analytics, enabling data-driven decision-making for coaches, analysts, and scouts. This approach not only enhances transparency in player assessment but also supports more strategic planning based on performance-driven evidence.
Ransomware attacks are becoming increasingly frequent and sophisticated, posing serious challenges to cybersecurity defenses worldwide. Traditional detection methods often fall short when faced with the evolving nature of ransomware, which frequently employs obfuscation and evasion techniques to bypass static signature-based systems. In this study, a machine learning-based approach is proposed to detect ransomware through the analysis of API call data, which captures the dynamic behavior of programs during execution. The temporal dynamics, intricate sequential patterns, and high dimensionality inherent in ransomware behavior are key factors considered in this research. A Random Forest classifier is employed due to its robustness and ability to handle complex datasets, delivering high predictive accuracy. The model is trained using features derived from temporal intervals, API call frequencies, and sequential patterns, allowing it to effectively distinguish between ransomware and benign software. With an accuracy exceeding 95%, the system demonstrates strong predictive performance and practical applicability. This approach is further integrated into a Flask-based web application, enabling real-time detection in an interactive and user-friendly environment. The proposed method provides a scalable and efficient architecture for ransomware classification, offering security professionals a powerful tool for early threat identification and mitigation. By leveraging ensemble learning techniques, the system exemplifies the potential of behavioral analysis in advancing automated malware detection and strengthening overall cybersecurity resilience.
Rolling bearing faults frequently cause rotating equipment failure, leading to costly downtime and maintenance expenses. As a result, researchers have focused on developing effective methods for diagnosing these faults. In this paper, we explore the potential of Machine Learning (ML) techniques for classifying the health status of bearings. Our approach involves decomposing the signal, extracting statistical features, and using feature selection employing Binary Grey Wolf Optimization. We propose an ensemble method using voting classifiers to diagnose faults based on the reduced set of features. To evaluate the performance of our methods, we utilize several performance indicators. Our results demonstrate that the proposed voting classifiers method achieves superior fault classification, highlighting its potential for use in predictive maintenance applications.
Efficient agricultural planning is essential for ensuring global food security amid increasing population pressures and limited cultivable land. This study introduces a crop recommendation system that leverages machine learning techniques to analyze diverse parameters, including physical and chemical soil properties and environmental factors. The model is trained using the Random Forest Classifier, which achieved an accuracy of 99% on the test dataset. The system supports the recommendation of 22 crops, ranging from cereals and legumes to fruits and commercial crops. By employing a data-driven approach, the model assists farmers in selecting the most suitable crop based on current conditions, thereby promoting sustainable agricultural practices and maximizing yield. This work contributes to the advancement of intelligent agriculture and sets a foundation for future expansions such as seasonal crop rotation and precision farming solutions.
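A minimal sketch of the recommender, assuming the column layout of the widely used public crop-recommendation dataset (N, P, K, temperature, humidity, ph, rainfall, label); the file name is hypothetical.

```python
# Random Forest crop recommender trained on soil and environmental features.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file; columns assumed: N, P, K, temperature, humidity, ph,
# rainfall, label (22 crop names).
df = pd.read_csv("Crop_recommendation.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```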
With the rapid expansion of the internet and digital services, ensuring network security has become a critical challenge. Traditional rule-based intrusion detection systems often fail to adapt to evolving cyber-attacks, making machine learning (ML) a promising alternative for anomaly detection in network traffic. This project focuses on developing an intelligent anomaly detection framework using advanced ML techniques. The KDD Cup 1999 dataset is used as the benchmark dataset, consisting of both normal and malicious network connections. Data pre-processing, feature selection, and model training are performed to enhance detection accuracy and efficiency. Several algorithms are evaluated, including Decision Trees, Random Forest, and Gradient Boosting methods. Among these, CatBoost, an advanced gradient boosting algorithm, demonstrates superior performance with an accuracy of over 99% on test data, outperforming the approaches mentioned in the base paper. The system effectively classifies traffic into normal and attack categories, offering a scalable and accurate solution for intrusion detection. Additionally, a Flask-based web application is developed with modules for user authentication, anomaly prediction, and visualization of model performance, making the solution practical for real-world deployment.
This study addresses the growing concern of mental health challenges faced by college students engaged in entrepreneurship. It proposes a novel evaluation tool designed to assess psychological well-being by capturing complex relationships within mental health data. The approach integrates advanced feature processing and a dual-structure network to enhance the recognition and evaluation of psychological characteristics. The model’s effectiveness is demonstrated through comprehensive testing on multiple public datasets from different countries, highlighting its strong predictive capabilities. The findings offer valuable insights for developing targeted mental health interventions, supporting entrepreneurship education, and providing practical guidance to improve the success rates of student entrepreneurs. This work holds significant implications for educational institutions and policy makers aiming to foster healthier and more supportive environments for young entrepreneurs.
Life Expectancy prediction models play a critical role in shaping the social, economic, and healthcare structures of countries worldwide. Accurate forecasting of life expectancy has profound implications for public health planning, resource allocation, insurance modeling, and the development of long-term care strategies. While traditional models have primarily relied on mortality rates and demographic statistics of target populations, these approaches often lack the complexity needed to capture the multifaceted nature of human longevity. Recent research emphasizes the integration of broader determinants such as education levels, healthcare access, economic indicators, and social welfare metrics to enhance prediction accuracy. In response to this growing need for more robust forecasting methods, this study explores the application of advanced machine learning algorithms to predict life expectancy across both developed and developing regions. Grid Search Cross-Validation (Grid Search CV) was utilized to fine-tune hyperparameters and prevent overfitting, thereby enhancing the generalization capability of the models. Among the ensemble methods, Random Forest and XGBoost emerged as top performers due to their robustness in handling complex, nonlinear relationships and high-dimensional data. Additionally, AdaBoost contributed significantly by focusing on correcting errors made by weaker models, leading to better convergence and stability in prediction. The use of Grid Search CV ensured that the optimal configuration of each algorithm was selected based on cross-validated performance metrics such as Mean Absolute Error (MAE) and R² score. This data-driven approach demonstrated substantial improvements over conventional statistical methods, highlighting the potential of machine learning in building dynamic and highly accurate life expectancy prediction systems.
Parkinson’s Disease (PD) is a chronic and progressive neurological disorder that affects the body's nervous system, leading to difficulties in movement, coordination, and speech. One of the early symptoms of PD is dysphonia—a voice disorder characterized by altered vocal quality, pitch, and loudness. Since seventy to ninety percent of individuals with Parkinson’s exhibit speech impairments, analyzing vocal features becomes a valuable approach for early diagnosis. Existing studies have applied various machine learning models to detect PD using speech signals; however, challenges such as class imbalance, optimal feature selection, and limited interpretability remain prevalent. To overcome these limitations, this study introduces a predictive framework utilizing the K-Nearest Neighbors (KNN) algorithm for the detection of Parkinson’s disease based on speech data. KNN, a non-parametric and instance-based learning algorithm, classifies patients by comparing their vocal feature patterns with those of the nearest neighbors in the dataset. Its simplicity, flexibility in handling non-linear data distributions, and effectiveness with small to medium-sized datasets make it suitable for medical diagnostic tasks. By optimizing the value of k and using distance metrics such as Euclidean or Manhattan distance, the model can achieve high accuracy in distinguishing PD patients from healthy individuals. Moreover, feature normalization and dimensionality reduction techniques are applied to improve the performance and reliability of KNN. This approach aims to enhance the precision of early Parkinson’s detection while maintaining interpretability, offering a clinically relevant and data-driven solution compared to traditional diagnostic methods.
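A sketch of the KNN configuration described: features are normalized, then k and the distance metric (Euclidean vs. Manhattan) are tuned by cross-validated grid search. Synthetic data with 22 columns stands in for the vocal-feature dataset, which is an assumption about its size.

```python
# Normalization + grid search over k and the distance metric for KNN.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=22, random_state=0)  # stand-in for vocal features

pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])
grid = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 7, 9, 11],
                           "knn__metric": ["euclidean", "manhattan"]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```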
The phenomenon of cancellations in hotel bookings is a significant challenge in the hospitality sector as it distorts demand forecasting and can lead to substantial revenue losses. Forecasting booking cancellations remains relatively underexplored, particularly in understanding the behavioral factors driving cancellations. This paper proposes a novel approach to predicting hotel booking cancellations using a Multi-Layer Perceptron (MLP) Classifier, a type of deep learning model capable of capturing non-linear relationships in structured datasets. The MLP Classifier strengthens the prediction process by analyzing various customer and booking-related attributes such as lead time, room type, location, and customer segment. By adjusting hyperparameters including the number of hidden layers, neurons, activation functions, and learning rate, the model effectively learns patterns associated with cancellations, leading to highly accurate predictions. This approach provides hotel managers with a reliable forecasting tool to anticipate cancellation risks, enabling more effective demand planning, resource allocation, and revenue management strategies.
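A hedged sketch using scikit-learn's MLPClassifier; the hidden-layer sizes, activation, and learning rate shown are illustrative assumptions rather than the tuned values, and synthetic data stands in for the booking attributes.

```python
# Scaled inputs feeding a two-hidden-layer MLP cancellation classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=4000, n_features=20, random_state=1)  # stand-in for booking features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

clf = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                          learning_rate_init=1e-3, max_iter=300, random_state=1)),
]).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```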
Port congestion and prolonged vessel waiting times present significant obstacles to the efficiency of global maritime logistics, resulting in elevated operational costs and logistical inefficiencies. In response to these challenges, this study introduces a classification-based predictive framework leveraging the Random Forest Classifier to categorize vessel arrivals into different delay risk levels. By classifying expected waiting times into discrete categories, this approach enables port authorities to make informed, real-time decisions regarding resource allocation and scheduling priorities. Unlike traditional regression models that predict continuous delay durations, the proposed classification model focuses on identifying critical thresholds that signify congestion risk. This shift allows for more actionable insights, particularly in time-sensitive port operations. The use of the Random Forest Classifier as the core predictive model enhances accuracy and robustness by combining multiple decision trees into an ensemble, reducing overfitting and improving generalization. Its ability to handle heterogeneous maritime data ensures reliable performance across diverse operational scenarios. Furthermore, Random Forest provides measures of feature importance, enabling the identification of key contributing factors—such as voyage characteristics, berth availability, and historical delay patterns—that influence vessel waiting times. This framework offers a scalable and practical solution for intelligent transportation systems, aligning with the growing need for smart port management and optimized logistics planning.
Efficient and reliable operation of photovoltaic (PV) systems is crucial for sustainable energy generation. However, faults such as partial shading and dirt accumulation significantly reduce the power output of PV modules. To address this issue, this project presents a machine learning-based fault detection framework for classifying and identifying faults in PV systems using only electrical and environmental data. The proposed system utilizes supervised machine learning algorithms including Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN). The dataset comprises key features such as voltage, current, ambient temperature, and irradiance, collected under three operating conditions: normal operation, partial shading, and dirt accumulation. Data preprocessing techniques such as normalization, label encoding, and data splitting were applied to prepare the dataset for training and testing. The performance of each model was evaluated using standard classification metrics: accuracy, precision, recall, F1-score, and confusion matrix. Experimental results demonstrate that all models performed well in detecting faults within the same system used for training, with the Artificial Neural Network achieving the highest accuracy and a precision exceeding 98%. However, when models were tested on data from a different PV system with varying characteristics, performance degraded, highlighting the need for system-specific model training. This study concludes that machine learning models, particularly ANN, are effective for fault detection in PV systems when trained on relevant system data. The approach offers a cost-effective and automated solution for enhancing the reliability and performance of solar energy systems.
Electrodermal activity (EDA) has emerged as a valuable physiological indicator for assessing pain levels in individuals. This study focuses on identifying key features within EDA signals that are influential in classifying pain responses. The methodology involves decomposing EDA signals, extracting meaningful features from signal characteristics such as amplitude and frequency in the time and frequency domains, and applying classification techniques to distinguish between painful and non-painful stimuli. Feature selection methods are also employed to enhance model performance by identifying the most relevant attributes. The results confirm that EDA signals provide reliable insights for pain level classification, highlighting their potential application in healthcare for more accurate and automated pain assessment.
People can use credit cards for online transactions, as they provide an efficient and easy-to-use facility. With the increase in credit card usage, the scope for credit card misuse has also grown. Credit card fraud causes significant financial losses for both cardholders and financial companies. The main aim of this research is to detect such fraud while addressing challenges including the limited availability of public data, highly imbalanced classes, changes in the nature of fraud, and high false-alarm rates. The literature presents many machine learning-based approaches for credit card fraud detection, such as the Extreme Learning Method, Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, and XGBoost. However, due to low accuracy, there is still a need to apply state-of-the-art deep learning algorithms to reduce fraud losses. The main focus here is to apply recent developments in deep learning to this purpose. A comparative analysis of machine learning and deep learning algorithms was performed to identify the most effective approach. The detailed empirical analysis is carried out using the European card benchmark dataset for fraud detection. A machine learning algorithm was first applied to the dataset, which improved fraud detection accuracy to some extent. Subsequently, three convolutional neural network architectures were applied to improve fraud detection performance, with additional layers further increasing detection accuracy. A comprehensive empirical analysis was carried out by varying the number of hidden layers and epochs and applying the latest models. The evaluation shows improved results, with accuracy, F1-score, precision, and AUC reaching optimized values of 99.9%, 85.71%, 93%, and 98%, respectively. The proposed model outperforms state-of-the-art machine learning and deep learning algorithms for the credit card fraud detection problem. In addition, we performed experiments balancing the data and applying deep learning algorithms to minimize the false-negative rate. The proposed approaches can be implemented effectively for real-world detection of credit card fraud.
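One of the convolutional architectures compared might look like the following Keras sketch, treating each transaction's 30 features (Time, V1–V28, Amount in the European card dataset) as a one-dimensional sequence; the depth and layer widths are assumptions, not the paper's exact configuration.

```python
# Small 1-D CNN fraud classifier over per-transaction feature vectors.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input((30, 1)),                  # 30 features as a 1-D sequence
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # fraud probability
])
model.compile("adam", "binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(),
                       tf.keras.metrics.Precision()])
```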
Insurance fraud, particularly within the automobile insurance sector, is a significant challenge faced by insurers, leading to financial losses and influencing pricing strategies. Fraud detection models are often impacted by class imbalance, where fraudulent claims are much rarer than legitimate claims, and missing data further complicates the process. This research tackles these issues by utilizing two car insurance datasets—an Egyptian real-life dataset and a standard dataset. The proposed methodology includes addressing missing data and class imbalance, and it incorporates the AdaBoost Classifier to enhance the model’s accuracy and predictive power. The results demonstrate that addressing class imbalance plays a crucial role in improving model performance, while handling missing data also contributes to more reliable predictions. The AdaBoost Classifier significantly outperforms existing techniques, improving prediction accuracy and reducing overfitting, which is often a challenge in fraud detection models. This study presents valuable insights into how improving data quality and using advanced algorithms like AdaBoost can enhance fraud detection systems, ultimately leading to more effective identification of fraudulent claims. These enhancements can significantly aid insurance companies in reducing financial losses, improving decision-making, and refining pricing models.
Breast cancer continues to be one of the leading causes of cancer-related deaths among women worldwide. Early and accurate diagnosis is crucial to improving treatment outcomes and survival rates. This study develops a robust machine learning-based breast cancer classification system utilizing the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Emphasizing the importance of feature selection, the study identifies the most influential tumor characteristics that significantly contribute to malignancy prediction. Multiple classification algorithms—including Random Forest, Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN)—were implemented and rigorously evaluated to determine their predictive performance. After applying feature scaling and selecting optimal features, the SVM model achieved the highest classification accuracy of 96.49%, demonstrating its effectiveness in distinguishing between benign and malignant tumors. To facilitate practical clinical application, the model was deployed via a web-based interface built using Flask, allowing healthcare professionals to input tumor measurements and receive immediate diagnostic predictions. This deployment bridges the gap between complex machine learning models and real-world usability, supporting early detection efforts in clinical settings. The project underscores the potential of combining advanced computational techniques with intuitive interfaces to improve diagnostic workflows. Future work aims to integrate explainable artificial intelligence (XAI) methods to enhance transparency in prediction outcomes and to extend the model’s applicability by incorporating diverse and larger clinical datasets, thereby increasing its robustness and generalizability.
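A minimal sketch of the Flask deployment described, assuming the trained scaler and SVM were saved with joblib during training; the artifact file names and the JSON contract are hypothetical.

```python
# Flask endpoint serving the saved scaler + SVM for diagnostic predictions.
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
scaler = joblib.load("scaler.joblib")    # hypothetical artifacts from training
model = joblib.load("svm_wdbc.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [..30 WDBC measurements..]}.
    features = np.array(request.json["features"]).reshape(1, -1)
    label = model.predict(scaler.transform(features))[0]
    return jsonify({"diagnosis": "malignant" if label == 1 else "benign"})

if __name__ == "__main__":
    app.run(debug=True)
```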
Electric Vehicle (EV) adoption is rapidly increasing, necessitating efficient and reliable battery management systems, especially in battery swapping infrastructures. This project presents a Machine Learning-driven web application for real-time battery health estimation, aimed at enhancing the efficiency of EV battery swapping systems. Built using Flask as the backend web framework and Python for data processing and machine learning, the system predicts two critical parameters of battery condition: State of Health (SoH) and remaining charge cycles. To achieve accurate predictions, the application leverages Random Forest Regression and XGBoost, two powerful ensemble learning algorithms, trained on historical battery usage data including charge/discharge current, voltage, temperature, and cycle counts. The system processes user input in real time and displays the battery’s health status via a user-friendly interface, enabling swift decision-making at battery swapping stations. This solution not only promotes proactive maintenance and optimal utilization of EV batteries but also supports sustainable energy practices by reducing the chances of premature battery disposal. The combination of ML with a lightweight Flask-based deployment makes the application scalable, efficient, and suitable for integration into real-world EV infrastructure.
Pregnancy complications remain a major concern in maternal and fetal healthcare, where early and accurate detection is essential for effective intervention. Conventional manual interpretation of Cardiotocography (CTG) data is time-consuming and often subjective, leading to inconsistencies in fetal health assessment. To address this, the present study proposes an efficient machine learning-based system utilizing the XGBoost algorithm, known for its superior performance in handling structured data. Leveraging a publicly available CTG dataset, the proposed model achieves a notable accuracy of 96%, significantly outperforming earlier approaches. This demonstrates XGBoost's capability to model complex patterns in fetal monitoring data with high reliability. The system enhances diagnostic precision, reduces clinician workload, and supports timely clinical decision-making. This work highlights the potential of integrating robust ML models into prenatal care workflows, thereby advancing automated fetal health evaluation and improving maternal and neonatal outcomes.
Electricity demand forecasting plays a pivotal role in power grid management, enabling optimal resource allocation, load balancing, and energy trading decisions. Traditional statistical models like ARIMA or linear regression struggle to handle high-dimensional, non-linear relationships commonly found in modern datasets that include weather variations, seasonal shifts, and holiday influences. To address these limitations, this study proposes a deep learning-based framework using a Multi-Layer Perceptron (MLP) architecture tailored for short-term national power load prediction. The model is trained on a real-world dataset comprising 15 continuous and categorical features, such as surface temperature (T2M), humidity (QV2M), wind speed (W2M), and atmospheric liquid water content (TQL), collected from three different geographical zones — TOC, SAN, and DAV — along with holiday and school calendar data. Each input is normalized using Min-Max Scaling to stabilize the learning process and accelerate convergence. The model architecture features multiple fully connected layers with ReLU activation functions to effectively model non-linear dependencies. Through extensive experimentation and hyperparameter tuning, the proposed MLP model achieved a Mean Absolute Percentage Error (MAPE) of 4.81%, showcasing significant accuracy and reliability in predicting power demand. Compared to traditional approaches, this deep learning method offers better generalization, is less prone to feature correlation pitfalls, and adapts well to changing patterns in weather and user behavior. Such a solution is vital for energy utilities seeking a scalable, data-driven strategy to anticipate load fluctuations and prevent outages in real time.
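A hedged Keras sketch of the MLP described: inputs are the 15 features after Min-Max scaling, stacked ReLU layers model the non-linearities, and MAPE is tracked during training. The layer widths and batch settings are assumptions.

```python
# Fully connected load-forecasting network with MAPE as the tracked metric.
import tensorflow as tf

# Inputs are assumed to be the 15 features described above, already
# Min-Max scaled to [0, 1] (e.g., with sklearn's MinMaxScaler).
model = tf.keras.Sequential([
    tf.keras.layers.Input((15,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                        # predicted national load
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.MeanAbsolutePercentageError()])
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=64)
```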
Healthcare fraud detection is a critical task that faces significant challenges due to imbalanced datasets, which often result in suboptimal model performance. Previous studies have primarily relied on traditional machine learning (ML) techniques, which struggle with issues like overfitting caused by Random Oversampling (ROS), noise introduced by the Synthetic Minority Oversampling Technique (SMOTE), and crucial information loss due to Random Undersampling (RUS). In this study, we propose a novel approach to address the imbalanced data problem in healthcare fraud detection, with a focus on the Medicare Part B dataset. Our approach begins with the careful extraction of the categorical feature "Provider Type," which allows for the generation of new, synthetic instances by replicating existing types to enhance diversity within the minority class. To further balance the dataset, we employ a hybrid resampling technique, SMOTE-ENN, which integrates the Synthetic Minority Oversampling Technique (SMOTE) with Edited Nearest Neighbors (ENN) to generate synthetic data points while removing noisy, irrelevant instances. This combined technique not only balances the dataset but also helps in mitigating the potential adverse effects of imbalanced data. We evaluate the performance of the logistic regression model on the resampled dataset using common evaluation metrics such as accuracy, F1 score, recall, precision, and the AUC-ROC curve. Additionally, we emphasize the importance of the Area Under the Precision-Recall Curve (AUPRC) as a critical metric for evaluating model performance in imbalanced scenarios. The experimental results demonstrate that logistic regression achieves an impressive 98% accuracy, outperforming other methods and validating the efficacy of our proposed approach for detecting healthcare fraud in imbalanced datasets.
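A minimal sketch of the SMOTE-ENN resampling step feeding logistic regression, with synthetic imbalanced data standing in for the Medicare Part B features; AUPRC is reported alongside the usual metrics, as the abstract emphasizes.

```python
# SMOTE-ENN hybrid resampling (oversample + remove noisy neighbors) before
# logistic regression, evaluated on an untouched test split.
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=15,
                           weights=[0.97, 0.03], random_state=0)  # heavy imbalance
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)

print(classification_report(y_te, clf.predict(X_te)))
print("AUPRC:", average_precision_score(y_te, clf.predict_proba(X_te)[:, 1]))
```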
Sleep apnea is a prevalent and potentially life-threatening sleep disorder characterized by frequent interruptions in breathing during sleep. These episodes can lead to serious health complications including cardiovascular diseases, fatigue, and impaired cognitive function. While recent deep learning models have shown strong performance in apnea detection, they often require high computational resources and lack transparency, making them less suitable for real-time or edge-based healthcare applications. In this study, we propose a machine learning-based approach using Extreme Gradient Boosting (XGBoost) to detect sleep apnea events efficiently and accurately from physiological signals. The proposed system involves a robust feature engineering pipeline that extracts statistical, temporal, and frequency-domain features from biosignals such as ECG, airflow, and oxygen saturation (SpO₂) data. These features are selected and refined using correlation analysis and feature importance techniques to eliminate redundancy and enhance classification performance. The extracted features are then input to an XGBoost classifier, which is optimized using cross-validation and hyperparameter tuning to address class imbalance and improve generalization. Our method is evaluated on the Sleep Heart Health Study (SHHS) dataset and demonstrates competitive accuracy while significantly reducing computational overhead compared to deep learning models. Moreover, the model provides interpretable outputs through feature importance analysis, allowing clinical professionals to better understand decision factors. This makes the approach highly suitable for real-time monitoring, portable device deployment, and explainable AI-driven diagnostics in the context of sleep apnea and related disorders.