p-ISSN 2722-7782 | e-ISSN 2722-5356
DOI: https://doi.org/10.46799/jsa.v4i7.654
CLASSIFICATION OF CONGESTION IN JAKARTA USING KNN, NAÏVE BAYES AND DECISION TREE METHODS
Sri Rahayu, Bayu Rimbi Asmoro, Ery
Rinaldi
Faculty of Information Technology, Budi Luhur University
[email protected], [email protected],
[email protected]
Abstract
Congestion has become a problem in almost all big cities in Indonesia. Traffic jams generally occur in areas with a high intensity of activity and land use. Given the increasing level of congestion, the capital city of DKI Jakarta is one of the most densely populated cities, with high population activity that is matched by the use of transportation, both public and private vehicles. Traffic jams remain an unsolved problem. West Palmerah Street is one of the roads with frequent traffic jams. To verify this, a simple study was carried out. The method used is a descriptive method, in which the research begins by collecting the required data through several surveys. The calculation is done by determining the degree of saturation (DS) and vehicle speed at three checkpoints: the DS at the pre-market checkpoint is 0.89, the DS at the market checkpoint is 1.05, and the DS at the post-market checkpoint is 0.89. The travel speed obtained at the pre-market observation point is 32.05 km/hour, at the market observation point 27.5975 km/hour, and at the post-market observation point 33.35 km/hour. The results show that there is indeed a traffic delay in front of the market. This is due to the large number of angkots (public minibuses) that stop there and the narrowing of the traffic lane in front of the market caused by street vendors and motorbikes stopping on the sidewalks, where buying and selling activities take place. Therefore, the best operational solutions need to be applied to improve traffic flow on these roads.
Keywords: Congestion in Jakarta, Classification, K-nearest neighbors, Naïve Bayes, Decision Tree.
INTRODUCTION
Data mining is a method of discovering particular patterns from large amounts of data. Data mining comprises many techniques, one of which is classification. Classification is a data-learning technique for generating value predictions from a series of attributes (Wahyuningsih & Utari, 2018). Classification is widely used to predict the class of a given label by classifying data (building a model) based on a training set and the values (class labels) of certain attributes. Classification is divided into five categories based on differences in mathematical concepts, namely statistical based, distance based, decision tree based, neural network based, and rule based. Classification has many algorithms, but this study uses the Decision Tree, KNN and Naïve Bayes algorithms (Sartika & Sensuse, 2017). Of the three algorithms, the decision tree is one of the most commonly used methods, especially for data classification.
In a case study of sentiment analysis of BPJS service users using the KNN, Naïve Bayes and Decision Tree methods, the Decision Tree method was shown to have a high level of accuracy in data classification (Puspita & Widodo, 2021). In a comparative case study of the K-Nearest Neighbor and Naïve Bayes data mining methods, the KNN method was shown to have higher accuracy than Naïve Bayes (Rahman et al., 2018). Compared with the other methods, Naïve Bayes rarely achieves the highest accuracy, so this study compares the three algorithms on the basis of their accuracy to determine which method is best for classification.
Based on the existing problems, specifically to compare the three methods (Decision Tree, KNN and Naïve Bayes), a study was carried out entitled "Classification of Congestion in Jakarta Using the KNN, Naïve Bayes and Decision Tree Methods", using RapidMiner software to find the highest accuracy value among the three methods implemented in the data classification, that is, a comparative analysis of traffic-jam classification accuracy using KNN, Naïve Bayes and Decision Tree. The purpose of this study is to compare the three methods used in the classification of congestion and determine which gives the maximum accuracy.
A previous study discussed the Naïve Bayes, KNN and Decision Tree methods for sentiment analysis of traffic jams, motivated by the traffic conditions in the city of Jakarta, which are so dense, with increasing congestion, that residents who commute to work need more comfortable transportation (Riadi & Kom, 2017). That research used the social media platform Twitter to obtain random data of up to 127 records. It applied the Naive Bayes Classifier, KNN and Decision Tree methods with several stages, namely emoticon conversion, cleaning, case folding, tokenization and stemming (Romadloni et al., 2019). The decision tree method obtained the highest accuracy compared to KNN and Naïve Bayes: the decision tree achieved 100% accuracy, 100% precision, 100% sensitivity and 100% specificity; the KNN method achieved 80% accuracy, 100% precision, 50% sensitivity and 100% specificity; and the Naive Bayes method achieved 80% accuracy, 66.67% precision, 100% sensitivity and 66.67% specificity.
Research comparing the K-Nearest Neighbor and Naïve Bayes data mining methods has also been applied to the classification of clean water quality, where monitoring and processing of the surrounding environment, including water resources, is necessary so that the water complies with quality standards (Rahman et al., 2018). The accuracy results were 82.42% for K-Nearest Neighbor and 70.32% for Naïve Bayes, so it can be concluded that K-Nearest Neighbor is the better method for that classification task.
Research on sentiment analysis of BPJS users using the KNN, Decision Tree and Naïve Bayes methods discussed people who use BPJS services, which often raise pros and cons; for this reason, data mining sentiment analysis was carried out on BPJS-related tweets. The 1,000 Twitter entries collected were filtered down to 903 due to duplicate data. The KNN, Decision Tree and Naïve Bayes methods were implemented to compare the accuracy of the three methods (Puspita & Widodo, 2021). That study used RapidMiner software version 9.9, and the results showed that the KNN method had an accuracy of 95.58%, the decision tree 96.13% and the Naive Bayes method 89.14%, so it can be concluded that the decision tree was the best method for that task.
Data mining is the process of extracting information from data to obtain new information (Harahap, 2019). The research conducted here uses data mining techniques and implements the K-nearest Neighbors, Naïve Bayes and Decision Tree methods to compare the maximum accuracy achieved by the three methods. Data mining is an operation on data sources that is used to find relationships or patterns in large data sets in order to obtain new information (Cahyanti et al., 2020).
The K-Nearest Neighbor algorithm is a classification method for a dataset based on previously classified training data (Siregar et al., 2019). The KNN classification algorithm classifies objects based on the training data that lies at the shortest distance (Romadloni et al., 2019). The working principle of the KNN algorithm is to determine and find the shortest distance between the data to be tested and its nearest neighbors in the training data. The best value of k for this algorithm depends on the data; a higher k value usually reduces the effect of errors or noise on the classification process, but creates less distinct boundaries between classes (Sukmana et al., 2020). This research carries out a computational process to obtain accurate results using the KNN method. The distance is found using the Euclidean formula:

d(x_1, x_2) = \sqrt{\sum_{i=1}^{p} (x_{1i} - x_{2i})^2}

where d is the distance, x1 is the sample (training) data, x2 is the test data, p is the data dimension, and i indexes the data variables.
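A minimal Python sketch of this principle (not the authors' RapidMiner setup): it computes the Euclidean distance above and classifies a test point by a majority vote among its k nearest training samples; the toy feature values and labels are hypothetical.

import numpy as np
from collections import Counter

def euclidean(x1, x2):
    # d(x1, x2) = sqrt(sum_i (x1_i - x2_i)^2)
    return np.sqrt(np.sum((np.asarray(x1, float) - np.asarray(x2, float)) ** 2))

def knn_predict(train_X, train_y, test_x, k=3):
    # rank training samples by distance to the test point, then vote among the k nearest
    distances = [euclidean(x, test_x) for x in train_X]
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# hypothetical toy data: two features per road segment, labels 0 = "no jam", 1 = "jam"
X = [[10, 0.3], [12, 0.4], [40, 0.9], [45, 1.0]]
y = [0, 0, 1, 1]
print(knn_predict(X, y, [38, 0.85], k=3))  # prints: 1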
The Naive Bayes Classifier is a data mining method for data classification that operates using probabilistic calculations. Naive Bayes is one of the algorithms included in the classification technique (Zulfauzi & Alamsyah, 2020). The basic concept of Naive Bayes is Bayes' theorem, a theorem used in statistics to calculate probabilities. The Naive Bayes Classifier calculates the probability of each class from each group of attributes and determines the most likely class (Lestari et al., 2021). The Naive Bayes classifier thus computes and looks for the highest probability value in order to classify the test data into the correct category. A simple probability prediction technique based on the application of Bayes' theorem (Bayes' rule) is the technique implemented in the Naïve Bayes algorithm. The Naive Bayes formula is:

P(H|X) = \frac{P(X|H) \, P(H)}{P(X)}

where X is the data with an unknown class; H is the hypothesis that X belongs to a specific class; P(H|X) is the probability of hypothesis H given condition X (posterior probability); P(H) is the probability of hypothesis H (prior probability); P(X|H) is the probability of X given hypothesis H; and P(X) is the probability of X.
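A minimal sketch of this calculation for categorical attributes, using illustrative (hypothetical) counts rather than the authors' data: each class is scored by prior times likelihood, which is proportional to the posterior P(H|X), with Laplace smoothing so unseen attribute values do not zero out the product.

def naive_bayes_predict(x, data, labels):
    classes = set(labels)
    n = len(labels)
    scores = {}
    for c in classes:
        rows = [data[i] for i, lab in enumerate(labels) if lab == c]
        prior = len(rows) / n                                  # P(H)
        likelihood = 1.0
        for j, value in enumerate(x):                          # naive independence assumption
            matches = sum(1 for r in rows if r[j] == value)
            likelihood *= (matches + 1) / (len(rows) + 2)      # P(x_j | H) with Laplace smoothing
        scores[c] = prior * likelihood                         # proportional to P(H|X)
    return max(scores, key=scores.get)

# hypothetical toy data: (traffic volume, day type) -> congestion label
data = [["high", "weekday"], ["high", "weekend"], ["low", "weekday"], ["low", "weekend"]]
labels = ["jam", "jam", "clear", "clear"]
print(naive_bayes_predict(["high", "weekday"], data, labels))  # prints: jam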
The data classification process can use several methods, one of which is the decision tree. The decision tree is one of the algorithms commonly used for decision making (Pamuji & Ramadhan, 2021) and performs well for classification or prediction (Muningsih, 2022). The decision tree model takes the form of a tree consisting of several parts, namely the root node, internal nodes, and terminal nodes. Classification in this method proceeds from the root node through internal nodes until a terminal node is reached. The concept of entropy is used to determine which attribute to split on in the decision tree; the higher the entropy of a sample, the less pure it is. The formula for calculating the entropy of a sample is:

Entropy(S) = -\sum_{i=1}^{n} p_i \log_2 p_i

where p1, p2, p3, ..., pn respectively represent the proportions of class 1, class 2, ..., class n in the output.
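A short sketch of this entropy calculation under the same definitions (the p_i are class proportions); the example proportions are hypothetical.

import math

def entropy(proportions):
    # Entropy(S) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))   # 1.0: an evenly mixed two-class sample is maximally impure
print(entropy([1.0]))        # 0.0: a pure sample has zero entropy
print(entropy([0.8, 0.2]))   # about 0.72: mostly one class, so lower entropy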
METHODOLOGY
This study used several stages, which are presented in Figure 1 (Research Stages). The first stage begins with mining data from Twitter using Orange software and the Twitter website itself. The second stage is a literature study, that is, collecting information to support this research in the form of journals, books, references and other reliable sources. The preparation of this research also included discussions and consultations with supervisors and various experts in this field. The data processing in RapidMiner includes several steps, starting from the data set, pre-processing, splitting the data into training and testing data, model fitting/classification, prediction/applying the model, and the resulting performance. The data processing produces results that are discussed and lead to the conclusions of the research.
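As a rough analogue of the RapidMiner flow just described (read the data set, pre-process, split into training and testing data, fit a model, apply it, and measure performance), the sketch below uses scikit-learn; the file name "traffic_tweets.csv", the "label" column, and the use of numeric features are illustrative assumptions, not the authors' exact configuration.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("traffic_tweets.csv")              # data set (hypothetical file)
df = df.drop_duplicates().dropna()                  # simple pre-processing
X, y = df.drop(columns=["label"]), df["label"]      # "label": 0 = negative, 1 = positive
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42)     # model fitting / classification
model.fit(X_train, y_train)
pred = model.predict(X_test)                        # applying the model
print("accuracy:", accuracy_score(y_test, pred))    # resulting performance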
RESULTS AND DISCUSSION
Datasets
In this study, a dataset in CSV format was used for the classification process and to compare the accuracy results of the three methods used, namely Naive Bayes, Decision Tree and KNN. The data obtained are shown in Table 1.
Table 1
Traffic
jam dataset on Twitter
Pre-processing and
Labeling
The data obtained in this study need to be processed first. Knowing the nature of the textual data previously collected, a data labeling process was carried out. The attribute identified in this study is one that indicates whether a bottleneck can be overcome. The labeling process can be done by setting a color on the label to facilitate the research process. Several pre-processing methods are used, namely data validation to obtain good data with proper accuracy, to review the type of data obtained, and to identify the data so as to achieve a maximum level of accuracy. Inconsistent data are made consistent by replacing all missing values. Data validation identifies and eliminates unused data, inconsistent data and missing data, so that raw data become data that are ready to be processed and can be analyzed through the data cleaning and data filtering processes within data validation (Jati, 2021). This study uses data integration and transformation methods to increase the accuracy of the three methods used. The Reduce Data Size and Discretize steps are applied, and duplicate data are removed using a duplicate-removal operator. The initial 1,000 records become clean data through the processes of data validation, data integration and transformation, and data size reduction and discretization, so that the data can be analyzed to obtain new information.
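An illustrative pandas sketch of these validation steps (remove duplicates, drop rows with missing values, discretize the label); the file name and column names are assumptions, not the authors' exact data.

import pandas as pd

raw = pd.read_csv("traffic_tweets.csv")             # 1,000 raw rows (hypothetical file)
clean = raw.drop_duplicates()                       # reduce data size: remove duplicate tweets
clean = clean.dropna(subset=["text", "label"])      # eliminate rows with missing values
clean["label"] = clean["label"].astype("category")  # discretize the class label
print(len(raw), "->", len(clean), "rows after validation")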
Keyword Determination in Orange: Jakarta Traffic Jams
Data Preprocessing Process
Word Cloud
NLTK Process in Google Colab
Data Upload Process Using pandas (*.csv file)
Stopword Removal Process
Case Folding Process
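A minimal sketch of the steps listed above (upload with pandas, case folding, tokenization, and stopword removal); the file and column names are assumptions, NLTK's Indonesian stopword list is used as an example, and a simple whitespace split stands in for a full tokenizer.

import pandas as pd
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")                           # stopword lists shipped with NLTK

df = pd.read_csv("traffic_tweets.csv")               # data upload with pandas (hypothetical file)
stop_words = set(stopwords.words("indonesian"))      # Indonesian stopword list

def preprocess(text):
    tokens = str(text).lower().split()               # case folding + simple tokenization
    return [t for t in tokens if t.isalpha() and t not in stop_words]  # stopword removal

df["tokens"] = df["text"].apply(preprocess)
print(df["tokens"].head())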
Accuracy Measurement with
Confusion Matrix
The confusion matrix is a method for evaluating classification based on the results of the classification that has been carried out, where the accuracy of the classification reflects the performance of the classifier. The confusion matrix provides comparative information between the classification results produced by the system (model) and the actual classes (Fikri et al., 2020). The confusion matrix describes the performance of the classification model on a set of test data whose true values are known, and it is used to calculate accuracy.
Confusion Matrix
Confusion matrix performance can be measured using the TP, FP, FN, and TN values. A True Positive (TP) is positive data that is predicted as positive; a True Negative (TN) is negative data that is predicted as negative; a False Positive (FP) is negative data that is predicted as positive; and a False Negative (FN) is positive data that is predicted as negative. Accuracy is calculated using the equation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)
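A small sketch of this accuracy computation, using the Naive Bayes counts reported below (TP = 788, TN = 138, FP = 100, FN = 430) purely to illustrate equation 4.

def accuracy(tp, tn, fp, fn):
    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

print(round(accuracy(788, 138, 100, 430) * 100, 2))  # 63.6, matching the reported 63.60%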
Naive Bayes Algorithm
Accuracy Results
Confusion Matrix Naïve Bayes
The accuracy result is 63.60%, with class precision for pred. zero (pred. negative) of 64.70% and for pred. one (pred. positive) of 57.98%. The accuracy is obtained using equation 4, where the true positive count is 788, true negatives 138, false negatives 430, and false positives 100. The accuracy result can be verified as follows: (788 + 138) / (788 + 138 + 430 + 100) = 926 / 1456 = 0.6360 = 63.60%.
The performance vector itself is a description of the table of analysis results obtained in this research. The True Positive value of 788 is positive data that is predicted correctly. The False Positive value of 100 is negative data that is predicted as positive. The False Negative value of 430 is positive data that is predicted as negative. The True Negative value of 138 is negative data that is predicted correctly.
Decision Tree Algorithm Accuracy
Results
Confusion
Matrix Decision Tree
The accuracy result is 80.84%, with class precision for pred. zero (pred. negative) of 79.71% and for pred. one (pred. positive) of 83.53%. The accuracy is obtained using equation 4, where the true positive count is 817, true negatives 360, false negatives 208, and false positives 71, giving (817 + 360) / 1456 = 80.84%.
The performance vector is a description of the table of analysis results obtained in this research. The True Positive value of 817 is positive data that is predicted correctly. The False Positive value of 71 is negative data that is predicted as positive. The False Negative value of 208 is positive data that is predicted as negative. The True Negative value of 360 is negative data that is predicted correctly.
Accuracy results of the
K-nearest neighbors algorithm
Confusion
Matrix KNN
The accuracy result obtained is 86.88%, where the class precision for pred. zero (pred. negative) is 85.74% and for pred. one (pred. positive) is 89.19%. The accuracy is obtained using equation 4, where the true positive count is 836, true negatives 429, false negatives 139, and false positives 52.
The performance vector is a description of the table of analysis results obtained in this research. The True Positive (TP) value of 836 is positive data that is predicted correctly. The False Positive value of 52 is negative data that is predicted as positive. The False Negative value of 139 is positive data that is predicted as negative. The True Negative value of 429 is negative data that is predicted correctly.
The data classification process uses several operators to carry out the classification methods, including reading the CSV, splitting the data, applying the model, and measuring performance, together with the classification methods KNN, Naïve Bayes and Decision Tree. These operators each have their own function: the Read CSV operator imports the CSV data that has been obtained, and during CSV reading the pre-processing step is carried out, whose function is to display the imported data set and check whether there are inconsistent data or missing values. The Split Data operator takes a set of examples as input and sends subsets of that set through its output ports. The classification method itself is applied through the model operators, and the Performance operator is used to display the accuracy of each type of classification method.
Accuracy Results
Comparison
of Accuracy Results
Comparative analysis of the classification accuracy using the results from K-nearest neighbors, Naïve Bayes, and Decision Tree shows that K-nearest neighbors is the method that produces the highest accuracy, namely 86.88%, for the classification of the data used in this study, while Naïve Bayes reaches 63.60% and the Decision Tree 80.84%.
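A hedged sketch of how this three-way comparison could be reproduced on one train/test split with scikit-learn; the file name, the "label" column, the choice of k = 5 and of Gaussian Naive Bayes are illustrative assumptions, not the authors' RapidMiner configuration.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("traffic_features.csv")            # hypothetical numeric feature table with a "label" column
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
for name, model in models.items():
    acc = accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
    print(f"{name}: {acc:.2%}")                     # compare accuracy across the three methods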
Taxonomy Table
No | Writer | Research Title | Method | Results
1 | Adi Kusuma, Agung Nugroho, 2021 | Sentiment Analysis on Twitter of the Increase in Basic Electricity Rates Using the Naïve Bayes Method | Naïve Bayes | This study attempts to analyze sentiment to see public perception of the issue of increasing basic electricity rates on Twitter social media using the Naïve Bayes method by classifying sentiments into positive, negative and neutral. From the results of the research, negative sentiment is the most common, around 60%, in response to the issue of increasing the basic electricity rate.
2 | Rani Nooraeni, Aulia Fikri Fadhilah, Heny Dwi, Siti Fatimatul, Suciarti Pertiwi, Yulianus Ronaldias, 2020 | Twitter Data Sentiment Analysis Regarding the Issue of the KPK Bill Using the Support Vector Machine (SVM) Method | Support Vector Machine (SVM) | From the original data classification model, training or testing, the percentage of responses in the form of negative sentiment related to the KPK Bill issue was 60.9 percent, greater than the percentage of positive sentiment of 39.1 percent. The performance of the SVM model in classifying sentiment is quite good because it has accuracy, sensitivity and specificity values of 81.32 percent, 71.47 percent and 87.64 percent, respectively.
3 | Dianati Duei Putri, Gigih Forda Nama, Wahyu Eko Sulistiono, 2022 | ANALYSIS OF THE PERFORMANCE SENTIMENT OF THE COUNCIL OF REPRESENTATIVES (DPR) ON TWITTER USING THE NAIVE BAYES CLASSIFIER METHOD | NAIVE BAYES CLASSIFIER | This research uses 1,546 tweets. The results of this study found that the DPR received 95 positive tweets with a polarity of 0.75 or 75% positive sentiment, 693 neutral tweets with a polarity of 0.79 or 79% neutral sentiment and 758 negative tweets with a polarity of 0.82 or 82% negative sentiment, with an accuracy score of 0.8 or 80%, based on testing data of 20%.
4 | Amelia Syahadati, Novert Cyril Lengkong, Ouditiana Safitri, Septriyan Machsus, Yongki Ramanda Putra, Rani Nooraeni | SENTIMENT ANALYSIS OF PSBB IMPLEMENTATION IN DKI JAKARTA AND ITS IMPACT ON JCI MOVEMENTS | JCI, Twitter, Sentiment Analysis | The study aims to 1) conduct an analysis of public sentiment regarding PSBB DKI Jakarta volume II; 2) see the impact of this sentiment on the JCI movement; 3) compare the results of several classification methods, namely logistic regression, k-nearest neighbor, random forest, and naïve Bayes. Scraping of Twitter data for the period September 8 - October 9 was carried out using Orange and RStudio software. Furthermore, sentiment analysis with Orange classifies sentiment into positive and negative groups.
5 | Puji Nurmawati, Endang Supriyati, Tri Listyorini | SENTIMENT ANALYSIS OF KPOP FANS ON TWITTER SOCIAL MEDIA USING NAIVE BAYES (CASE STUDY OF BTS GROUP FANS) | NAIVE BAYES | From the analysis carried out using the Naïve Bayes classification algorithm, the sentiment polarities are 34.2% negative, 58.5% neutral, and 7.3% positive. Of the 1,000 data taken, 342 tweets were negative according to the polarity results, with an accuracy rate of 75%. From this research it is hoped that it can assist in the process of sentiment analysis and is appropriate for overcoming existing problems.
6 | Primandani Arsi, Retno Waluyo | SENTIMENT ANALYSIS OF INDONESIAN CAPITAL RELOCATION DISCOURSE USING SUPPORT VECTOR MACHINE (SVM) ALGORITHM | SUPPORT VECTOR MACHINE (SVM) ALGORITHM | In this study, the Support Vector Machine (SVM) method is applied to tweets on the topic of moving the Indonesian capital city in order to classify sentiment classes on Twitter social media. Classification is done into 2 classes, namely positive and negative. Based on the results of tests carried out on 1,236 tweets (404 positive and 832 negative) about the sentiment of moving the capital city from Twitter, SVM obtained accuracy = 96.68%, precision = 95.82%, recall = 94.04% and AUC = 0.979.
7 | Angelina Puput Giovani, Ardiansyah, Tuti Haryanti, Laela Kurniawati, Windu Gata | ANALYSIS OF SENTIMENT OF GURU APPLICATION ON TWITTER USING CLASSIFICATION ALGORITHM | CLASSIFICATION ALGORITHM | This study compares the NB, SVM and K-NN methods without feature selection against the NB, SVM and K-NN methods with feature selection, and compares the Area Under Curve (AUC) values of these methods to find the most optimal algorithm. The test results show that the best optimization in this model is the SVM-based PSO algorithm with an accuracy value of 78.55% and an AUC of 0.853. This research succeeded in obtaining the best and most effective algorithm for classifying positive and negative comments related to the Ruang Guru application.
8 | Afif Nor Yusuf, Endang Supriyati, Tri Listyorini | Sentiment Analysis Regarding Indihome Service Providers Based on Customer Opinions Through Social Media Twitter with the Naïve Bayes Classifier Method | Naïve Bayes Classifier | The results of the Naïve Bayes method are very good. The level of accuracy of the system in classifying opinions was tested so as to obtain classification results, which yield an average accuracy of 74.5%. The more training data that is similar to the testing data, the better the classification results will be.
9 | Yan Watequlis Syaifudin, Rizki Andi Irawan | IMPLEMENTATION OF CLUSTERING ANALYSIS AND TWITTER DATA SENTIMENT ON BEACH TOURISM OPINIONS USING K-MEANS METHOD | K-MEANS | The accuracy of the classification using the Support Vector Machine algorithm is 74.39%. Furthermore, opinion data from a questionnaire were added to classify beaches based on the availability of resources, facilities, access, community readiness, market potential and tourism position. In the process of grouping these data, the K-Means method is used.
10 | Imam Kurniawan, Ajib Susanto | Implementation of the K-Means and Naïve Bayes Classifier Methods for Sentiment Analysis for the 2019 Presidential Election (Pilpres) | K-Means and Naïve Bayes Classifier | The purpose of this study is to analyze text documents to obtain positive or negative sentiments. The method used is K-Means for clustering the training data and the Naive Bayes classifier for classifying the testing data. The results of this weighting are in the form of positive and negative sentiments. The data were taken from Twitter regarding the 2019 presidential election, as many as 500 tweets. From the test results of 100 and 150 test data, an average accuracy of 93.35% and an error rate of 6.66% were obtained.
11 | Sigit Suryono, Ema Utami, Emha Taufiq Luthfi | SENTIMENT CLASSIFICATION IN TWITTER WITH NAIVE BAYES CLASSIFIER | NAIVE BAYES CLASSIFIER | From the results of the 3 trials, the accuracy rate in the first trial was 64.95%, in the second 66.36% and in the third 66.79%. Other results obtained from the classification process were 28% positive sentiment, 20% negative sentiment and 52% neutral sentiment. Based on the sentiment class percentages, neutral sentiment is the most common sentiment on the topic of President Joko Widodo and his government.
12 | Tati Mardiana, Hafiz Syahreva, Tuslaela | COMPARISON OF CLASSIFICATION METHODS ON FRANCHISING BUSINESS SENTIMENT ANALYSIS BASED ON TWITTER DATA | Sentiment, Python, Twitter, Comparison | The test results with the confusion matrix obtained an accuracy value of 83% for Neural Network, 52% for K-Nearest Neighbor, 83% for Support Vector Machine, and 81% for Decision Tree. This research shows that the Support Vector Machine and Neural Network methods are the best for classifying positive and negative comments related to franchising.
13 | Dedi Darwis, Eka Shintya Pratiwi, A. Ferico Octaviansyah Pasaribu | APPLICATION OF SVM ALGORITHM FOR SENTIMENT ANALYSIS ON CORRUPTION ERADICATION COMMISSION TWITTER DATA OF THE REPUBLIC OF INDONESIA | SVM ALGORITHM | This research produced 1,890 data and 3,846 terms/words from the preprocessing results and then calculated the frequency of word occurrence for labeling, which resulted in positive, negative and neutral sentiments. Based on the test results, the application of the SVM method produces an accuracy value of 82% and yields predominantly negative sentiment, with 77% negative labels, 8% positive labels and 25% neutral labels.
14 | Fira Fathonah, Asti Herliana | The Application of Sentiment Analysis Text Mining Regarding the Covid-19 Vaccine Using the Naïve Bayes Method | Naïve Bayes | Naïve Bayes is considered to have good potential in classifying documents compared to other classification methods in terms of accuracy and efficiency. Based on the results of testing 100 training data, which were then re-selected using data crawling techniques into 34 data, the sentiment analysis of Twitter users regarding the COVID-19 vaccine obtained an accuracy of 100%.
15 | Ragil Dimas Himawan, Eliyani | Comparison of the Accuracy of Tweet Sentiment Analysis for the Provincial Government of DKI Jakarta during the Pandemic Period | Support Vector Machine, Naïve Bayes, Random Forest Classifier | The data obtained comprise 14,208 lines, retrieved by querying tweets containing the word or mentioning the username @dkijakarta, which were grouped by sentiment class, namely negative, neutral and positive, using the TF-IDF Vectorizer for word weighting and several classification methods: the random forest classifier with 75.81% accuracy, the naïve Bayes algorithm with 75.22% accuracy, and the support vector machine algorithm with 77.58% accuracy. A sentiment analysis process was carried out on the tweets, with negative, neutral and positive percentages of 8.8%, 83.6% and 7.6%, respectively.
CONCLUSION
The purpose of this study is to compare the accuracy of the research methods used, namely K-nearest neighbors, Naïve Bayes and Decision Tree. Judging from class recall and class precision, the method that provides the highest level of accuracy is K-nearest neighbors, at 86.88%. The Decision Tree and KNN classification methods performed quite well in this study because they produced accuracy rates above 80%, but other methods can be explored in further research to obtain maximum accuracy.
BIBLIOGRAPHY
Cahyanti, D., Rahmayani, A., & Husniar, S. A. (2020). Analysis of the performance of the KNN method on a dataset of patients with breast cancer. Indonesian Journal of Data and Science, 1(2), 39–43.
Fikri, M. I., Sabrila, T. S., & Azhar, Y. (2020). Comparison of the Naïve Bayes method and the support vector machine for Twitter sentiment analysis. SMATIKA Jurnal: STIKI Informatika Jurnal, 10(02), 71–76.
Harahap, P. N. (2019). Implementation of Data Mining in Predicting Sales Transactions Using the Apriori Algorithm (Case Study of PT. Arma Anugerah Abadi, Sei Rampah Branch). MATICS: Journal of Computer Science and Information Technology, 11(2), 46–50.
Jati, N. P. (2021). Integration of Kansei Engineering and Kano Based on Natural Language Processing (NLP) to Support the Development of Service Products in Borobudur Temple Tourism.
Lestari, U. I., Nadhiroh, A. Y., & Novia, C. (2021). Application of the K-Nearest Neighbor Method for a Decision Support System for the Identification of Diabetes Mellitus. JATISI (Journal of Informatics Engineering and Information Systems), 8(4), 2071–2082.
Muningsih, E. (2022). Combination of K-Means and Decision Tree Methods with Comparison of Criteria and Split Data. Teknoinfo Journal, 16(1), 113–118.
Pamuji, F. Y., & Ramadhan, V. P. (2021). Comparison of Random Forest and Decision Tree Algorithms for Predicting Immunotherapy Success. Journal of Information Technology and Management, 7(1), 46–50.
Puspita, R., & Widodo, A. (2021). Comparison of the KNN, Decision Tree, and Naïve Bayes Methods on the Sentiment Analysis of BPJS Service Users. J. Inform. Univ. Pamulang, 5(4), 646.
Rahman, M. A., Hidayat, N., & Supianto, A. A. (2018). Comparison of K-Nearest Neighbor and Naïve Bayes Data Mining Methods for Clean Water Quality Classification (Case Study of PDAM Tirta Kencana, Jombang Regency). Journal of Information Technology Development and Computer Science, 2(12), 6346–6353.
Riadi, I., & Kom, M. (2017). Analysis of Digital Evidence of Cyberbullying in Social Networks Using the Naïve Bayes Classifier (NBC).
Romadloni, N. T., Santoso, I., & Budilaksono, S. (2019). Comparison of the Naive Bayes, KNN and Decision Tree Methods on Sentiment Analysis of KRL Commuter Line Transportation. IKRA-ITH Informatics: Journal of Computers and Informatics, 3(2), 1–9.
Sartika, D., & Sensuse, D. I. (2017). Comparison of the Naive Bayes, Nearest Neighbor, and Decision Tree Classification Algorithms in Case Studies of Clothing Pattern Selection Decision Making. JATISI (Journal of Informatics Engineering and Information Systems), 3(2), 151–161.
Siregar, R. R. A., Siregar, Z. U., & Arianto, R. (2019). Classification of Sentiment Analysis on Training Participants' Comments Using the K-Nearest Neighbor Method. The Flash, 8(1), 81–92.
Sukmana, R. N., Abdurrahman, A., & Wicaksono, Y. (2020). Implementation of K-Nearest Neighbor to Determine Sales Predictions (Case Study: PT Maksiplus Utama Indonesia). Journal of Information and Communication Technology, 9(2), 31–37.
Wahyuningsih, S., & Utari, D. R. (2018). Comparison of the K-Nearest Neighbor, Naive Bayes and Decision Tree Methods for Predicting Creditworthiness. Information Systems National Conference (KNSI) 2018.
Zulfauzi, Z., & Alamsyah, M. N. (2020). Application of the Naive Bayes Algorithm for Predicting New Student Admissions: A Case Study at Bina Insan University, Faculty of Computers. Journal of Information Technology Mura, 12(2), 156–165.
Copyright holders: Sri Rahayu, Bayu Rimbi Asmoro, Ery Rinaldi (2023)