Utilization of Query Expansion Using Data Mining Method In Analyzing Documents on The Irama Nusantara Website
DOI: https://doi.org/10.57185/jetbis.v3i11.156
Keywords: Machine Learning; Natural Language Processing; Big Data; Data Mining
Abstract
In Indonesia, many local websites, such as Irama Nusantara, hold valuable
information related to music and culture. Although these sites are rich in
data, this information remains underused. This research applies query
expansion techniques, supported by data mining methods, to the analysis of data
from the Irama Nusantara website. Data was collected from the Irama
Nusantara website through a crawling process, resulting in 5404 entries
covering audio, images and text. The analysis was conducted using Natural
Language Processing (NLP) techniques starting with the preprocessing
stage. Next, the K-Means algorithm was applied for clustering, and the
Term Frequency-Inverse Document Frequency (TF-IDF) method was used
for term weighting. Classification models were built using Support Vector
Machine (SVM) and Naive Bayes for comparison. The analysis shows that
the use of query expansion significantly improves the accuracy of
information retrieval on the Irama Nusantara website. The method
evaluation showed that SVM gave better results in terms of accuracy and
precision compared to Naive Bayes. In addition, Principal Component
Analysis (PCA) shows that 70-95% of the variance in the data can be
explained by the resulting principal components, which signifies the
efficiency of the applied method. This research not only provides a deeper
insight into the patterns and trends in the analyzed data, but also
contributes to the development of information technology in the field of
culture in Indonesia. This research successfully developed an effective
analysis model to utilize data from the Irama Nusantara website.
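
As a rough illustration of how TF-IDF term weighting can drive query expansion, the sketch below weights terms in a small tokenized corpus and appends the highest-weighted terms from documents that match the original query. It is a minimal sketch under stated assumptions, not the pipeline used in this research: the function names `tfidf` and `expand_query`, the parameter `k`, the feedback-style expansion rule, and the toy corpus are all hypothetical, and the clustering and classification stages are omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights


def expand_query(query, docs, k=2):
    """Append the k highest-weighted terms from documents matching the query.

    Assumption: a document "matches" if it contains any query term.
    """
    weights = tfidf(docs)
    scores = Counter()
    for doc, doc_weights in zip(docs, weights):
        if any(term in doc for term in query):
            for term, value in doc_weights.items():
                if term not in query:
                    scores[term] += value
    return list(query) + [term for term, _ in scores.most_common(k)]


# Hypothetical toy corpus, not data from the Irama Nusantara crawl.
docs = [
    "keroncong jakarta 1960 vinyl".split(),
    "keroncong orchestra jakarta".split(),
    "pop bandung 1970 vinyl".split(),
    "rock bandung 1970".split(),
]
expanded = expand_query(["keroncong"], docs)
print(expanded)
```

In this sketch the expanded query keeps the original term first, so a retrieval step could still rank exact matches above matches on the added terms.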






