Cont’d->
2. Language Recognition
This, I suppose, is at least easier than genre analysis since there won’t be so many overlapping or blurred types. But just like last time, the results are pretty disappointing… This time I created a playlist with 30 songs: 6 each in English, Chinese, French, Japanese and Korean. Having a fixed count per language makes it easier for me to evaluate the result in exact numbers.
(Results before the Multi-thread implementation)

It’s pretty clear that Korean songs are being neglected… I wonder if it’s because K-pop songs tend to mix several languages in one track.
Here’s the success rate:
Chinese: 4/6, 66.7% correct.

Japanese: 7/6 – my fault; I added the Japanese version of a BIGBANG song. Still 100% correct.

French: 3/6, 50%

English: 0/6. 0% NOOOOOOOOOOOOOOO

Unknown: basically all English songs are lying here…

I only ran it a second time (because each try costs money), after I fixed the multi-thread issue. The result for Korean songs improved a bit, with one song showing up correctly. Yay!

Others didn’t change much.
Well, I asked ChatGPT itself to explain the issue. Here’s the answer I received:
This discrepancy likely stems from the OpenAI language model’s response variability and the way the code parses results to identify languages. The current implementation uses simple keyword matching (e.g., checking if “English” or “Korean” appears in the OpenAI response), which can be unreliable if the model’s output is inconsistent or ambiguous. To improve accuracy, we could enhance the result parsing by employing regex-based patterns or more robust natural language processing (NLP) techniques to extract language details from the model’s response. Additionally, incorporating a validation step or fine-tuning the prompt to explicitly request unambiguous output could significantly reduce misclassifications.
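To make that suggestion a bit more concrete, here is a minimal sketch of what stricter parsing could look like. It is not my actual code: the names `LANGUAGES`, `PROMPT_SUFFIX` and `parse_language` are made up for illustration, and I’m assuming the model’s reply arrives as a plain string. The idea is to ask for a one-word answer and then accept the reply only if it names exactly one known language as a whole word.

```python
import re

# Hypothetical label set: the five languages in my test playlist.
LANGUAGES = ("English", "Chinese", "French", "Japanese", "Korean")

# Hypothetical prompt suffix asking the model for a single, unambiguous label.
PROMPT_SUFFIX = (
    "Answer with exactly one word from this list: "
    + ", ".join(LANGUAGES)
    + ", Unknown."
)

# Whole-word, case-insensitive match for any known label.
_LABEL_RE = re.compile(r"\b(" + "|".join(LANGUAGES) + r")\b", re.IGNORECASE)


def parse_language(response: str) -> str:
    """Return the single language named in the response, or "Unknown"."""
    # Normalize case so "english" and "English" count as the same label.
    found = {m.group(1).capitalize() for m in _LABEL_RE.finditer(response)}
    # Accept only an unambiguous answer; zero or several labels is "Unknown".
    return found.pop() if len(found) == 1 else "Unknown"
```

With this approach a reply like “Mostly Korean with some English lines” ends up as Unknown instead of being counted under whichever keyword the code happens to check first. That is a trade-off: mixed-language songs get flagged rather than guessed, which would not fix the Korean numbers by itself but would at least make the failures visible.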
Maybe I’ll try to fine-tune the prompt or use a different model to improve the result. Also, as I discussed above, Korean songs might have a higher failure rate because they tend to mix several languages, especially K-pop songs. As for the English songs’ fiasco, it remains a mystery. Maybe I should use more pop songs instead of indie songs, since indie songs might be harder to identify correctly? Anyway, I think it will be hard to continue with the genre part (which is also the part I’m most interested in) without figuring out how to balance the accuracy and speed of the prompt. Maybe I’ll start by experimenting with the vocals as the next step, since that is also less complicated. If only I had time…
See ya!