Built a Spotify Playlist Organizer Using OpenAI and Python – 2

So last time I put together a rough first version of a Spotify playlist analysis tool, but I noticed some problems. This is a follow-up post about my attempts to adjust and improve it.

  1. The speed is tooooo slow 🙁
  2. The language categorization is not very accurate.

1. Speed:

I added time measurement to the code. When testing with a 30-song playlist, the result was 307.8 seconds (~5 min) :(((
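The measurement itself is nothing fancy; a minimal sketch of the kind of wall-clock timer I mean (the `timed` helper name is mine, not from the actual tool):

```python
import time

def timed(fn, *args, **kwargs):
    # Wrap any function call with a simple wall-clock timer
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__} took {elapsed:.1f} s")
    return result
```

Wrapping the playlist-processing call in something like this is what produced the 307.8 s figure above.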

I suppose such a slow speed might come down to the following:

  • OpenAI API latency: it takes time for OpenAI to respond
  • Retry latency: the current retry delay is 2-5 sec: time.sleep(random.uniform(2, 5))
  • Prompt complexity: maybe the OpenAI prompt is too complicated?
  • Spotify-side latency: it might also take time for Spotify to retrieve the playlist
  • Network latency: not much control over this part, unfortunately
  • Single thread: everything runs sequentially, which might severely delay the process
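For context on the retry-latency point, a minimal sketch of a retry wrapper with the 2-5 second randomized delay mentioned above (the `with_retry` helper and its `delay` parameter are my illustration, not the tool's actual code):

```python
import random
import time

def with_retry(fn, *args, attempts=3, delay=(2, 5), **kwargs):
    # Retry a flaky call, sleeping a random 2-5 s between attempts
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; let the error propagate
            time.sleep(random.uniform(*delay))
```

With a 30-song playlist, even a handful of retried calls adds up fast, which is why this delay is on the suspect list.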

I was only testing the language-categorization functionality, so complicated prompts or Spotify latency are probably not the main cause here. Instead, I tried to implement multi-threaded processing.

    from concurrent.futures import ThreadPoolExecutor, as_completed

    with ThreadPoolExecutor(max_workers=5) as executor:
        # map each future back to its track so results can be matched up later
        future_to_track = {
            executor.submit(analyze_track_with_openai, api_key, track['name'], track['artist']): track
            for track in tracks
        }
        for future in as_completed(future_to_track):
            track = future_to_track[future]
            try:
                result = future.result()
            except Exception as e:
                print(f"Error analyzing track '{track['name']}': {e}")
                result = "unknown"

Ooops.

Well, it DID speed up, but only one song was actually analyzed and added to the corresponding playlist (the result for that one song was correct). I'm pretty new to multithreading, but I suspected the threads were stepping on each other and overwriting results.

After a lot of investigation, I figured out that I needed a lock to keep the shared state thread-safe. Multiple threads could overwrite each other's results or fail to add tracks properly. Also, the results from the OpenAI API were not being properly linked back to their corresponding tracks: each thread analyzed a track, but when the result came back, it wasn't being categorized correctly :((

Eventually I had to rewrite the whole process_playlist. Each track is analyzed in its own thread, the results are stored in categorized_tracks, and a lock prevents multiple threads from modifying that dict at the same time. With this structure, if one thread hits an error, it won't impact the other tracks.

        with lock:
            if language_category not in categorized_tracks:
                categorized_tracks[language_category] = []
            categorized_tracks[language_category].append(track)

The result is much better now (at least in elapsed time):

The processing speed improved by ~77.6%. I was thinking that tweaking the OpenAI API call might make it even better, but I'm OK with the result for now.
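For reference, assuming the percentage refers to elapsed time, a ~77.6% improvement on the original 307.8 s run works out to roughly 69 seconds for the same 30-song playlist:

```python
baseline_s = 307.8                     # original 30-song run
improved_s = baseline_s * (1 - 0.776)  # ~77.6% less elapsed time
print(f"{improved_s:.1f} s")           # roughly 69 s
```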

2. Language Recognition

OK, I started writing this part, but since a lot of pics will be included, it soon became much longer than I expected. I'll move it to the third part!
