Advent of Code 2023: Day 4, Part 2 (Multi-threaded Boogaloo)

This Advent of Code puzzle involved a rather convoluted game where, without getting into the details, you essentially recursively win more cards until the data set reaches a "collapse state" where no more cards are won. The task is to find the total number of cards you would win, including the originals. An input of 198 cards to start eventually results in a queue length of over 1 million cards at certain points in my queue-based solution.

My initial solution was so slow on my old laptop that I was unsure if it was working right. I got the right result from the smaller sample dataset, but when the queue grew to over 1 million cards running for over 5 minutes and still hadn't finished, I decided to try and optimize.

Through the use of worker threads (and, while implementing them, fixing some silly oversights), I was able to get the program to run in just 1.6 seconds. Sure, it now does it in 2.5 seconds on a single thread, but shhh. More cores, more cool.

const numWorkers = 4
let workers = new Set()

for (let i = 0; i < numWorkers; i++) {
  const worker = new Worker('./worker.js')
  workers.add(worker)

  worker.postMessage({ type: 'init', cardMap })

  worker.on('message', (message) => {
    if (message.type === 'ready') {
      distributeTask(worker)
    } else if (message.type === 'addCards') {
      queue.push(...message.cards)
      cardCount += message.cards.length
      distributeTask(worker)
    }
  })
}

As the workers would need to add cards back to the queue, I could not just chunk the set and run it in parallel directly. Instead, I used a master-worker pattern, where the main thread distributed tasks to the workers and collected batches of results to push back onto the queue.

I was expecting an improvement when moving to threads, but not that much. Obviously running the queue in parallel would net some gains, but they'd be incremental, not orders of magnitude...

Console results from the solution

4 threads was sweet spot for my 4C/8T laptop. HT did not improve performance. Maybe still helped main thread though?

And indeed, we end up seeing only a ~35% increase by moving from 1 to 4 threads. The core issue, I believe, was that the queue was originally being processed serially with Array.prototype.shift(), meaning that I was reindexing the whole array of 1 million+ cards every single loop cycle. This problem was addressed by processing in batches, grabbing 10,000 cards at a time from the main queue, a strategy which, as shown in the image above, would have been viable on its own without threads, but where's the fun in that?

function distributeTask(worker) {
  if (queue.length > 0 && queue.length < 100000) {
    const taskChunk = queue.splice(0, 50)
    worker.postMessage({ type: 'newTask', taskChunk })
  } else if (queue.length >= 100000) {
    const taskChunk = queue.splice(0, 10000)
    worker.postMessage({ type: 'newTask', taskChunk })
  } else {
    worker.terminate()
    workers.delete(worker)
    clearInterval(logCount)

    if (workers.size === 0) {
      console.log('Result:', cardCount)
      console.timeEnd('Time')
    }
  }
}

This reduced the number of reindex operations on the bloated main queue by a factor of 10,000. I tried splicing the end of the array to eliminate reindexing altogether, but it was actually much slower. Never can tell what V8 is really doing...

let cardMap
let queue = []
let outQueue = []
let workerId

parentPort.on('message', (message) => {
  if (message.type === 'init') {
    cardMap = message.cardMap
    parentPort.postMessage({ type: 'ready' })
  } else if (message.type === 'newTaskChunk') {
    queue.push(...message.taskChunk)
    while (queue.length > 0) {
      const cardNum = queue.pop()

      let winCount = getWinCount(cardNum)
      if (winCount === 0) continue

      for (i = 0; i < winCount; i++) {
        outQueue.push(cardNum + i + 1)
      }
    }
    parentPort.postMessage({ type: 'addCards', cards: outQueue })
    outQueue = []
  } else if (message.type === 'setId') {
    workerId = message.workerId
  }
})

While I did swap from .shift() to .pop() within the workers, these arrays were only 10,000 cards long and if there was an improvement it was not significant.

I'm happy with many of my initial design decisions. For example, parsing into a Map of cards keyed by their index allowed my queue to be a simple integer array. This worked great.

function parseInput(inputArr) {
  const cardKV = inputArr.map((line, i) => {
    const game = line.split(':')[1].trim()
    const [winningNumStr, playerNumStr] = game.split('|')
    const winningNums = new Set(
      winningNumStr.split(' ').filter((num) => num !== ''),
    )
    const playerNums = new Set(
      playerNumStr.split(' ').filter((num) => num !== ''),
    )
    return [i, { winningNums, playerNums }]
  })
  return new Map(cardKV)
}

I also memoized the number of cards a player would win from a given card the first time it appeared, yielding significant gains.

function getWinCount(cardNum) {
  const game = cardMap.get(cardNum)
  if (typeof game === 'number') return game
  const { winningNums, playerNums } = game
  const intersection = new Set(
    [...winningNums].filter((x) => playerNums.has(x)),
  )
  cardMap.set(cardNum, intersection.size)
  return intersection.size
}

I'd never done AoC before this year, but I'm pleased with the challenges so far as they've encouraged me to tug at the edges of my knowledge and try new things.

You can find my full solution to this puzzle (and others) here.