
go - Why does adding concurrency slow down this golang code?

I've got a bit of Go code that I've been tinkering with to answer a little curiosity of mine related to a video game my brother-in-law plays.

Essentially, the code below simulates interactions with monsters in the game and how often he can expect them to drop items upon their defeat. The problem I'm having is that I would expect a piece of code like this to be perfect for parallelization, but when I add in concurrency, running all of the simulations takes 4-6 times longer than the version without concurrency.

To give you a better understanding of how the code works, there are three main functions: the interaction function, which simulates a single interaction between the player and a monster and returns 1 if the monster drops an item and 0 otherwise; the simulation function, which runs several interactions and returns a slice of interaction results (i.e., 1s and 0s representing successful/unsuccessful interactions); and the test function, which runs a set of simulations and returns a slice of simulation results, each of which is the total number of interactions that resulted in a dropped item. It's the last function that I am trying to run in parallel.

Now, I could understand why the code would slow down if I created a goroutine for each test that I want to run. Assuming I'm running 100 tests, the context switching between each of the goroutines across the 4 CPUs my MacBook Air has would kill the performance, but I'm only creating as many goroutines as I have processors and dividing the number of tests between them. I would expect this to actually speed up the code's performance since I am running each of my tests in parallel, but, of course, I'm getting a major slowdown instead.

I'd love to figure out why this is happening, so any help would be greatly appreciated.

Below is the regular code without the goroutines:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)

/**
 * Simulates a single interaction with a monster
 *
 * Returns 1 if the monster dropped an item and 0 otherwise
 */
func interaction() int {
    if rand.Float64() <= DROP_RATE {
        return 1
    }
    return 0
}

/**
 * Runs several interactions and returns a slice representing the results
 */
func simulation(n int) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction()
    }
    return interactions
}

/**
 * Runs several simulations and returns the results
 */
func test(n int) []int {
    simulations := make([]int, n)
    for i := range simulations {
        successes := 0
        for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
            successes += v
        }
        simulations[i] = successes
    }
    return simulations
}

func main() {
    rand.Seed(time.Now().UnixNano())
    fmt.Println("Successful interactions: ", test(NUMBER_OF_SIMULATIONS))
}

And, here is the concurrent code with the goroutines:

package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "time"
)

const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)

/**
 * Simulates a single interaction with a monster
 *
 * Returns 1 if the monster dropped an item and 0 otherwise
 */
func interaction() int {
    if rand.Float64() <= DROP_RATE {
        return 1
    }
    return 0
}

/**
 * Runs several interactions and returns a slice representing the results
 */
func simulation(n int) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction()
    }
    return interactions
}

/**
 * Runs several simulations and returns the results
 */
func test(n int, c chan []int) {
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
            simulations[i] += v
        }
    }
    c <- simulations
}

func main() {
    rand.Seed(time.Now().UnixNano())

    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println("Number of CPUs: ", nCPU)

    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }

    // Concatenate the test results
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], <-c)
    }

    fmt.Println("Successful interactions: ", results)
}

UPDATE (01/12/13 18:05)

I've added a new version of the concurrent code below that creates a new Rand instance for each goroutine, per "the system"'s suggestion below. I'm now seeing a very slight speed-up compared to the serial version of the code (around a 15-20% reduction in overall time taken). I'd love to know why I don't see something closer to a 75% reduction in time, since I'm spreading the workload over my MacBook Air's 4 cores. Does anyone have any further suggestions that could help out?

package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "time"
)

const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)

/**
 * Simulates a single interaction with a monster
 *
 * Returns 1 if the monster dropped an item and 0 otherwise
 */
func interaction(generator *rand.Rand) int {
    if generator.Float64() <= DROP_RATE {
        return 1
    }
    return 0
}

/**
 * Runs several interactions and returns a slice representing the results
 */
func simulation(n int, generator *rand.Rand) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction(generator)
    }
    return interactions
}

/**
 * Runs several simulations and returns the results
 */
func test(n int, c chan []int) {
    source := rand.NewSource(time.Now().UnixNano())
    generator := rand.New(source)
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
            simulations[i] += v
        }
    }
    c <- simulations
}

func main() {
    rand.Seed(time.Now().UnixNano())

    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println("Number of CPUs: ", nCPU)

    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }

    // Concatenate the test results
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], <-c)
    }

    fmt.Println("Successful interactions: ", results)
}
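
As an aside, while comparing versions it can be handy to time just the simulation work from inside main with the standard time package instead of timing the whole process. This is only a rough sketch; doWork is a hypothetical stand-in for whichever version of test you are measuring, not something from the original programs:

package main

import (
    "fmt"
    "time"
)

// doWork is a placeholder for the serial test() call or the goroutine
// fan-out shown above; substitute the real work when measuring.
func doWork() {
    time.Sleep(100 * time.Millisecond)
}

func main() {
    start := time.Now()
    doWork()
    fmt.Println("Elapsed: ", time.Since(start))
}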

UPDATE (01/13/13 17:58)

Thanks everyone for the help in figuring out my problem. I did finally get the answer I was looking for, so I thought I would summarize it here for anyone who runs into the same problem.

Essentially, I had two main issues: first, even though my code was embarrassingly parallel, it ran slower when I split it up amongst the available processors; and second, the solution opened up another issue, namely that my serial code ran twice as slow as the concurrent code running on a single processor, which you would expect to be roughly the same. In both cases the culprit was the random number generator function rand.Float64. Basically, this is a convenience function provided by the rand package. In that package, a global instance of the Rand struct is created and used by each of the convenience functions, and this global Rand instance is protected by a mutex lock. Since I was using this convenience function, I wasn't truly able to parallelize my code, because each of the goroutines had to line up for access to the global Rand instance. The solution (as "the system" suggests below) is to create a separate instance of the Rand struct for each goroutine. This solved the first problem but created the second one.
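
To make the contention concrete, here is a simplified illustration of what the package-level convenience functions effectively do (this is my own sketch of the idea, not the actual standard-library source): every caller funnels through one shared, mutex-protected generator, so concurrent goroutines end up being serialized.

package main

import (
    "fmt"
    "math/rand"
    "sync"
)

// Illustration only: one shared generator guarded by a mutex, which is
// roughly how the package-level convenience functions behave.
var (
    globalMu  sync.Mutex
    globalGen = rand.New(rand.NewSource(1))
)

// lockedFloat64 mimics rand.Float64: every goroutine that calls it has to
// take the same lock, so concurrent callers wait on each other.
func lockedFloat64() float64 {
    globalMu.Lock()
    defer globalMu.Unlock()
    return globalGen.Float64()
}

func main() {
    fmt.Println(lockedFloat64())
}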

The second problem was that my non-parallel concurrent code (i.e., my concurrent code running with only a single processor) was running twice as fast as the sequential code. The reason for this was that, even though I was only running with a single processor and a single goroutine, that goroutine had its own instance of the Rand struct that I had created, and that instance has no mutex lock around it. The sequential code was still using the rand.Float64 convenience function, which made use of the global, mutex-protected Rand instance. The cost of acquiring that lock was what made the sequential code run twice as slow.

So, the moral of the story is: whenever performance matters, make sure you create your own instance of the Rand struct and call the methods you need on it rather than using the convenience functions provided by the package.
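
If you want to see the difference for yourself, a small benchmark along these lines shows the gap between the locked package-level helper and a private generator. This is just a sketch of mine (the benchmark names are made up; drop it into a _test.go file next to the program and run go test -bench=.), not something from the original code:

package main

import (
    "math/rand"
    "testing"
)

// Uses the package-level convenience function, which goes through the
// shared, mutex-protected global generator.
func BenchmarkGlobalFloat64(b *testing.B) {
    for i := 0; i < b.N; i++ {
        rand.Float64()
    }
}

// Uses a private *rand.Rand, so there is no shared lock to acquire.
func BenchmarkLocalFloat64(b *testing.B) {
    generator := rand.New(rand.NewSource(1))
    for i := 0; i < b.N; i++ {
        generator.Float64()
    }
}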
