I'm writing a simulator in C, based on the core idea of salb918's MATLAB simulator, for the purpose of evaluating lineups. Currently I'm using PA, BB, H, 2B, 3B, HR and SO as the input stats. It's somewhat computationally intensive: running all 9! (362,880) batting orders takes about 2.5 days on my PowerBook G4 if you want to get within +/- 1 run per season with 95% confidence (which translates to about +/- 0.006 runs per game over a 162-game season).
For each plate appearance the simulator generates a random number between 0 and 1. If that number falls within the range assigned to walks (below BB/PA), a walk is assigned. Otherwise it falls through to the next case: if the number is below BB/PA + (H-2B-3B-HR)/PA, a single is assigned. And so on through doubles, triples, home runs and strikeouts.
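A minimal sketch of that cascade in C, under some assumptions: the names `BatterStats`, `Outcome` and `pick_outcome` are made up for illustration, the order of cases after singles is a guess, and anything left over after strikeouts is treated as a generic out.

```c
#include <assert.h>

/* Possible results of one plate appearance (illustrative names). */
typedef enum {
    PA_WALK, PA_SINGLE, PA_DOUBLE, PA_TRIPLE,
    PA_HOMER, PA_STRIKEOUT, PA_OTHER_OUT
} Outcome;

/* Season totals for one batter (hypothetical layout). */
typedef struct {
    int pa, bb, h, doubles, triples, hr, so;
} BatterStats;

/* Map a uniform random number u in [0, 1) to an outcome by walking
   cumulative thresholds: BB/PA first, then singles, and so on. */
Outcome pick_outcome(const BatterStats *b, double u)
{
    assert(b->pa > 0);
    double pa = (double)b->pa;
    int singles = b->h - b->doubles - b->triples - b->hr;

    double edge = b->bb / pa;            /* walks */
    if (u < edge) return PA_WALK;
    edge += singles / pa;                /* singles */
    if (u < edge) return PA_SINGLE;
    edge += b->doubles / pa;             /* doubles */
    if (u < edge) return PA_DOUBLE;
    edge += b->triples / pa;             /* triples */
    if (u < edge) return PA_TRIPLE;
    edge += b->hr / pa;                  /* home runs */
    if (u < edge) return PA_HOMER;
    edge += b->so / pa;                  /* strikeouts */
    if (u < edge) return PA_STRIKEOUT;
    return PA_OTHER_OUT;                 /* assumed: everything else is an out */
}
```

One way to get `u` is `rand() / (RAND_MAX + 1.0)` from `<stdlib.h>`, which stays strictly below 1.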
My simulator clearly underestimates offense. I forgot to include hit by pitch (I'm making that correction for the next run, but right now I'm in the middle of one), and it doesn't include sac flies, sac hits, steals, or speed (i.e. no one scores on a single unless they're standing on third).
What I am wondering, though, is how to take into account GIDP (GDP). That stat is reported at baseball-reference.com, but GIDP clearly depends on the baserunner situation: you need a runner on first and fewer than 2 outs. For example, in 2005, Ausmus hit into a double play 17 times. Originally I was thinking of just using GIDP/PA like any other stat, but that would probably underestimate the chance of hitting into a double play when one is possible, since the 17 counts only the times it actually happened while the PA total also includes all the plate appearances where it couldn't happen. In my simulator, there won't always be a runner on first. I suppose since I'm not taking into account speed, steals, sac flies or sac hits, it's probably ok - it'll help correct upward - but is there a stat out there for GIDP that only counts plate appearances where a double play can actually take place?
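If such a stat exists, the conditional version might look something like the sketch below. Everything here is an assumption: `dp_opps` stands for a hypothetical count of plate appearances with a runner on first and fewer than 2 outs, and the function names are made up. How this draw would fold into the main outcome draw is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* A ground-ball double play needs a runner on first and fewer than 2 outs. */
bool dp_possible(bool runner_on_first, int outs)
{
    return runner_on_first && outs < 2;
}

/* Conditional GIDP rate: double plays per opportunity.
   dp_opps is a hypothetical stat, not one of the inputs used so far. */
double gidp_rate(int gidp, int dp_opps)
{
    assert(dp_opps > 0 && gidp >= 0 && gidp <= dp_opps);
    return (double)gidp / dp_opps;
}

/* Given a uniform random number u in [0, 1), decide whether this plate
   appearance ends in a double play. Never true outside a DP situation. */
bool is_gidp(int gidp, int dp_opps, bool runner_on_first, int outs, double u)
{
    if (!dp_possible(runner_on_first, outs))
        return false;
    return u < gidp_rate(gidp, dp_opps);
}
```

With 17 double plays and, say, a made-up 100 opportunities, the rate applied in double-play situations would be 0.17, well above what 17 divided by a full season of plate appearances would give.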