Software for a slightly more involved statistical check of PBN files
#1
Posted 2012-November-19, 15:21
Occasionally I compare our local clubs pre-dealt hands (in PBN file format) with the following connected statistics:
High Card Points: 10.0000000000 per hand
Balanced (4333, 4432, 5332): 0.4760415243 average per hand
Voids: 0.0511637922 average per hand
Singletons: 0.3202474398 average per hand
7+ card suit: 0.0403212924 average per hand
But is there any other software that can examine a PBN data file for more than HCP, balanced hands, voids, singletons, and 7+ card suits? I would like to check some other distributional statistics other than these most common ones.
Bud H
#2
Posted 2012-November-19, 15:51
BudH, on 2012-November-19, 15:21, said:
Occasionally I compare our local clubs pre-dealt hands (in PBN file format) with the following connected statistics:
High Card Points: 10.0000000000 per hand
Balanced (4333, 4432, 5332): 0.4760415243 average per hand
Voids: 0.0511637922 average per hand
Singletons: 0.3202474398 average per hand
7+ card suit: 0.0403212924 average per hand
But is there any other software that can examine a PBN data file for more than HCP, balanced hands, voids, singletons, and 7+ card suits? I would like to check some other distributional statistics other than these most common ones.
Bud H
http://xkcd.com/882/
Also you may wish to only compare one hand per deal (I assume you're not, since you average 10HCP/hand with such amazing precision). A 7+ bagger with north changes the likelihood of one being dealt to south.
Other than this, I'd recommend just writing your own software to do whatever analysis you wish on the PBNs.
"...we live off being battle-scarred veterans who manage to hate our opponents slightly more than we hate each other.” -- Hamman, re: Wolff
#3
Posted 2012-November-19, 18:09
wyman, on 2012-November-19, 15:51, said:
Also you may wish to only compare one hand per deal (I assume you're not, since you average 10HCP/hand with such amazing precision). A 7+ bagger with north changes the likelihood of one being dealt to south.
Other than this, I'd recommend just writing your own software to do whatever analysis you wish on the PBNs.
What a great link; I was trying to teach that to my students the other day.
Never tell the same lie twice. - Elim Garek on the real moral of "The boy who cried wolf"
#4
Posted 2012-November-19, 18:15
Carl
#5
Posted 2012-November-19, 18:48
I would be shocked if you were to find any software package that will do what you want out of the box.
Mass market software packages target large markets (and analyzing PBN bridge files is not a large market)
Your best bet is to get access to a statistical programming language and use this to custom code what you need.
RPy is probably your best bet. (MATLAB would work great, but its probably a lot more than you'd want to spend)
If you don't know how to program, you might be able to convince someone to throw together some code for you...
#6
Posted 2012-November-20, 03:36
wyman, on 2012-November-19, 15:51, said:
I was already thinking of this comic when reading Bud's previous thread, but couldn't be bothered to go look for the URL.
-- Bertrand Russell
#7
Posted 2012-November-22, 09:09
BudH, on 2012-November-19, 15:21, said:
Occasionally I compare our local clubs pre-dealt hands (in PBN file format) with the following connected statistics:
High Card Points: 10.0000000000 per hand
Balanced (4333, 4432, 5332): 0.4760415243 average per hand
Voids: 0.0511637922 average per hand
Singletons: 0.3202474398 average per hand
7+ card suit: 0.0403212924 average per hand
But is there any other software that can examine a PBN data file for more than HCP, balanced hands, voids, singletons, and 7+ card suits? I would like to check some other distributional statistics other than these most common ones.
Bud H
Before you go looking for a solution, I suggest you could do with a clearer definition of your problem. What do you have in mind by "some other distributional statistics"? Do you want to be able to interact with this analysis software, or just to be given more details? Etc ...
#8
Posted 2012-November-22, 10:46
Import the pbn file into excel. use the filter function to display only the lines that begin
[Deal " (include the space and the quote mark to avoid getting the [Dealer tag)
Copy all those lines and paste on a new worksheet, starting in cell A2
use find and replace and Delete [Deal "?"
Use the text to columns, fixed length, such that the four hands get a column, so that you ahve hands in column A, B, C, and D.
Now you can extract what you want. To extract HCP in each cell use the following equation (this if for cell A2).
=((LEN(A2)-LEN(SUBSTITUTE(A2,"A","")))*4)+((LEN(A2)-LEN(SUBSTITUTE(A2,"K","")))*3)+(LEN(A2)-LEN(SUBSTITUTE(A2,"Q","")))*2+LEN(A2)-LEN(SUBSTITUTE(A2,"J",""))
To extract spades for a cell, use the following equation, this finds the first "." in the string, and subtracts that period, giving you the length of spades. If there is a spade void, for instance, the first period is at position "1", so 1-1 = 0, or the void, etc....
=SEARCH(".",A2)-1
To find the number of hearts, you use the following, where in this example F2 is the number of spades...
=SEARCH(".",A2,(SEARCH(".",A2)-1)+2)-(SEARCH(".",A2)-1)-2
(note, that if you had the number of spades in a cell, say F2, that the equation for hearts could simplify to....
=SEARCH(".",A2,F2+2)-F2-2)
The minus "2" is to remove from the count the space taken up by the two periods (one after spades, one after hearts... you have to find the period to know where hearts end. The -F2 or the -(SEARCH(".",A2)-1) is to remove the count for the number of spades in the hand.
Here is the equation for diamonds, assuming the following, the number of spades is in cell F2, the number of hearts is in G3, and the hand being looked at is in A2.
=SEARCH(".",A2,F2+G2+3)-F2-G2-3
If you placed spades, hearts, and diamonds in columns F, G, and H, then clubs is easy,
=13-SUM(F2:H2)
You can use excel to extract anything else you like, including the nubmer of singleton ♦7's if you like.
The stats you want is then up to you.... you can recombine the suit lengths, for instance to get a format like 5=3=3=2, 4=4=4=1, etc. You can use pivot tables or plots, or histograms, etc to get whatever you like.
If you have any trouble with this, you can email me your PBN file, and I will get it all chopped up in the this format and email you an excel spreadsheet back, or you can tell me what you want to beat out of the data, and I will see if I can set up a spreadsheet to do it for you and email that to you.
#9
Posted 2012-November-22, 11:28
inquiry, on 2012-November-22, 10:46, said:
I don't think that Excel is a practical choice for the type of analysis that Bud wants to do.
Let's assume that we wanted to use a Chi Square test to test a given null hypothesis...
Most of these events are rare and as a rule of thumb I've never liked using a chi square test if the expect frequency is less than 5.
You're going to need a really big sample size before you expect to see 5 8+ card suits in a a set of hands.
The amount of manual manipulation necessary seems untenable.
#10
Posted 2012-November-23, 13:56
hrothgar, on 2012-November-22, 11:28, said:
Let's assume that we wanted to use a Chi Square test to test a given null hypothesis...
Most of these events are rare and as a rule of thumb I've never liked using a chi square test if the expect frequency is less than 5.
You're going to need a really big sample size before you expect to see 5 8+ card suits in a a set of hands.
The amount of manual manipulation necessary seems untenable.
Richard, I know you use to work for MATLAB, and I am aware of its power. But not everyone owns a copy, nor wants to go to the expense of purchasing it, much less learning the programming language to use it, and get the degree is stastics to know how to use it properly. Yet the majority of people who mess around on windows machines own a copy or can find a copy of excel.
Excel can handle a lot of pbn hands with breeze. As I test, I ran 20,000 PBN hands from ACBL tournaments (which don't use uploaded hands).
In those 20,000 hands there were 388 hands with an 8 card suit, 25 hands with a nine card suit, and 1 hand with a ten card suit. I could have just as easily looked at 100,000 hands if I had a PBN file with 100,000 hands in it.
There were 3,653 hands with at least one of the four players held at least one void
There were only 15 hands out of 20,000 where the distribition of all four hands was 4333.
You can beat a lot of data out... and you can summarize it in a lot of different ways, without much skill. For example, here is a breakdown of just North;s HCP over 20,000 hands, which averages 9.989 points per hand for North.... close enough for me without a stat package.
0 69 1 167 2 280 3 502 4 759 5 1028 6 1310 7 1667 8 1792 9 1862 10 1869 11 1740 12 1615 13 1366 14 1080 15 896 16 690 17 483 18 315 19 215 20 141 21 65 22 51 23 20 24 6 25 9 26 2 27 1
#11
Posted 2012-November-23, 14:35
inquiry, on 2012-November-23, 13:56, said:
Comment 1: If you are going to bother running this type of analysis, at least do it correctly. There is a reason that hypothesis testing was developed. If you insist on doing this in Excel, you find plenty of references to running chi square tests in Excel on the web.
Comment 2: There are a lot of issues with Excel's numerics. You probably won't get burned on a simple analysis but I would never recommend using Excel for anything you care about.
Comment 3: If you look at my original post, you'll see that I recommended using a combination of R and Python based on cost
#12
Posted 2012-November-23, 22:48
hrothgar, on 2012-November-23, 14:35, said:
Comment 2: There are a lot of issues with Excel's numerics. You probably won't get burned on a simple analysis but I would never recommend using Excel for anything you care about.
Comment 3: If you look at my original post, you'll see that I recommended using a combination of R and Python based on cost
Richard, I don't care to run chi-squared test on bridge data. I doubt that BudH (the op) does either. Did you read his post, which is key here? His level of interest does not appear to be statistical testing. From his other post on this type of thing, he seems interested in seeing if his dealing program at his club was giving normal hand patterns or not.
He is probably interested in just checking the hands for peace of mind... not make a federal case out of it. He problably wants something like the following, The theoretical data was copied from Richard Pavlicek;'s Against all odds webpage, the bbo hands were the first 20,000 deals in a custom BB0 acbl tournmanent datafile in a bridgebrowser database.
hand pattern theoretical bbo hands 4=3=3=3 10.53613032 10.5275 4=4=3=2 21.55117565 21.51 4=4=4=1 2.99321884 2.91375 5=3=3=2 15.51684646 15.38625 5=4=2=2 10.57966804 10.56125 5=4=3=1 12.93070539 13.075 5=4=4=0 1.243337056 1.2375 5=5=2=1 3.173900413 3.155 5=5=3=0 0.895202681 0.8975 6=3=2=2 5.642489623 5.60625 6=3=3=1 3.448188103 3.505 6=4=2=1 4.702074686 4.73 6=4=3=0 1.326226194 1.33 6=5=1=1 0.705311203 0.67625 6=5=2=0 0.651056495 0.67625 6=6=1=0 0.072339611 0.07125 7=2=2=2 0.512953602 0.54375 7=3=2=1 1.880829874 1.945 7=3=3=0 0.265245239 0.25625 7=4=1=1 0.391839557 0.40625 7=4=2=0 0.361698053 0.34875 7=5=1=0 0.108509416 0.09625 7=6=0=0 0.005564585 0.01125 8=2=2=1 0.192357601 0.225 8=3=1=1 0.117551867 0.12375 8=3=2=0 0.108509416 0.0975 8=4=1=0 0.045212257 0.05 8=5=0=0 0.003130079 0.005 9=2=1=1 0.017810889 0.01 9=2=2=0 0.00822041 0.00875 9=3=1=0 0.010047168 0.01125 9-4-0-0 0.000966074 0.00125 10-2-1-0 0.001096055 0 10-1-1-1 0.000395798 0 10-3-0-0 0.000154572 0 11-1-1-0 2.49103E-05 0 11-2-0-0 1.14971E-05 0 12-1-0-0 3.19363E-07 0 13-0-0-0 6.3E-10 0
Or he might be interested in probing chance of a hand with no card higher than a ten, out of the 80,000 individual hands in 20,000 deals, there were 275 (0.344%, theoretical 0.364%) or hands with no card higher than a nine (41 out of 80,000, for 0.0513%, theoretical number 0.0547%).
Or how many hands have a suit headed by AKQJ? (647, or 1.06%).. there were 11 of those dealst where AKQJ occurred in two suits. I didn't break down if this was in the same hand or different hands. Stuff like this is easy to extract, which I think is what he wants, and there is no reason to limit to the search to 20,000, that is just the number I used to see if he was interested. I offered to give him the excel sheet or to probe his data if he wants to send it to me (if he is uncomfortable with excel). Maybe you could step up and offer to do the MATLAB for him if my solution doesn't fit his needs. Somehow, I think this is the kind of thing he wants to do, however. So I set up excel to solve what he I thought he wanted.
#13
Posted 2012-November-24, 04:23
inquiry, on 2012-November-23, 22:48, said:
Richard, I don't care to run chi-squared test on bridge data. I doubt that BudH (the op) does either. Did you read his post, which is key here? His level of interest does not appear to be statistical testing. From his other post on this type of thing, he seems interested in seeing if his dealing program at his club was giving normal hand patterns or not.
Just to be clear, you want to determine whether your dealing program is generating "normal hand patterns" but not do "statistical testing"?
How, pray tell, are you going to determine what does / does not constitute an acceptable result?
The entire point behind statistical testing is to establish a standard methodology to answer this question. If you're just going to make a random assertion based on gut feel why bother to look at the data?
#14
Posted 2012-November-24, 10:30
Not all statistics are about testing dealer software. And let's face it: Yes, Mathlab will do a great job at the statistical testing, but you still need someone with a knowledge of statistics to interpret the results. (To most people, "chi squared test" sounds as alien as "Mitä kuuluu?".)
I could imagine that Bud wants to compare how he is scoring on different types of hands. Does he do well on distributional hands? Does he do well when his side has the majority of the HCPs?
Maybe in half a year he will get the idea that they don't do well on distributional hands where the opponents own the hand. He concludes that he should look into his preempting style. Ben's Excel code will be just fine for that.
Rik
The most exciting phrase to hear in science, the one that heralds the new discoveries, is not “Eureka!” (I found it!), but “That’s funny…” – Isaac Asimov
The only reason God did not put "Thou shalt mind thine own business" in the Ten Commandments was that He thought that it was too obvious to need stating. - Kenberg
#15
Posted 2012-November-24, 10:53
Trinidad, on 2012-November-24, 10:30, said:
Not all statistics are about testing dealer software. And let's face it: Yes, Mathlab will do a great job at the statistical testing, but you still need someone with a knowledge of statistics to interpret the results. (To most people, "chi squared test" sounds as alien as "Mitä kuuluu?".)
I could imagine that Bud wants to compare how he is scoring on different types of hands. Does he do well on distributional hands? Does he do well when his side has the majority of the HCPs?
Maybe in half a year he will get the idea that they don't do well on distributional hands where the opponents own the hand. He concludes that he should look into his preempting style. Ben's Excel code will be just fine for that.
Rik
Comment 1: Bud specifically stated in an earlier version of the same thread that his goal was to validate that the dealer program his club is using is generating fair deals
Comment 2: I never recommended using MATLAB and pointed out that you can use Excel for a chi square test
#16
Posted 2012-November-24, 12:40
First, you poo-poo excel.....
hrothgar, on 2012-November-22, 11:28, said:
Let's assume that we wanted to use a Chi Square test to test a given null hypothesis...
Most of these events are rare and as a rule of thumb I've never liked using a chi square test if the expect frequency is less than 5.
You're going to need a really big sample size before you expect to see 5 8+ card suits in a a set of hands.
The amount of manual manipulation necessary seems untenable.
Then you defend your first choice, (R and python, assuming budH knows anything about either), and continue to poo-poo excel ("I would never recommend Excel for anything you care about).
hrothgar, on 2012-November-23, 14:35, said:
Comment 2: There are a lot of issues with Excel's numerics. You probably won't get burned on a simple analysis but I would never recommend using Excel for anything you care about.
Comment 3: If you look at my original post, you'll see that I recommended using a combination of R and Python based on cost
Now you seem to be advocating you did mention you COULD use excel (but of course you would never recommend it, see above).
hrothgar, on 2012-November-24, 10:53, said:
Comment 2: I never recommended using MATLAB and pointed out that you can use Excel for a chi squaretest
For anyone interested, excel does have problem with Chi squared testing for some extreme chi squared distributions, which might have something to do with Richard reluctance to recommend it, even if it would allow people to probe the hand records just the way BudH was looking (did anyone see any chi squared or p-values in Bud's OP? He just wants more ways to probe his data, and with excel this is straight forward and relatively easy. Another example? Here is the number of hands with at least one singleton Ace=1611, and number of hands with at least one AK doubleton 697 (I didn't bother to write an equation to check for more than one singleton ace in one or more hands on a deal, just if at least one singleton Ace existed... same for AK).
Anyway, just for which Richard, here is the observed hand patterns (33 of the 39 possible ones) found in the 80,000 hands found in the 20,000 unique deals. Just looking at the numbers, it might appear that eight card suits are dealt too often (there were 401 combined 8 card suits, and only 373.4 were "expected"). However, the sample size is still fairly small relative to all possible deals, and if it matters, the chi squared test gave a p value of 0.738184, suggesting that the null hypothesis (there is no statistical difference between the observed distribution and the theoretical distribution) can not be rejected (p would have to be something like 0.05 or less). So the observed deviations appear to be due to random chance alone, rather than some problem with the dealing program. You might want to check a larger size, and of course, I didn't test budH's dealing program, this was random bbo hands (I guess you could say the bbo dealing program). Anyway, the data is below for people to throw darts at, or use their own statistical programs to detemine the chi squared value and p values.
And look... assuming Bud has excel, he doesn't need anything else.... which, after all, was my point.
PATTERN OBSERVED THEORETICAL 4=3=3=3 8422 8428.9 4=4=3=2 17208 17240.9 4=4=4=1 2331 2394.6 5=3=3=2 12309 12413.5 5=4=2=2 8449 8463.7 5=4=3=1 10460 10344.6 5=4=4=0 990 994.7 5=5=2=1 2524 2539.1 5=5=3=0 718 716.2 6=3=2=2 4485 4514 6=3=3=1 2804 2758.6 6=4=2=1 3784 3761.7 6=4=3=0 1064 1061 6=5=1=1 541 564.2 6=5=2=0 541 520.8 6=6=1=0 57 57.9 7=2=2=2 435 410.4 7=3=2=1 1556 1504.7 7=3=3=0 205 212.2 7=4=1=1 325 313.5 7=4=2=0 279 289.4 7=5=1=0 77 86.8 7=6=0=0 9 4.5 8=2=2=1 180 153.9 8=3=1=1 99 94 8=3=2=0 78 86.8 8=4=1=0 40 36.2 8=5=0=0 4 2.5 9=2=1=1 8 14.2 9=2=2=0 7 6.6 9=3=1=0 9 8 9=4=0=0 1 0.8 10=2=1=0 1 0.9 80000 79999.8
#17
Posted 2012-November-24, 13:09
#18
Posted 2012-November-24, 13:37
TimG, on 2012-November-24, 13:09, said:
well, these numbers even with stats are not going to help bud out, as these were not generated using "HIS" hands. And there is absolutely no reason to limit the test ot 20,000 deals. He should use 250,000 deals (giving a million unique hands)... or maybe even more. Still, the data presented above easily passes the Chi-Square goodness-of-fit test with unequal expected frequencies, he could even quote the p values (of course, not his deals).... The people at his club will not understand or accept that kind of validation either.
#19
Posted 2012-November-24, 14:22
Seems obvious to me, isn't obvious to everyone.
Rik
The most exciting phrase to hear in science, the one that heralds the new discoveries, is not “Eureka!” (I found it!), but “That’s funny…” – Isaac Asimov
The only reason God did not put "Thou shalt mind thine own business" in the Ten Commandments was that He thought that it was too obvious to need stating. - Kenberg
#20
Posted 2012-November-24, 16:14
inquiry, on 2012-November-24, 13:37, said:
They may not understand, but they're more likely to accept. But, it's not enough to say "they easily pass the Chi-Square goodness-of-fit test". He needs to show his work or give p values (and explain what those mean).
I think Bud wants to do two things: test the actual hands played for a month, then two months, etc which will obviously be a small sample; and test the hand generator on a larger sample.