Numerical Recipes Forum  

Go Back   Numerical Recipes Forum > Numerical Recipes Third Edition Forum > Methods: All Chapters in NR3

Reply
 
Thread Tools Rate Thread Display Modes
  #1  
Old 09-13-2010, 12:59 AM
emilpohl emilpohl is offline
Registered User
 
Join Date: Sep 2009
Posts: 6
Chapter 6 - incgammabeta.h : binomialdist:: cdf (bug?)

Hello all,

I've translated to Java. So I'm not sure if it's a bug. But the binomialdist.cdf only 'worked' after I've replaced the line:

return 1. - betai( (Doub)k, n - k + 1., pe);

by

return betai(n - (Doub)k, k + 1., 1. - pe);

Is this difference related to k (since it's discrete), a bug or bad java translation?

regards,

Carlos
Reply With Quote
  #2  
Old 09-13-2010, 10:45 AM
davekw7x davekw7x is offline
Registered User
 
Join Date: Jan 2008
Posts: 452
Quote:
Originally Posted by emilpohl View Post
...So I'm not sure if it's a bug....
Hmmm...I hadn't tested that function by itself.

So, I wrote a little program for p = 1.0/6.0 (probability of a "one" turning up when you toss a fair six-sided die).
Here's the program
Code:
#include "../code/nr3.h"
#include "../code/gamma.h"
//
// uncomment the following to use the original version
#include "../code/incgammabeta.h"

//
// uncomment the following to use the modified version
//#include "./incgammabeta1.h"

int main()
{
    int n;
    double p;
    // Change #if 0 to #if 1 to let the user enter the number of trials and
    // the probability of a desired outcome.
#if 0
    cout << "Enter n and p: ";
    cin >> n >> p;
    if (!cin || n < 1 || p <= 0 || p >= 1) {
        cout << "Invalid input." << endl;
        exit(EXIT_FAILURE);
    }
#else
    // Probability for a given face of a six-sided die
    n = 20; p = 1.0/6.0;
#endif
    cout << "CDF for binomial distribution with n = " << n
         << " and p = " << p << endl << endl;
    Binomialdist bd(n, p);
    cout << fixed;
    cout << scientific;
    cout << setprecision(11);
    cout << " x        Pr(X <= x)" << endl;
    for (int k = 0; k < n; k++) {
        cout << setw(3) << k << " " << setw(20) << bd.cdf(k) << endl;
    }
    return 0;
}
And here's the output:
Code:
CDF for binomial distribution with n = 20 and p = 0.166667

 x        Pr(X <= x)

  0    0.00000000000e+00
  1    2.60840533046e-02
  2    1.30420266523e-01
  3    3.28659071638e-01
  4    5.66545637776e-01
  5    7.68749218993e-01
  6    8.98159510972e-01
  7    9.62864656961e-01
  8    9.88746715357e-01
  9    9.97158384336e-01
.
.
.
Now, for a seat-of-pants sanity check: X can't be less than zero, so the probability that X is less than or equal to zero is equal to the probability that X is equal to zero. This value is not zero as given by the cdf() function; it is (5/6) to the power 20. That's equal to 0.026084053...

Similarly for the other values: It seems to me that for the usual definition of CDF the function uses a value of argument that's off by 1.

I haven't tested to see how this affects any other functions that use this function, but here's a test of cdf() with a slight modification:

Instead of changing the return statement (which would require changing the conditionals that test special cases), I just inserted the following line at the beginning of Binomialdist::cdf() in incgammabeta.h:
Code:
                ++k;
Now I get the following (which is what I would expect):
Code:
CDF for binomial distribution with n = 20 and p = 0.166667

 x        Pr(X <= x)
  0    2.60840533046e-02
  1    1.30420266523e-01
  2    3.28659071638e-01
  3    5.66545637776e-01
  4    7.68749218993e-01
  5    8.98159510972e-01
  6    9.62864656961e-01
  7    9.88746715357e-01
  8    9.97158384336e-01
  9    9.99401496063e-01
.
.
.
Or am I missing something? (It wouldn't be the first time, but see Footnote.)


Regards,

Dave

Footnote:
I don't "do" java, but as a quick cross check, I ran the following in GNU octave:
Code:
# GNU Octave m-file to check binomial cdf
n = 20;
p = 1.0/6.0;
printf('CDF for binomial distribution with n = %d and p = %f\n\n', n, p);
printf(' x      Pr(X <= x)\n\n')
for x=0:n
    printf('%3d  %.11e\n', x, binocdf(x, n, p))
end
Results agree with my modified cdf() function:
Code:
CDF for binomial distribution with n = 20 and p = 0.166667

 x      Pr(X <= x)

  0  2.60840533046e-02
  1  1.30420266523e-01
  2  3.28659071638e-01
  3  5.66545637776e-01
  4  7.68749218993e-01
  5  8.98159510972e-01
  6  9.62864656961e-01
  7  9.88746715357e-01
  8  9.97158384336e-01
  9  9.99401496063e-01
.
.
.
Reply With Quote
  #3  
Old 09-13-2010, 11:45 AM
emilpohl emilpohl is offline
Registered User
 
Join Date: Sep 2009
Posts: 6
Hi! Dave

thank you for your fast reply.

adding k++ makes cdf looks "right". I've read Poisson explanation 6.14.13. So, it might be the same case.

However, for x = 1; k>20 and pe > 0.9 the cdf goes to 0.0 below 1.0E-16 adding ++k, k++ or not.

could you test this, please?

regards,

Carlos
Reply With Quote
  #4  
Old 09-13-2010, 02:19 PM
davekw7x davekw7x is offline
Registered User
 
Join Date: Jan 2008
Posts: 452
Quote:
Originally Posted by emilpohl View Post
...
However, for x = 1; k>20 and pe > 0.9 the cdf goes to 0.0...
Welcome to the world of finite precision!

Let's look at it for n = 25 and pe = 0.95. With my modified incgammabeta.h, cdf(0) returns zero. Not nearly zero. Exactly zero.

My little sanity check says that for k = 0, the result should be 0.05 to the power 25. That's something like 2.98...e-33. That's pretty small, but it is not beyond representation as a double precision floating point number.

So, what happened?

Well...

For k = 0, the cdf function calls betai with the values:
a = 1, b = 25, and c = 0.95

Inside betai, you can confirm that for these values it is doing the last exit with a calculated return value equal to 1.0-bt*betacf(b,a,1.0-x)/b This turns out to be something like 1.0 - 2.9802322387696186e-33

Since there are only 16 or 17 significant decimal digits in a C++ double precision number, the returned value is 1.0

Then the cdf function returns 1.0 - the betai() value, so it becomes zero.

See what happened? 1.0 - (1.0 - very_small_number) should be equal to the very_small_number, but because of roundoff error it becomes 0.0. Associative and distributive "laws" don't always hold for finite-precision floating point numbers. Bummer!

Bottom line: perhaps the cdf function can be rewritten so that it calculates the smaller terms directly instead of using the the betai stuff, but for now, at least there is my explanation of why everything smaller than approximately 1.0e-16 becomes zero. Understanding the results can give insight.

Regards,

Dave

Footnote:
My little GNU octave program gives pretty much the same results (everything smaller than 1.0e-16 becomes zero), so they apparently use some similar calculations.

On the other hand...

When I used the GNU Scientific Library function gsl_cdf_binomial_P(), I got more precision.

Here's the C program:
Code:
#include <stdio.h>
#include <gsl/gsl_cdf.h>
int main()
{
    int n = 25;
    double p = 0.95;
    int k;
    printf("CDF of binomial distribution with n = %d, and p = %f\n\n", n, p);
    printf(" x         Pr(X <= x)\n\n");
    for (k = 0; k <= n; k++) {
        printf("%3d %18.6e\n", k, gsl_cdf_binomial_P(k, p, n));
    }
    return 0;
}
Here's the output:
Code:
CDF of binomial distribution with n = 25, and p = 0.950000

 x         Pr(X <= x)

  0       2.980232e-33
  1       1.418591e-30
  2       3.241777e-28
  3       4.733943e-26
  4       4.960433e-24
  5       3.970253e-22
  6       2.522780e-20
  7       1.305786e-18
  8       5.604966e-17
  9       2.020747e-15
 10       6.174753e-14
 11       1.609214e-12
 12       3.591139e-11
 13       6.876528e-10
 14       1.130173e-08
 15       1.591912e-07
 16       1.915378e-06
 17       1.958055e-05
 18       1.687532e-04
 19       1.212961e-03
 20       7.164948e-03
 21       3.409060e-02
 22       1.271065e-01
 23       3.576241e-01
 24       7.226104e-01
 25       1.000000e+00
In Other Words:

"There's more than one way to skin a cat."
My dear old granny used to say that,
but I don't know where she got it. I mean,
no one in my family ever actually skinned a cat.
(At least as far as I know, they didn't. I mean
she said that we were eating squirrel. But
why would she mention something about cats?
Oh, crap!)
---davekw7x
Reply With Quote
  #5  
Old 09-13-2010, 04:31 PM
emilpohl emilpohl is offline
Registered User
 
Join Date: Sep 2009
Posts: 6
Dave,

Now, after your explanation, it's clear to me. I've gotten the same GNU Scientific Library results using Colt-1.2.0, SSJ-2.4 java libraries.

This doubt has arisen when I was trying to find a good invcdf function, but for cdf=1.418591e-30, n=25 and p 0.95 the result is 8.0 instead 1., for example.

Thank you very much,

Carlos
Reply With Quote
Reply

Thread Tools
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -5. The time now is 09:32 AM.


Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.