-rw-r--r--  academic/ent/README            9
-rw-r--r--  academic/ent/ent.1           230
-rw-r--r--  academic/ent/ent.SlackBuild   61
-rw-r--r--  academic/ent/ent.info          8
-rw-r--r--  academic/ent/slack-desc       19
5 files changed, 327 insertions, 0 deletions
diff --git a/academic/ent/README b/academic/ent/README
new file mode 100644
index 0000000000..7a5602d720
--- /dev/null
+++ b/academic/ent/README
@@ -0,0 +1,9 @@
+ent - A pseudorandom number sequence test program
+
+ent applies various tests to sequences of bytes stored in files and
+reports the results of those tests. The program is useful for evaluating
+pseudorandom number generators for encryption and statistical sampling
+applications, compression algorithms, and other applications where the
+information density of a file is of interest. Tests performed include
+Entropy, Chi-square, Arithmetic Mean, Monte Carlo value for Pi, and
+Serial Correlation Coefficient.
diff --git a/academic/ent/ent.1 b/academic/ent/ent.1
new file mode 100644
index 0000000000..8480b8f616
--- /dev/null
+++ b/academic/ent/ent.1
@@ -0,0 +1,230 @@
+.TH ENT "1" "July 2007" "ent" "http://www.fourmilab.ch/random/"
+.SH NAME
+\fBent\fR \- pseudorandom number sequence test
+.PP
+This page describes a program, \fBent\fR, which applies various tests to
+sequences of bytes stored in files and reports the results of those tests.
+The program is useful for those evaluating pseudorandom number generators
+for encryption and statistical sampling applications, compression
+algorithms, and other applications where the information density of a file
+is of interest.
+.SH SYNOPSIS
+\fBent\fR [ \-bcftu ] [ \fIinfile\fR ]
+.SH DESCRIPTION
+\fBent\fR performs a variety of tests on the stream of bytes in \fIinfile\fR (or
+standard input if no \fIinfile\fR is specified) and produces output as follows
+on the standard output stream:
+.PP
+.nf
+Entropy = 7.980627 bits per character.
+
+Optimum compression would reduce the size
+of this 51768 character file by 0 percent.
+
+Chi square distribution for 51768 samples is 1542.26, and randomly
+would exceed this value 0.01 percent of the times.
+
+Arithmetic mean value of data bytes is 125.93 (127.5 = random).
+Monte Carlo value for Pi is 3.169834647 (error 0.90 percent).
+Serial correlation coefficient is 0.004249 (totally uncorrelated = 0.0).
+.fi
+.PP
+The values calculated are as follows:
+.PP
+Entropy
+.PP
+The information density of the contents of the file, expressed as
+a number of bits per character. The results above, which resulted
+from processing an image file compressed with JPEG, indicate that
+the file is extremely dense in information -- essentially random.
+Hence, compression of the file is unlikely to reduce its size. By
+contrast, the C source code of the program has entropy of about
+4.9 bits per character, indicating that optimal compression of the
+file would reduce its size by 38%. \fB[Hamming, pp. 104-108]\fR
+.PP
+Chi-square Test
+.PP
+The chi-square test is the most commonly used test for the
+randomness of data, and is extremely sensitive to errors in
+pseudorandom sequence generators. The chi-square distribution is
+calculated for the stream of bytes in the file and expressed as an
+absolute number and a percentage which indicates how frequently a
+truly random sequence would exceed the value calculated. We
+interpret the percentage as the degree to which the sequence
+tested is suspected of being non-random. If the percentage is
+greater than 99% or less than 1%, the sequence is almost certainly
+not random. If the percentage is between 99% and 95% or between 1%
+and 5%, the sequence is suspect. Percentages between 90% and 95%
+or between 5% and 10% indicate the sequence is "almost suspect". Note
+that our JPEG file, while very dense in information, is far from
+random as revealed by the chi-square test.
+.PP
+Applying this test to the output of various pseudorandom sequence
+generators is interesting. The low-order 8 bits returned by the
+standard Unix rand() function, for example, yield:
+.PP
+.nf
+Chi square distribution for 500000 samples is 0.01, and randomly
+would exceed this value 99.99 percent of the times.
+.fi
+.PP
+While an improved generator \fB[Park & Miller]\fR reports:
+.PP
+.nf
+Chi square distribution for 500000 samples is 212.53, and
+randomly would exceed this value 95.00 percent of the times.
+.fi
+.PP
+Thus, the standard Unix generator (or at least the low-order bytes
+it returns) is unacceptably non-random, while the improved
+generator is much better but still sufficiently non-random to
+cause concern for demanding applications. Contrast both of these
+software generators with the chi-square result of a genuine random
+sequence created by timing radioactive decay events:
+.PP
+.nf
+Chi square distribution for 32768 samples is 237.05, and
+randomly would exceed this value 75.00 percent of the times.
+.fi
+.PP
+See \fB[Knuth, pp. 35-40]\fR for more information on the chi-square
+test. An interactive chi-square calculator is available at the
+Fourmilab web site.
+.PP
+Arithmetic Mean
+.PP
+This is simply the result of summing all the bytes (bits if
+the \fB\-b\fR option is specified) in the file and dividing by the file
+length. If the data are close to random, this should be about
+127.5 (0.5 for \fB\-b\fR option output). If the mean departs from this
+value, the values are consistently high or low.
+.PP
+Monte Carlo Value for Pi
+.PP
+Each successive sequence of six bytes is used as 24-bit X and Y
+co-ordinates within a square. If the distance of the
+randomly-generated point is less than the radius of a circle
+inscribed within the square, the six-byte sequence is considered a
+"hit". The percentage of hits can be used to calculate the value
+of Pi. For very large streams (this approximation converges very
+slowly), the value will approach the correct value of Pi if the
+sequence is close to random. A 32768 byte file created by
+radioactive decay yielded:
+.PP
+.nf
+Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).
+.fi
+.PP
+Serial Correlation Coefficient
+.PP
+This quantity measures the extent to which each byte in the file
+depends upon the previous byte. For random sequences, this value
+(which can be positive or negative) will, of course, be close to
+zero. A non-random byte stream such as a C program will yield a
+serial correlation coefficient on the order of 0.5. Wildly
+predictable data such as uncompressed bitmaps will exhibit serial
+correlation coefficients approaching 1. See \fB[Knuth, pp. 64-65]\fR for
+more details.
+.SH OPTIONS
+.IP \fB\-b\fR
+The input is treated as a stream of bits rather than of 8-bit
+bytes. Statistics reported reflect the properties of the
+bitstream.
+.IP \fB\-c\fR
+Print a table of the number of occurrences of each possible byte
+(or bit, if the \fB\-b\fR option is also specified) value, and the
+fraction of the overall file made up by that value. Printable
+characters in the ISO 8859-1 Latin1 character set are shown along
+with their decimal byte values. In non-terse output mode, values
+with zero occurrences are not printed.
+.IP \fB\-f\fR
+Fold upper case letters to lower case before computing statistics.
+Folding is done based on the ISO 8859-1 Latin1 character set, with
+accented letters correctly processed.
+.IP \fB\-t\fR
+Terse mode: output is written in Comma Separated Value (CSV)
+format, suitable for loading into a spreadsheet and easily read by
+any programming language. See Terse Mode Output Format below for
+additional details.
+.IP \fB\-u\fR
+Print how-to-call information.
+.SH FILES
+If no \fIinfile\fR is specified, \fBent\fR obtains its input from standard input.
+Output is always written to standard output.
+.SH TERSE MODE OUTPUT FORMAT
+Terse mode is selected by specifying the \fB\-t\fR option on the command line.
+Terse mode output is written in Comma Separated Value (CSV) format, which
+can be directly loaded into most spreadsheet programs and is easily read
+by any programming language. Each record in the CSV file begins with a
+record type field, which identifies the content of the following fields.
+If the \fB\-c\fR option is not specified, the terse mode output will consist of
+two records, as follows:
+.PP
+.nf
+0,File-bytes,Entropy,Chi-square,Mean,Monte-Carlo-Pi,Serial-Correlation
+1,\fIfile_length\fR,\fIentropy\fR,\fIchi_square\fR,\fImean\fR,\fIPi_value\fR,\fIcorrelation\fR
+.fi
+.PP
+where the italicised values in the type 1 record are the numerical values
+for the quantities named in the type 0 column title record. If the \fB\-b\fR
+option is specified, the second field of the type 0 record will be
+"File-bits", and the file_length field in type 1 record will be given in
+bits instead of bytes. If the \fB\-c\fR option is specified, additional records
+are appended to the terse mode output which contain the character counts:
+.PP
+.nf
+2,Value,Occurrences,Fraction
+3,\fIv\fR,\fIcount\fR,\fIfraction\fR
+. . .
+.fi
+.PP
+If the \fB\-b\fR option is specified, only two type 3 records will appear for the
+two bit values v=0 and v=1. Otherwise, 256 type 3 records are included,
+one for each possible byte value. The second field of a type 3 record
+indicates how many bytes (or bits) of value v appear in the input, and
+fraction gives the decimal fraction of the file which has value v (which
+is equal to the count value of this record divided by the file_length
+field in the type 1 record).
+.SH BUGS
+Note that the "optimal compression" shown for the file is computed from
+the byte- or bit-stream entropy and thus reflects compressibility based on
+a reading frame of the chosen width (8-bit bytes or individual bits if the
+\fB\-b\fR option is specified). Algorithms which use a larger reading frame, such
+as the Lempel-Ziv \fB[Lempel & Ziv]\fR algorithm, may achieve greater
+compression if the file contains repeated sequences of multiple bytes.
+.SH SEE ALSO
+\fIIntroduction to Probability and Statistics\fR
+.br
+http://www.fourmilab.ch/rpkp/experiments/statistics.html
+.PP
+\fB[Hamming]\fR
+.br
+Hamming, Richard W. \fICoding and Information Theory.\fR Englewood
+Cliffs NJ: Prentice-Hall, 1980.
+.PP
+\fB[Knuth]\fR
+.br
+Knuth, Donald E. \fIThe Art of Computer Programming, Volume 2 /
+Seminumerical Algorithms\fR. Reading MA: Addison-Wesley, 1969. ISBN
+0-201-89684-2.
+.PP
+\fB[Lempel & Ziv]\fR
+.br
+Ziv J. and A. Lempel. "A Universal Algorithm for Sequential Data
+Compression". \fIIEEE Transactions on Information Theory\fR \fB23\fR, 3,
+pp. 337-343.
+.PP
+\fB[Park & Miller]\fR
+.br
+Park, Stephen K. and Keith W. Miller. "Random Number Generators:
+Good Ones Are Hard to Find". \fICommunications of the ACM\fR, October
+1988, p. 1192.
+.SH COPYING
+This software is in the public domain. Permission to use, copy, modify,
+and distribute this software and its documentation for any purpose and
+without fee is hereby granted, without any conditions or restrictions.
+This software is provided "as is" without express or implied warranty.
+.SH AUTHOR
+John Walker
+.br
+October 20th, 1998
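
Note on the terse mode described in ent.1: the type 1 record is easy to consume from a script. The following is a minimal sketch and not part of this package. `ent_compression_estimate` is a hypothetical helper that reads `ent -t` output on standard input, takes the entropy from the third field of the type 1 record, and derives the percentage by which optimal byte-wise compression would shrink the file. It assumes byte mode (no -b option) and truncates to a whole percent, which agrees with the 38% quoted in the man page for 4.9 bits per character.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with ent: reads terse-mode (-t) output
# on stdin and prints the estimated size reduction in whole percent.
# Assumes byte mode (8 bits per character), i.e. ent was run without -b.
ent_compression_estimate() {
    awk -F, '
        # Record type 1 carries the values; field 3 is the entropy.
        $1 == 1 { printf "%d\n", int(100 * (8 - $3) / 8); exit }
    '
}

# Possible usage (the file name is a placeholder):
#   ent -t somefile | ent_compression_estimate
```

With the -b option the same idea would apply with 1 bit per symbol in place of 8; that variant is left out here to keep the sketch short.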
diff --git a/academic/ent/ent.SlackBuild b/academic/ent/ent.SlackBuild
new file mode 100644
index 0000000000..8e2c12dbb9
--- /dev/null
+++ b/academic/ent/ent.SlackBuild
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+# Slackware build script for ent
+
+# Written by B. Watson (yalhcru@gmail.com)
+
+PRGNAM=ent
+ARCHIVE=random.zip
+VERSION=${VERSION:-20080128}
+ARCH=${ARCH:-i486}
+BUILD=${BUILD:-1}
+TAG=${TAG:-_SBo}
+
+CWD=$(pwd)
+TMP=${TMP:-/tmp/SBo}
+PKG=$TMP/package-$PRGNAM
+OUTPUT=${OUTPUT:-/tmp}
+
+if [ "$ARCH" = "i486" ]; then
+ SLKCFLAGS="-O2 -march=i486 -mtune=i686"
+elif [ "$ARCH" = "i686" ]; then
+ SLKCFLAGS="-O2 -march=i686 -mtune=i686"
+elif [ "$ARCH" = "x86_64" ]; then
+ SLKCFLAGS="-O2 -fPIC"
+fi
+
+set -e
+
+rm -rf $PKG
+mkdir -p $TMP $PKG $OUTPUT
+cd $TMP
+rm -rf $PRGNAM
+mkdir -p $PRGNAM
+cd $PRGNAM
+unzip $CWD/$ARCHIVE
+chown -R root:root .
+find . \
+ \( -perm 777 -o -perm 775 -o -perm 711 -o -perm 555 -o -perm 511 \) \
+ -exec chmod 755 {} \; -o \
+ \( -perm 666 -o -perm 664 -o -perm 600 -o -perm 444 -o -perm 440 -o -perm 400 \) \
+ -exec chmod 644 {} \;
+
+make CC=gcc CFLAGS="$SLKCFLAGS"
+strip $PRGNAM
+
+mkdir -p $PKG/usr/bin $PKG/usr/man/man1 $PKG/usr/doc/$PRGNAM-$VERSION
+cp $PRGNAM $PKG/usr/bin
+
+# The man page comes with the SlackBuild, not the source
+# (because I converted the HTML page to troff, hope it's OK)
+# ## Sure, that's fine :) --rworkman
+gzip -9c $CWD/$PRGNAM.1 > $PKG/usr/man/man1/$PRGNAM.1.gz
+
+cp -a $PRGNAM.html entitle.gif $PKG/usr/doc/$PRGNAM-$VERSION
+cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild
+
+mkdir -p $PKG/install
+cat $CWD/slack-desc > $PKG/install/slack-desc
+
+cd $PKG
+/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.tgz
diff --git a/academic/ent/ent.info b/academic/ent/ent.info
new file mode 100644
index 0000000000..9508eea838
--- /dev/null
+++ b/academic/ent/ent.info
@@ -0,0 +1,8 @@
+PRGNAM="ent"
+VERSION="20080128"
+HOMEPAGE="http://www.fourmilab.ch/random/"
+DOWNLOAD="http://www.fourmilab.ch/random/random.zip"
+MD5SUM="8104a83af1ea5b280da96c92da18eae4"
+MAINTAINER="B. Watson"
+EMAIL="yalhcru@gmail.com"
+APPROVED="rworkman"
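
Note on ent.info: the MD5SUM field lets the downloaded archive be checked before the SlackBuild is run. A minimal sketch of such a check follows. `verify_md5` is a hypothetical helper name, and the sketch assumes the `md5sum` utility from GNU coreutils is available.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 digest with an expected value,
# such as the MD5SUM field of a .info file. Assumes md5sum (GNU coreutils).
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Possible usage, run from the directory holding ent.info and the download:
#   . ./ent.info
#   verify_md5 "$(basename "$DOWNLOAD")" "$MD5SUM" || echo "checksum mismatch" >&2
```

Sourcing the .info file works because it consists only of shell-style variable assignments.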
diff --git a/academic/ent/slack-desc b/academic/ent/slack-desc
new file mode 100644
index 0000000000..59d68635ed
--- /dev/null
+++ b/academic/ent/slack-desc
@@ -0,0 +1,19 @@
+# HOW TO EDIT THIS FILE:
+# The "handy ruler" below makes it easier to edit a package description. Line
+# up the first '|' above the ':' following the base package name, and the '|'
+# on the right side marks the last column you can put a character in. You must
+# make exactly 11 lines for the formatting to be correct. It's also
+# customary to leave one space after the ':'.
+
+ |-----handy-ruler------------------------------------------------------|
+ent: ent - A pseudorandom number sequence test program
+ent:
+ent: ent applies various tests to sequences of bytes stored in files
+ent: and reports the results of those tests. The program is useful
+ent: for evaluating pseudorandom number generators for encryption and
+ent: statistical sampling applications, compression algorithms, and other
+ent: applications where the information density of a file is of interest.
+ent: Tests performed include Entropy, Chi-square, Arithmetic Mean, Monte
+ent: Carlo value for Pi, and Serial Correlation Coefficient.
+ent:
+ent:
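
Note on slack-desc: the header comment requires exactly 11 description lines, each beginning with the package name and a colon. A minimal sketch of an automated check is shown below. `check_slack_desc` is a hypothetical helper and is not part of any Slackware tooling.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the slack-desc file contains exactly
# 11 description lines, each starting with "<package name>:".
check_slack_desc() {
    prgnam=$1
    file=$2
    # grep -c counts matching lines; "|| true" keeps a zero count from
    # aborting scripts that run under "set -e".
    count=$(grep -c "^$prgnam:" "$file" || true)
    [ "$count" -eq 11 ]
}

# Possible usage from the SlackBuild directory:
#   check_slack_desc ent slack-desc || echo "slack-desc needs 11 lines" >&2
```

Comment lines and the handy ruler do not start with the package name, so they are not counted.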