callbacks.
This issue was reported to OpenSSL by Robert Swiecki (Google), and
- independently by Hanno Böck.
+ independently by Hanno Böck.
(CVE-2015-1789)
- [Emilia Käsper]
+ [Emilia Käsper]
*) PKCS7 crash with missing EnvelopedContent
This issue was reported to OpenSSL by Michal Zalewski (Google).
(CVE-2015-1790)
- [Emilia Käsper]
+ [Emilia Käsper]
*) CMS verify infinite loop with unknown hash function
This issue was reported to OpenSSL by Michal Zalewski (Google).
(CVE-2015-0289)
- [Emilia Käsper]
+ [Emilia Käsper]
*) DoS via reachable assert in SSLv2 servers fix
servers that both support SSLv2 and enable export cipher suites by sending
a specially crafted SSLv2 CLIENT-MASTER-KEY message.
- This issue was discovered by Sean Burford (Google) and Emilia Käsper
+ This issue was discovered by Sean Burford (Google) and Emilia Käsper
(OpenSSL development team).
(CVE-2015-0293)
- [Emilia Käsper]
+ [Emilia Käsper]
*) Empty CKE with client auth and DHE fix
version does not match the session's version. Resuming with a different
version, while not strictly forbidden by the RFC, is of questionable
sanity and breaks all known clients.
- [David Benjamin, Emilia Käsper]
+ [David Benjamin, Emilia Käsper]
*) Tighten handling of the ChangeCipherSpec (CCS) message: reject
early CCS messages during renegotiation. (Note that because
renegotiation is encrypted, this early CCS was not exploitable.)
- [Emilia Käsper]
+ [Emilia Käsper]
*) Tighten client-side session ticket handling during renegotiation:
ensure that the client only accepts a session ticket if the server sends
Similarly, ensure that the client requires a session ticket if one
was advertised in the ServerHello. Previously, a TLS client would
ignore a missing NewSessionTicket message.
- [Emilia Käsper]
+ [Emilia Käsper]
Changes between 1.0.1i and 1.0.1j [15 Oct 2014]
with a null pointer dereference (read) by specifying an anonymous (EC)DH
ciphersuite and sending carefully crafted handshake messages.
- Thanks to Felix Gröbert (Google) for discovering and researching this
+ Thanks to Felix Gröbert (Google) for discovering and researching this
issue.
(CVE-2014-3510)
- [Emilia Käsper]
+ [Emilia Käsper]
*) By sending carefully crafted DTLS packets an attacker could cause openssl
to leak memory. This can be exploited through a Denial of Service attack.
properly negotiated with the client. This can be exploited through a
Denial of Service attack.
- Thanks to Joonas Kuorilehto and Riku Hietamäki (Codenomicon) for
+ Thanks to Joonas Kuorilehto and Riku Hietamäki (Codenomicon) for
discovering and researching this issue.
(CVE-2014-5139)
[Steve Henson]
Thanks to Ivan Fratric (Google) for discovering this issue.
(CVE-2014-3508)
- [Emilia Käsper, and Steve Henson]
+ [Emilia Käsper, and Steve Henson]
*) Fix ec_GFp_simple_points_make_affine (thus, EC_POINTs_mul etc.)
for corner cases. (Certain input points at infinity could lead to
client or server. This is potentially exploitable to run arbitrary
code on a vulnerable client or server.
- Thanks to Jüri Aedla for reporting this issue. (CVE-2014-0195)
- [Jüri Aedla, Steve Henson]
+ Thanks to Jüri Aedla for reporting this issue. (CVE-2014-0195)
+ [Jüri Aedla, Steve Henson]
*) Fix bug in TLS code where clients that enable anonymous ECDH ciphersuites
are subject to a denial of service attack.
- Thanks to Felix Gröbert and Ivan Fratric at Google for discovering
+ Thanks to Felix Gröbert and Ivan Fratric at Google for discovering
this issue. (CVE-2014-3470)
- [Felix Gröbert, Ivan Fratric, Steve Henson]
+ [Felix Gröbert, Ivan Fratric, Steve Henson]
*) Harmonize version and its documentation. -f flag is used to display
compilation flags.
Thanks go to Nadhem Alfardan and Kenny Paterson of the Information
Security Group at Royal Holloway, University of London
(www.isg.rhul.ac.uk) for discovering this flaw and Adam Langley and
- Emilia Käsper for the initial patch.
+ Emilia Käsper for the initial patch.
(CVE-2013-0169)
- [Emilia Käsper, Adam Langley, Ben Laurie, Andy Polyakov, Steve Henson]
+ [Emilia Käsper, Adam Langley, Ben Laurie, Andy Polyakov, Steve Henson]
*) Fix flaw in AESNI handling of TLS 1.2 and 1.1 records for CBC mode
ciphersuites which can be exploited in a denial of service attack.
EC_GROUP_new_by_curve_name() will automatically use these (while
EC_GROUP_new_curve_GFp() currently prefers the more flexible
implementations).
- [Emilia Käsper, Adam Langley, Bodo Moeller (Google)]
+ [Emilia Käsper, Adam Langley, Bodo Moeller (Google)]
*) Use type ossl_ssize_t instead of ssize_t which isn't available on
all platforms. Move ssize_t definition from e_os.h to the public
[Adam Langley (Google)]
*) Fix spurious failures in ecdsatest.c.
- [Emilia Käsper (Google)]
+ [Emilia Käsper (Google)]
*) Fix the BIO_f_buffer() implementation (which was mixing different
interpretations of the '..._len' fields).
lock to call BN_BLINDING_invert_ex, and avoids one use of
BN_BLINDING_update for each BN_BLINDING structure (previously,
the last update always remained unused).
- [Emilia Käsper (Google)]
+ [Emilia Käsper (Google)]
*) In ssl3_clear, preserve s3->init_extra along with s3->rbuf.
[Bob Buckholz (Google)]
*) Add RFC 3161 compliant time stamp request creation, response generation
and response verification functionality.
- [Zoltán Glózik <zglozik@opentsa.org>, The OpenTSA Project]
+ [Zoltán Glózik <zglozik@opentsa.org>, The OpenTSA Project]
*) Add initial support for TLS extensions, specifically for the server_name
extension so far. The SSL_SESSION, SSL_CTX, and SSL data structures now
*) BN_CTX_get() should return zero-valued bignums, providing the same
initialised value as BN_new().
- [Geoff Thorpe, suggested by Ulf Möller]
+ [Geoff Thorpe, suggested by Ulf Möller]
*) Support for inhibitAnyPolicy certificate extension.
[Steve Henson]
some point, these tighter rules will become openssl's default to improve
maintainability, though the assert()s and other overheads will remain only
in debugging configurations. See bn.h for more details.
- [Geoff Thorpe, Nils Larsch, Ulf Möller]
+ [Geoff Thorpe, Nils Larsch, Ulf Möller]
*) BN_CTX_init() has been deprecated, as BN_CTX is an opaque structure
that can only be obtained through BN_CTX_new() (which implicitly
[Douglas Stebila (Sun Microsystems Laboratories)]
*) Add the possibility to load symbols globally with DSO.
- [Götz Babin-Ebell <babin-ebell@trustcenter.de> via Richard Levitte]
+ [Götz Babin-Ebell <babin-ebell@trustcenter.de> via Richard Levitte]
*) Add the functions ERR_set_mark() and ERR_pop_to_mark() for better
control of the error stack.
[Steve Henson]
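For illustration, a minimal sketch of the mark/pop pattern these functions enable (the helper do_optional_thing is a made-up placeholder, not an OpenSSL call):

```c
#include <openssl/err.h>

/* Made-up placeholder for an operation that may fail harmlessly. */
static int do_optional_thing(void)
{
    return 0;   /* pretend it failed and pushed errors */
}

/* Attempt an optional step without polluting the caller's error stack:
 * ERR_set_mark() records the current depth, ERR_pop_to_mark() removes
 * every error pushed after the mark. */
static int try_optional_step(void)
{
    int ok;

    ERR_set_mark();
    ok = do_optional_thing();
    if (!ok)
        ERR_pop_to_mark();   /* discard errors from the failed attempt */
    return ok;
}
```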
*) Undo Cygwin change.
- [Ulf Möller]
+ [Ulf Möller]
*) Added support for proxy certificates according to RFC 3820.
Because they may be a security threat to unaware applications,
[Stephen Henson, reported by UK NISCC]
*) Use Windows randomness collection on Cygwin.
- [Ulf Möller]
+ [Ulf Möller]
*) Fix hang in EGD/PRNGD query when communication socket is closed
prematurely by EGD/PRNGD.
- [Darren Tucker <dtucker@zip.com.au> via Lutz Jänicke, resolves #1014]
+ [Darren Tucker <dtucker@zip.com.au> via Lutz Jänicke, resolves #1014]
*) Prompt for pass phrases when appropriate for PKCS12 input format.
[Steve Henson]
pointers passed to them whenever necessary. Otherwise it is possible
the caller may have overwritten (or deallocated) the original string
data when a later ENGINE operation tries to use the stored values.
- [Götz Babin-Ebell <babinebell@trustcenter.de>]
+ [Götz Babin-Ebell <babinebell@trustcenter.de>]
*) Improve diagnostics in file reading and command-line digests.
[Ben Laurie aided and abetted by Solar Designer <solar@openwall.com>]
[Bodo Moeller]
*) BN_sqr() bug fix.
- [Ulf Möller, reported by Jim Ellis <jim.ellis@cavium.com>]
+ [Ulf Möller, reported by Jim Ellis <jim.ellis@cavium.com>]
*) Rabin-Miller test analyses assume uniformly distributed witnesses,
so use BN_pseudo_rand_range() instead of using BN_pseudo_rand()
[Bodo Moeller]
*) Fix OAEP check.
- [Ulf Möller, Bodo Möller]
+ [Ulf Möller, Bodo Möller]
*) The countermeasure against Bleichenbacher's attack on PKCS #1 v1.5
RSA encryption was accidentally removed in s3_srvr.c in OpenSSL 0.9.5
[Bodo Moeller]
*) Use better test patterns in bntest.
- [Ulf Möller]
+ [Ulf Möller]
*) rand_win.c fix for Borland C.
- [Ulf Möller]
+ [Ulf Möller]
*) BN_rshift bugfix for n == 0.
[Bodo Moeller]
*) New BIO_shutdown_wr macro, which invokes the BIO_C_SHUTDOWN_WR
BIO_ctrl (for BIO pairs).
- [Bodo Möller]
+ [Bodo Möller]
*) Add DSO method for VMS.
[Richard Levitte]
*) Bug fix: Montgomery multiplication could produce results with the
wrong sign.
- [Ulf Möller]
+ [Ulf Möller]
*) Add RPM specification openssl.spec and modify it to build three
packages. The default package contains applications, application
*) Don't set the two most significant bits to one when generating a
random number < q in the DSA library.
- [Ulf Möller]
+ [Ulf Möller]
*) New SSL API mode 'SSL_MODE_AUTO_RETRY'. This disables the default
behaviour that SSL_read may result in SSL_ERROR_WANT_READ (even if
*) Randomness polling function for Win9x, as described in:
Peter Gutmann, Software Generation of Practically Strong
Random Numbers.
- [Ulf Möller]
+ [Ulf Möller]
*) Fix so PRNG is seeded in req if using an already existing
DSA key.
[Steve Henson]
*) Eliminate non-ANSI declarations in crypto.h and stack.h.
- [Ulf Möller]
+ [Ulf Möller]
*) Fix for SSL server purpose checking. Server checking was
rejecting certificates which had extended key usage present
[Bodo Moeller]
*) Bugfix for linux-elf makefile.one.
- [Ulf Möller]
+ [Ulf Möller]
*) RSA_get_default_method() will now cause a default
RSA_METHOD to be chosen if one doesn't exist already.
[Steve Henson]
*) des_quad_cksum() byte order bug fix.
- [Ulf Möller, using the problem description in krb4-0.9.7, where
+ [Ulf Möller, using the problem description in krb4-0.9.7, where
the solution is attributed to Derrick J Brashear <shadow@DEMENTIA.ORG>]
*) Fix so V_ASN1_APP_CHOOSE works again: however its use is strongly
[Rolf Haberrecker <rolf@suse.de>]
*) Assembler module support for Mingw32.
- [Ulf Möller]
+ [Ulf Möller]
*) Shared library support for HPUX (in shlib/).
[Lutz Jaenicke <Lutz.Jaenicke@aet.TU-Cottbus.DE> and Anonymous]
*) BN_mul bugfix: In bn_mul_part_recursion() only the a>a[n] && b>b[n]
case was implemented. This caused BN_div_recp() to fail occasionally.
- [Ulf Möller]
+ [Ulf Möller]
*) Add an optional second argument to the set_label() in the perl
assembly language builder. If this argument exists and is set
[Steve Henson]
*) Fix potential buffer overrun problem in BIO_printf().
- [Ulf Möller, using public domain code by Patrick Powell; problem
+ [Ulf Möller, using public domain code by Patrick Powell; problem
pointed out by David Sacerdote <das33@cornell.edu>]
*) Support EGD <http://www.lothar.com/tech/crypto/>. New functions
RAND_egd() and RAND_status(). In the command line application,
the EGD socket can be specified like a seed file using RANDFILE
or -rand.
- [Ulf Möller]
+ [Ulf Möller]
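As a rough sketch of how the two new calls fit together in a build with EGD support (the socket path is only an example):

```c
#include <stdio.h>
#include <openssl/rand.h>

/* Query an EGD/PRNGD socket if the PRNG is not yet seeded.
 * RAND_egd() returns the number of bytes obtained, or -1 on failure;
 * RAND_status() returns 1 once the PRNG considers itself seeded. */
int seed_from_egd(void)
{
    if (RAND_status() != 1 && RAND_egd("/var/run/egd-pool") == -1) {
        fprintf(stderr, "PRNG not seeded and EGD query failed\n");
        return 0;
    }
    return RAND_status();
}
```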
*) Allow the string CERTIFICATE to be tolerated in PKCS#7 structures.
Some CAs (e.g. Verisign) distribute certificates in this form.
#define OPENSSL_ALGORITHM_DEFINES
#include <openssl/opensslconf.h>
defines all pertinent NO_<algo> symbols, such as NO_IDEA, NO_RSA, etc.
- [Richard Levitte, Ulf and Bodo Möller]
+ [Richard Levitte, Ulf and Bodo Möller]
*) Bugfix: Tolerate fragmentation and interleaving in the SSL 3/TLS
record layer.
*) Bug fix for BN_div_recp() for numerators with an even number of
bits.
- [Ulf Möller]
+ [Ulf Möller]
*) More tests in bntest.c, and changed test_bn output.
- [Ulf Möller]
+ [Ulf Möller]
*) ./config recognizes MacOS X now.
[Andy Polyakov]
*) Bug fix for BN_div() when the first words of num and divisor are
equal (it gave wrong results if (rem=(n1-q*d0)&BN_MASK2) < d0).
- [Ulf Möller]
+ [Ulf Möller]
*) Add support for various broken PKCS#8 formats, and command line
options to produce them.
*) New functions BN_CTX_start(), BN_CTX_get() and BN_CTX_end() to
get temporary BIGNUMs from a BN_CTX.
- [Ulf Möller]
+ [Ulf Möller]
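The intended calling pattern, sketched for a routine that needs one scratch BIGNUM (the wrapper name sum_mod is mine, not an OpenSSL function):

```c
#include <openssl/bn.h>

/* r = (a + b) mod m, using a temporary obtained from the shared BN_CTX. */
int sum_mod(BIGNUM *r, const BIGNUM *a, const BIGNUM *b,
            const BIGNUM *m, BN_CTX *ctx)
{
    int ok = 0;
    BIGNUM *t;

    BN_CTX_start(ctx);
    t = BN_CTX_get(ctx);
    if (t == NULL)
        goto err;
    if (!BN_add(t, a, b) || !BN_nnmod(r, t, m, ctx))
        goto err;
    ok = 1;
 err:
    BN_CTX_end(ctx);    /* releases the temporaries obtained above */
    return ok;
}
```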
*) Correct return values in BN_mod_exp_mont() and BN_mod_exp2_mont()
for p == 0.
- [Ulf Möller]
+ [Ulf Möller]
*) Change the SSLeay_add_all_*() functions to OpenSSL_add_all_*() and
include a #define from the old name to the new. The original intent
*) Source code cleanups: use const where appropriate, eliminate casts,
use void * instead of char * in lhash.
- [Ulf Möller]
+ [Ulf Möller]
*) Bugfix: ssl3_send_server_key_exchange was not restartable
(the state was not changed to SSL3_ST_SW_KEY_EXCH_B, and because of
[Steve Henson]
*) New function BN_pseudo_rand().
- [Ulf Möller]
+ [Ulf Möller]
*) Clean up BN_mod_mul_montgomery(): replace the broken (and unreadable)
bignum version of BN_from_montgomery() with the working code from
SSLeay 0.9.0 (the word based version is faster anyway), and clean up
the comments.
- [Ulf Möller]
+ [Ulf Möller]
*) Avoid a race condition in s2_clnt.c (function get_server_hello) that
made it impossible to use the same SSL_SESSION data structure in
*) The return value of RAND_load_file() no longer counts bytes obtained
by stat(). RAND_load_file(..., -1) is new and uses the complete file
to seed the PRNG (previously an explicit byte count was required).
- [Ulf Möller, Bodo Möller]
+ [Ulf Möller, Bodo Möller]
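For illustration (the file name handling and wrapper are mine), the behaviour described above amounts to:

```c
#include <openssl/rand.h>

/* Seed from a file: either a capped number of bytes, or (with -1,
 * the new convention) the complete file.  The return value is the
 * number of bytes actually used to seed the PRNG. */
int seed_from_file(const char *path)
{
    int n = RAND_load_file(path, 1024);   /* at most 1024 bytes */

    if (n <= 0)
        n = RAND_load_file(path, -1);     /* fall back to the whole file */
    return n;
}
```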
*) Clean up CRYPTO_EX_DATA functions, some of these didn't have prototypes,
used (char *) instead of (void *) and had casts all over the place.
[Steve Henson]
*) Make BN_generate_prime() return NULL on error if ret!=NULL.
- [Ulf Möller]
+ [Ulf Möller]
*) Retain source code compatibility for BN_prime_checks macro:
BN_is_prime(..., BN_prime_checks, ...) now uses
BN_prime_checks_for_size to determine the appropriate number of
Rabin-Miller iterations.
- [Ulf Möller]
+ [Ulf Möller]
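In today's API the same idea is expressed through BN_is_prime_ex(); a minimal sketch (the wrapper name is mine):

```c
#include <openssl/bn.h>

/* Returns 1 if p is probably prime, 0 if composite, -1 on error.
 * Passing BN_prime_checks lets the library derive the number of
 * Rabin-Miller iterations from the bit length of p. */
int probably_prime(const BIGNUM *p)
{
    BN_CTX *ctx = BN_CTX_new();
    int r;

    if (ctx == NULL)
        return -1;
    r = BN_is_prime_ex(p, BN_prime_checks, ctx, NULL);
    BN_CTX_free(ctx);
    return r;
}
```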
*) Diffie-Hellman uses "safe" primes: DH_check() return code renamed to
DH_CHECK_P_NOT_SAFE_PRIME.
(Check if this is true? OpenPGP calls them "strong".)
- [Ulf Möller]
+ [Ulf Möller]
*) Merge the functionality of "dh" and "gendh" programs into a new program
"dhparam". The old programs are retained for now but will handle DH keys
*) Add missing #ifndefs that caused missing symbols when building libssl
as a shared library without RSA. Use #ifndef NO_SSL2 instead of
NO_RSA in ssl/s2*.c.
- [Kris Kennaway <kris@hub.freebsd.org>, modified by Ulf Möller]
+ [Kris Kennaway <kris@hub.freebsd.org>, modified by Ulf Möller]
*) Precautions against using the PRNG uninitialized: RAND_bytes() now
has a return value which indicates the quality of the random data
guaranteed to be unique but not unpredictable. RAND_add is like
RAND_seed, but takes an extra argument for an entropy estimate
(RAND_seed always assumes full entropy).
- [Ulf Möller]
+ [Ulf Möller]
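A sketch of the calling convention this implies; the entropy estimate of 4.0 bytes is an arbitrary example, and get_key_bytes is not an OpenSSL function:

```c
#include <openssl/rand.h>

/* Mix in caller-supplied input with an explicit entropy estimate,
 * then draw key material and honour the RAND_bytes() return value. */
int get_key_bytes(unsigned char *key, int len,
                  const void *extra, int extra_len)
{
    /* RAND_add() is like RAND_seed(), but states how much of the
     * input is actually believed to be unpredictable (4.0 bytes here). */
    RAND_add(extra, extra_len, 4.0);

    if (RAND_bytes(key, len) != 1)
        return 0;   /* PRNG not properly seeded: do not use the output */
    return 1;
}
```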
*) Do more iterations of Rabin-Miller probable prime test (specifically,
3 for 1024-bit primes, 6 for 512-bit primes, 12 for 256-bit primes
[Steve Henson]
*) Honor the no-xxx Configure options when creating .DEF files.
- [Ulf Möller]
+ [Ulf Möller]
*) Add PKCS#10 attributes to field table: challengePassword,
unstructuredName and unstructuredAddress. These are taken from
*) More DES library cleanups: remove references to srand/rand and
delete an unused file.
- [Ulf Möller]
+ [Ulf Möller]
*) Add support for the free Netwide assembler (NASM) under Win32,
since not many people have MASM (ml) and it can be hard to obtain.
worked.
*) Fix problems with no-hmac etc.
- [Ulf Möller, pointed out by Brian Wellington <bwelling@tislabs.com>]
+ [Ulf Möller, pointed out by Brian Wellington <bwelling@tislabs.com>]
*) New functions RSA_get_default_method(), RSA_set_method() and
RSA_get_method(). These allow replacement of RSA_METHODs without having
[Ben Laurie]
*) DES library cleanups.
- [Ulf Möller]
+ [Ulf Möller]
*) Add support for PKCS#5 v2.0 PBE algorithms. This will permit PKCS#8 to be
used with any cipher unlike PKCS#5 v1.5 which can at most handle 64 bit
[Christian Forster <fo@hawo.stw.uni-erlangen.de>]
*) config now generates no-xxx options for missing ciphers.
- [Ulf Möller]
+ [Ulf Möller]
*) Support the EBCDIC character set (work in progress).
File ebcdic.c not yet included because it has a different license.
[Bodo Moeller]
*) Move openssl.cnf out of lib/.
- [Ulf Möller]
+ [Ulf Möller]
*) Fix various things to let OpenSSL even pass ``egcc -pipe -O2 -Wall
-Wshadow -Wpointer-arith -Wcast-align -Wmissing-prototypes
[Ben Laurie]
*) Support Borland C++ builder.
- [Janez Jere <jj@void.si>, modified by Ulf Möller]
+ [Janez Jere <jj@void.si>, modified by Ulf Möller]
*) Support Mingw32.
- [Ulf Möller]
+ [Ulf Möller]
*) SHA-1 cleanups and performance enhancements.
[Andy Polyakov <appro@fy.chalmers.se>]
[Andy Polyakov <appro@fy.chalmers.se>]
*) Accept any -xxx and +xxx compiler options in Configure.
- [Ulf Möller]
+ [Ulf Möller]
*) Update HPUX configuration.
[Anonymous]
[Bodo Moeller]
*) OAEP decoding bug fix.
- [Ulf Möller]
+ [Ulf Möller]
*) Support INSTALL_PREFIX for package builders, as proposed by
David Harris.
[Niels Poppe <niels@netbox.org>]
*) New Configure option no-<cipher> (rsa, idea, rc5, ...).
- [Ulf Möller]
+ [Ulf Möller]
*) Add the PKCS#12 API documentation to openssl.txt. Preliminary support for
extension adding in x509 utility.
[Steve Henson]
*) Remove NOPROTO sections and error code comments.
- [Ulf Möller]
+ [Ulf Möller]
*) Partial rewrite of the DEF file generator to now parse the ANSI
prototypes.
[Steve Henson]
*) New Configure options --prefix=DIR and --openssldir=DIR.
- [Ulf Möller]
+ [Ulf Möller]
*) Complete rewrite of the error code script(s). It is all now handled
by one script at the top level which handles error code gathering,
[Steve Henson]
*) Move the autogenerated header file parts to crypto/opensslconf.h.
- [Ulf Möller]
+ [Ulf Möller]
*) Fix new 56-bit DES export ciphersuites: they were using 7 bytes instead of
8 of keying material. Merlin has also confirmed interop with this fix
[Andy Polyakov <appro@fy.chalmers.se>]
*) Change functions to ANSI C.
- [Ulf Möller]
+ [Ulf Möller]
*) Fix typos in error codes.
- [Martin Kraemer <Martin.Kraemer@MchP.Siemens.De>, Ulf Möller]
+ [Martin Kraemer <Martin.Kraemer@MchP.Siemens.De>, Ulf Möller]
*) Remove defunct assembler files from Configure.
- [Ulf Möller]
+ [Ulf Möller]
*) SPARC v8 assembler BIGNUM implementation.
[Andy Polyakov <appro@fy.chalmers.se>]
[Steve Henson]
*) New Configure option "rsaref".
- [Ulf Möller]
+ [Ulf Möller]
*) Don't auto-generate pem.h.
[Bodo Moeller]
*) New functions DSA_do_sign and DSA_do_verify to provide access to
the raw DSA values prior to ASN.1 encoding.
- [Ulf Möller]
+ [Ulf Möller]
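A sketch of the raw (r, s) round trip these functions expose; the digest is assumed to be precomputed and sign_and_check is just an illustrative wrapper:

```c
#include <openssl/dsa.h>

/* Sign a precomputed digest and verify it again, handling the raw
 * DSA_SIG (r, s) pair instead of a DER-encoded signature. */
int sign_and_check(DSA *dsa, const unsigned char *dgst, int dgst_len)
{
    int ok = -1;
    DSA_SIG *sig = DSA_do_sign(dgst, dgst_len, dsa);

    if (sig != NULL) {
        ok = DSA_do_verify(dgst, dgst_len, sig, dsa);   /* 1 / 0 / -1 */
        DSA_SIG_free(sig);
    }
    return ok;
}
```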
*) Tweaks to Configure
[Niels Poppe <niels@netbox.org>]
[Steve Henson]
*) New variables $(RANLIB) and $(PERL) in the Makefiles.
- [Ulf Möller]
+ [Ulf Möller]
*) New config option to avoid instructions that are illegal on the 80386.
The default code is faster, but requires at least a 486.
- [Ulf Möller]
+ [Ulf Möller]
*) Got rid of old SSL2_CLIENT_VERSION (inconsistently used) and
SSL2_SERVER_VERSION (not used at all) macros, which are now the
Hagino <itojun@kame.net>]
*) File was opened incorrectly in randfile.c.
- [Ulf Möller <ulf@fitug.de>]
+ [Ulf Möller <ulf@fitug.de>]
*) Beginning of support for GeneralizedTime. d2i, i2d, check and print
functions. Also ASN1_TIME suite which is a CHOICE of UTCTime or
[Steve Henson]
*) Correct Linux 1 recognition in config.
- [Ulf Möller <ulf@fitug.de>]
+ [Ulf Möller <ulf@fitug.de>]
*) Remove pointless MD5 hash when using DSA keys in ca.
[Anonymous <nobody@replay.com>]
*) Fix the RSA header declarations that hid a bug I fixed in 0.9.0b but
was already fixed by Eric for 0.9.1 it seems.
- [Ben Laurie - pointed out by Ulf Möller <ulf@fitug.de>]
+ [Ben Laurie - pointed out by Ulf Möller <ulf@fitug.de>]
*) Autodetect FreeBSD3.
[Ben Laurie]
# the undertaken effort was that it appeared that in tight IA-32
# register window little-endian flavor could achieve slightly higher
# Instruction Level Parallelism, and it indeed resulted in up to 15%
-# better performance on most recent µ-archs...
+# better performance on most recent µ-archs...
#
# Third version adds AES_cbc_encrypt implementation, which resulted in
# up to 40% performance improvement of CBC benchmark results. 40% was
$speed_limit=512; # chunks smaller than $speed_limit are
# processed with compact routine in CBC mode
$small_footprint=1; # $small_footprint=1 code is ~5% slower [on
- # recent µ-archs], but ~5 times smaller!
+ # recent µ-archs], but ~5 times smaller!
# I favor compact code to minimize cache
# contention and in hope to "collect" 5% back
# in real-life applications...
# Performance is not actually extraordinary in comparison to pure
# x86 code. In particular encrypt performance is virtually the same.
# Decrypt performance on the other hand is 15-20% better on newer
-# µ-archs [but we're thankful for *any* improvement here], and ~50%
+# µ-archs [but we're thankful for *any* improvement here], and ~50%
# better on PIII:-) And additionally on the pros side this code
# eliminates redundant references to stack and thus relieves/
# minimizes the pressure on the memory bus.
# referred below, which improves ECDH and ECDSA verify benchmarks
# by 18-40%.
#
-# Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
+# Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
# Polynomial Multiplication on ARM Processors using the NEON Engine.
#
# http://conradoplg.cryptoland.net/files/2010/12/mocrysen13.pdf
################
# void bn_GF2m_mul_2x2(BN_ULONG *r,
# BN_ULONG a1,BN_ULONG a0,
-# BN_ULONG b1,BN_ULONG b0); # r[3..0]=a1a0·b1b0
+# BN_ULONG b1,BN_ULONG b0); # r[3..0]=a1a0·b1b0
{
$code.=<<___;
.global bn_GF2m_mul_2x2
mov $mask,#7<<2
sub sp,sp,#32 @ allocate tab[8]
- bl mul_1x1_ialu @ a1·b1
+ bl mul_1x1_ialu @ a1·b1
str $lo,[$ret,#8]
str $hi,[$ret,#12]
eor r2,r2,$a
eor $b,$b,r3
eor $a,$a,r2
- bl mul_1x1_ialu @ a0·b0
+ bl mul_1x1_ialu @ a0·b0
str $lo,[$ret]
str $hi,[$ret,#4]
eor $a,$a,r2
eor $b,$b,r3
- bl mul_1x1_ialu @ (a1+a0)·(b1+b0)
+ bl mul_1x1_ialu @ (a1+a0)·(b1+b0)
___
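For reference, the Karatsuba structure behind the three mul_1x1 calls above — a 2x2-word carry-less (GF(2)[x]) product built from three 1x1 products instead of four — sketched as plain C. The names are mine and the bit-loop helper is deliberately naive, not the optimized assembly:

```c
#include <stdint.h>

/* Naive 64x64 -> 128-bit carry-less multiply. */
static void mul_1x1(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    uint64_t h = 0, l = 0;
    int i;

    for (i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;
            h ^= i ? a >> (64 - i) : 0;
        }
    }
    *hi = h;
    *lo = l;
}

/* r[3..0] = (a1:a0) * (b1:b0) over GF(2)[x]:
 * three 1x1 products, then recombination ('+' and '-' are both XOR). */
void gf2m_mul_2x2(uint64_t r[4], uint64_t a1, uint64_t a0,
                  uint64_t b1, uint64_t b0)
{
    uint64_t hh, hl, lh, ll, mh, ml;

    mul_1x1(&hh, &hl, a1, b1);              /* a1*b1            */
    mul_1x1(&lh, &ll, a0, b0);              /* a0*b0            */
    mul_1x1(&mh, &ml, a0 ^ a1, b0 ^ b1);    /* (a0+a1)*(b0+b1)  */

    mh ^= hh ^ lh;                          /* subtract the two */
    ml ^= hl ^ ll;                          /* end products     */

    r[0] = ll;
    r[1] = lh ^ ml;
    r[2] = hl ^ mh;
    r[3] = hh;
}
```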
@r=map("r$_",(6..9));
$code.=<<___;
// I've estimated this routine to run in ~120 ticks, but in reality
// (i.e. according to ar.itc) it takes ~160 ticks. Are those extra
// cycles consumed for instructions fetch? Or did I misinterpret some
-// clause in Itanium µ-architecture manual? Comments are welcomed and
+// clause in Itanium µ-architecture manual? Comments are welcomed and
// highly appreciated.
//
// On Itanium 2 it takes ~190 ticks. This is because of stalls on
if ($SIZE_T==8) {
my @r=map("%r$_",(6..9));
$code.=<<___;
- bras $ra,_mul_1x1 # a1·b1
+ bras $ra,_mul_1x1 # a1·b1
stmg $lo,$hi,16($rp)
lg $a,`$stdframe+128+4*$SIZE_T`($sp)
lg $b,`$stdframe+128+6*$SIZE_T`($sp)
- bras $ra,_mul_1x1 # a0·b0
+ bras $ra,_mul_1x1 # a0·b0
stmg $lo,$hi,0($rp)
lg $a,`$stdframe+128+3*$SIZE_T`($sp)
lg $b,`$stdframe+128+5*$SIZE_T`($sp)
xg $a,`$stdframe+128+4*$SIZE_T`($sp)
xg $b,`$stdframe+128+6*$SIZE_T`($sp)
- bras $ra,_mul_1x1 # (a0+a1)·(b0+b1)
+ bras $ra,_mul_1x1 # (a0+a1)·(b0+b1)
lmg @r[0],@r[3],0($rp)
xgr $lo,$hi
# the time being... Except that it has three code paths: pure integer
# code suitable for any x86 CPU, MMX code suitable for PIII and later
# and PCLMULQDQ suitable for Westmere and later. Improvement varies
-# from one benchmark and µ-arch to another. Below are interval values
+# from one benchmark and µ-arch to another. Below are interval values
# for 163- and 571-bit ECDH benchmarks relative to compiler-generated
# code:
#
&push ("edi");
&mov ($a,&wparam(1));
&mov ($b,&wparam(3));
- &call ("_mul_1x1_mmx"); # a1·b1
+ &call ("_mul_1x1_mmx"); # a1·b1
&movq ("mm7",$R);
&mov ($a,&wparam(2));
&mov ($b,&wparam(4));
- &call ("_mul_1x1_mmx"); # a0·b0
+ &call ("_mul_1x1_mmx"); # a0·b0
&movq ("mm6",$R);
&mov ($a,&wparam(1));
&mov ($b,&wparam(3));
&xor ($a,&wparam(2));
&xor ($b,&wparam(4));
- &call ("_mul_1x1_mmx"); # (a0+a1)·(b0+b1)
+ &call ("_mul_1x1_mmx"); # (a0+a1)·(b0+b1)
&pxor ($R,"mm7");
&mov ($a,&wparam(0));
- &pxor ($R,"mm6"); # (a0+a1)·(b0+b1)-a1·b1-a0·b0
+ &pxor ($R,"mm6"); # (a0+a1)·(b0+b1)-a1·b1-a0·b0
&movq ($A,$R);
&psllq ($R,32);
&mov ($a,&wparam(1));
&mov ($b,&wparam(3));
- &call ("_mul_1x1_ialu"); # a1·b1
+ &call ("_mul_1x1_ialu"); # a1·b1
&mov (&DWP(8,"esp"),$lo);
&mov (&DWP(12,"esp"),$hi);
&mov ($a,&wparam(2));
&mov ($b,&wparam(4));
- &call ("_mul_1x1_ialu"); # a0·b0
+ &call ("_mul_1x1_ialu"); # a0·b0
&mov (&DWP(0,"esp"),$lo);
&mov (&DWP(4,"esp"),$hi);
&mov ($b,&wparam(3));
&xor ($a,&wparam(2));
&xor ($b,&wparam(4));
- &call ("_mul_1x1_ialu"); # (a0+a1)·(b0+b1)
+ &call ("_mul_1x1_ialu"); # (a0+a1)·(b0+b1)
&mov ("ebp",&wparam(0));
@r=("ebx","ecx","edi","esi");
# undef mul_add
/*-
- * "m"(a), "+m"(r) is the way to favor DirectPath µ-code;
+ * "m"(a), "+m"(r) is the way to favor DirectPath µ-code;
* "g"(0) let the compiler to decide where does it
* want to keep the value of zero;
*/
# in bn_gf2m.c. It's kind of low-hanging mechanical port from C for
# the time being... Except that it has two code paths: code suitable
# for any x86_64 CPU and PCLMULQDQ one suitable for Westmere and
-# later. Improvement varies from one benchmark and µ-arch to another.
+# later. Improvement varies from one benchmark and µ-arch to another.
# Vanilla code path is at most 20% faster than compiler-generated code
# [not very impressive], while PCLMULQDQ - whole 85%-160% better on
# 163- and 571-bit ECDH benchmarks on Intel CPUs. Keep in mind that
$code.=<<___;
movdqa %xmm0,%xmm4
movdqa %xmm1,%xmm5
- pclmulqdq \$0,%xmm1,%xmm0 # a1·b1
+ pclmulqdq \$0,%xmm1,%xmm0 # a1·b1
pxor %xmm2,%xmm4
pxor %xmm3,%xmm5
- pclmulqdq \$0,%xmm3,%xmm2 # a0·b0
- pclmulqdq \$0,%xmm5,%xmm4 # (a0+a1)·(b0+b1)
+ pclmulqdq \$0,%xmm3,%xmm2 # a0·b0
+ pclmulqdq \$0,%xmm5,%xmm4 # (a0+a1)·(b0+b1)
xorps %xmm0,%xmm4
- xorps %xmm2,%xmm4 # (a0+a1)·(b0+b1)-a0·b0-a1·b1
+ xorps %xmm2,%xmm4 # (a0+a1)·(b0+b1)-a0·b0-a1·b1
movdqa %xmm4,%xmm5
pslldq \$8,%xmm4
psrldq \$8,%xmm5
mov \$0xf,$mask
mov $a1,$a
mov $b1,$b
- call _mul_1x1 # a1·b1
+ call _mul_1x1 # a1·b1
mov $lo,16(%rsp)
mov $hi,24(%rsp)
mov 48(%rsp),$a
mov 64(%rsp),$b
- call _mul_1x1 # a0·b0
+ call _mul_1x1 # a0·b0
mov $lo,0(%rsp)
mov $hi,8(%rsp)
mov 56(%rsp),$b
xor 48(%rsp),$a
xor 64(%rsp),$b
- call _mul_1x1 # (a0+a1)·(b0+b1)
+ call _mul_1x1 # (a0+a1)·(b0+b1)
___
@r=("%rbx","%rcx","%rdi","%rsi");
$code.=<<___;
# processes one byte in 8.45 cycles, A9 - in 10.2, Snapdragon S4 -
# in 9.33.
#
-# Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
+# Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
# Polynomial Multiplication on ARM Processors using the NEON Engine.
#
# http://conradoplg.cryptoland.net/files/2010/12/mocrysen13.pdf
veor $IN,$Xl @ inp^=Xi
.Lgmult_neon:
___
- &clmul64x64 ($Xl,$Hlo,"$IN#lo"); # H.lo·Xi.lo
+ &clmul64x64 ($Xl,$Hlo,"$IN#lo"); # H.lo·Xi.lo
$code.=<<___;
veor $IN#lo,$IN#lo,$IN#hi @ Karatsuba pre-processing
___
- &clmul64x64 ($Xm,$Hhl,"$IN#lo"); # (H.lo+H.hi)·(Xi.lo+Xi.hi)
- &clmul64x64 ($Xh,$Hhi,"$IN#hi"); # H.hi·Xi.hi
+ &clmul64x64 ($Xm,$Hhl,"$IN#lo"); # (H.lo+H.hi)·(Xi.lo+Xi.hi)
+ &clmul64x64 ($Xh,$Hhi,"$IN#hi"); # H.hi·Xi.hi
$code.=<<___;
veor $Xm,$Xm,$Xl @ Karatsuba post-processing
veor $Xm,$Xm,$Xh
or $V,%lo(0xA0406080),$V
or %l0,%lo(0x20C0E000),%l0
sllx $V,32,$V
- or %l0,$V,$V ! (0xE0·i)&0xff=0xA040608020C0E000
+ or %l0,$V,$V ! (0xE0·i)&0xff=0xA040608020C0E000
stx $V,[%i0+16]
ret
mov 0xE1,%l7
sllx %l7,57,$xE1 ! 57 is not a typo
- ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
+ ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
xor $Hhi,$Hlo,$Hhl ! Karatsuba pre-processing
xmulx $Xlo,$Hlo,$C0
xmulx $Xhi,$Hhi,$Xhi
sll $C0,3,$sqr
- srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
+ srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
xor $C0,$sqr,$sqr
- sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
+ sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
xor $C0,$C1,$C1 ! Karatsuba post-processing
xor $Xlo,$C2,$C2
xor $Xhi,$C2,$C2
xor $Xhi,$C1,$C1
- xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
+ xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
xor $C0,$C2,$C2
xmulx $C1,$xE1,$C0
xor $C1,$C3,$C3
mov 0xE1,%l7
sllx %l7,57,$xE1 ! 57 is not a typo
- ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
+ ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
and $inp,7,$shl
andn $inp,7,$inp
xmulx $Xhi,$Hhi,$Xhi
sll $C0,3,$sqr
- srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
+ srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
xor $C0,$sqr,$sqr
- sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
+ sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
xor $C0,$C1,$C1 ! Karatsuba post-processing
xor $Xlo,$C2,$C2
xor $Xhi,$C2,$C2
xor $Xhi,$C1,$C1
- xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
+ xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
xor $C0,$C2,$C2
xmulx $C1,$xE1,$C0
xor $C1,$C3,$C3
# effective address calculation and finally merge of value to Z.hi.
# Reference to rem_4bit is scheduled so late that I had to >>4
# rem_4bit elements. This resulted in 20-45% improvement
-# on contemporary µ-archs.
+# on contemporary µ-archs.
{
my $cnt;
my $rem_4bit = "eax";
# experimental alternative. special thing about it is that there is
# no dependency between the two multiplications...
mov \$`0xE1<<1`,%eax
- mov \$0xA040608020C0E000,%r10 # ((7..0)·0xE0)&0xff
+ mov \$0xA040608020C0E000,%r10 # ((7..0)·0xE0)&0xff
mov \$0x07,%r11d
movq %rax,$T1
movq %r10,$T2
movq %r11,$T3 # borrow $T3
pand $Xi,$T3
- pshufb $T3,$T2 # ($Xi&7)·0xE0
+ pshufb $T3,$T2 # ($Xi&7)·0xE0
movq %rax,$T3
- pclmulqdq \$0x00,$Xi,$T1 # ·(0xE1<<1)
+ pclmulqdq \$0x00,$Xi,$T1 # ·(0xE1<<1)
pxor $Xi,$T2
pslldq \$15,$T2
paddd $T2,$T2 # <<(64+56+1)
je .Lskip4x
sub \$0x30,$len
- mov \$0xA040608020C0E000,%rax # ((7..0)·0xE0)&0xff
+ mov \$0xA040608020C0E000,%rax # ((7..0)·0xE0)&0xff
movdqu 0x30($Htbl),$Hkey3
movdqu 0x40($Htbl),$Hkey4
le?vperm $IN,$IN,$IN,$lemask
vxor $zero,$zero,$zero
- vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
- vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
- vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
+ vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
+ vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
+ vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
vpmsumd $t2,$Xl,$xC2 # 1st phase
.align 5
Loop:
subic $len,$len,16
- vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
+ vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
subfe. r0,r0,r0 # borrow?-1:0
- vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
+ vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
and r0,r0,$len
- vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
+ vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
add $inp,$inp,r0
vpmsumd $t2,$Xl,$xC2 # 1st phase
#endif
vext.8 $IN,$t1,$t1,#8
- vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
+ vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
veor $t1,$t1,$IN @ Karatsuba pre-processing
- vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
- vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
+ vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
+ vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
veor $t2,$Xl,$Xh
#endif
vext.8 $In,$t1,$t1,#8
veor $IN,$IN,$Xl @ I[i]^=Xi
- vpmull.p64 $Xln,$H,$In @ H·Ii+1
+ vpmull.p64 $Xln,$H,$In @ H·Ii+1
veor $t1,$t1,$In @ Karatsuba pre-processing
vpmull2.p64 $Xhn,$H,$In
b .Loop_mod2x_v8
.Loop_mod2x_v8:
vext.8 $t2,$IN,$IN,#8
subs $len,$len,#32 @ is there more data?
- vpmull.p64 $Xl,$H2,$IN @ H^2.lo·Xi.lo
+ vpmull.p64 $Xl,$H2,$IN @ H^2.lo·Xi.lo
cclr $inc,lo @ is it time to zero $inc?
vpmull.p64 $Xmn,$Hhl,$t1
veor $t2,$t2,$IN @ Karatsuba pre-processing
- vpmull2.p64 $Xh,$H2,$IN @ H^2.hi·Xi.hi
+ vpmull2.p64 $Xh,$H2,$IN @ H^2.hi·Xi.hi
veor $Xl,$Xl,$Xln @ accumulate
- vpmull2.p64 $Xm,$Hhl,$t2 @ (H^2.lo+H^2.hi)·(Xi.lo+Xi.hi)
+ vpmull2.p64 $Xm,$Hhl,$t2 @ (H^2.lo+H^2.hi)·(Xi.lo+Xi.hi)
vld1.64 {$t0},[$inp],$inc @ load [rotated] I[i+2]
veor $Xh,$Xh,$Xhn
vext.8 $In,$t1,$t1,#8
vext.8 $IN,$t0,$t0,#8
veor $Xl,$Xm,$t2
- vpmull.p64 $Xln,$H,$In @ H·Ii+1
+ vpmull.p64 $Xln,$H,$In @ H·Ii+1
veor $IN,$IN,$Xh @ accumulate $IN early
vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase of reduction
veor $IN,$IN,$Xl @ inp^=Xi
veor $t1,$t0,$t2 @ $t1 is rotated inp^Xi
- vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
+ vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
veor $t1,$t1,$IN @ Karatsuba pre-processing
- vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
- vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
+ vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
+ vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
veor $t2,$Xl,$Xh
# achieves respectful 432MBps on 2.8GHz processor now. For reference.
# If executed on Xeon, current RC4_CHAR code-path is 2.7x faster than
# RC4_INT code-path. While if executed on Opteron, it's only 25%
-# slower than the RC4_INT one [meaning that if CPU µ-arch detection
+# slower than the RC4_INT one [meaning that if CPU µ-arch detection
# is not implemented, then this final RC4_CHAR code-path should be
# preferred, as it provides better *all-round* performance].
# switch to AVX alone improves performance by as little as 4% in
# comparison to SSSE3 code path. But below result doesn't look like
# 4% improvement... Trouble is that Sandy Bridge decodes 'ro[rl]' as
-# pair of µ-ops, and it's the additional µ-ops, two per round, that
+# pair of µ-ops, and it's the additional µ-ops, two per round, that
# make it run slower than Core2 and Westmere. But 'sh[rl]d' is decoded
-# as single µ-op by Sandy Bridge and it's replacing 'ro[rl]' with
+# as single µ-op by Sandy Bridge and it's replacing 'ro[rl]' with
# equivalent 'sh[rl]d' that is responsible for the impressive 5.1
# cycles per processed byte. But 'sh[rl]d' is not something that used
# to be fast, nor does it appear to be fast in upcoming Bulldozer
# SHA256 block transform for x86. September 2007.
#
# Performance improvement over compiler generated code varies from
-# 10% to 40% [see below]. Not very impressive on some µ-archs, but
+# 10% to 40% [see below]. Not very impressive on some µ-archs, but
# it's 5 times smaller and optimizes the amount of writes.
#
# May 2012.
#
# IALU code-path is optimized for elder Pentiums. On vanilla Pentium
# performance improvement over compiler generated code reaches ~60%,
-# while on PIII - ~35%. On newer µ-archs improvement varies from 15%
+# while on PIII - ~35%. On newer µ-archs improvement varies from 15%
# to 50%, but it's less important as they are expected to execute SSE2
# code-path, which is commonly ~2-3x faster [than compiler generated
# code]. SSE2 code-path is as fast as original sha512-sse2.pl, even
fmovs %f1,%f3
fmovs %f0,%f2
- add %fp,BIAS,%i0 ! return pointer to caller´s top of stack
+ add %fp,BIAS,%i0 ! return pointer to caller´s top of stack
ret
restore
# table]. I stick to value of 2 for two reasons: 1. smaller table
# minimizes cache trashing and thus mitigates the hazard of side-
# channel leakage similar to AES cache-timing one; 2. performance
-# gap among different µ-archs is smaller.
+# gap among different µ-archs is smaller.
#
# Performance table lists rounded amounts of CPU cycles spent by
# whirlpool_block_mmx routine on single 64 byte input block, i.e.
* Contributed to the OpenSSL Project 2004 by Richard Levitte
* (richard@levitte.org)
*/
-/* Copyright (c) 2004 Kungliga Tekniska Högskolan
+/* Copyright (c) 2004 Kungliga Tekniska Högskolan
* (Royal Institute of Technology, Stockholm, Sweden).
* All rights reserved.
*
* Contributed to the OpenSSL Project 2004 by Richard Levitte
* (richard@levitte.org)
*/
-/* Copyright (c) 2004 Kungliga Tekniska Högskolan
+/* Copyright (c) 2004 Kungliga Tekniska Högskolan
* (Royal Institute of Technology, Stockholm, Sweden).
* All rights reserved.
*
day, which means that future revisions will not be fully compatible to
the current version.
-Bodo Möller <bodo@openssl.org>
+Bodo Möller <bodo@openssl.org>
VALUE "ProductVersion", "$version\\0"
// Optional:
//VALUE "Comments", "\\0"
- VALUE "LegalCopyright", "Copyright © 1998-2006 The OpenSSL Project. Copyright © 1995-1998 Eric A. Young, Tim J. Hudson. All rights reserved.\\0"
+ VALUE "LegalCopyright", "Copyright © 1998-2006 The OpenSSL Project. Copyright © 1995-1998 Eric A. Young, Tim J. Hudson. All rights reserved.\\0"
//VALUE "LegalTrademarks", "\\0"
//VALUE "PrivateBuild", "\\0"
//VALUE "SpecialBuild", "\\0"