fix memory-corruption in regcomp with backslash followed by high byte
author    Rich Felker <dalias@aerifal.cx>  Fri, 20 Mar 2015 22:06:04 +0000 (18:06 -0400)
committer Rich Felker <dalias@aerifal.cx>  Fri, 20 Mar 2015 22:06:04 +0000 (18:06 -0400)
the regex parser handles the (undefined) case of an unexpected byte
following a backslash as a literal. however, instead of correctly
decoding a character, it was treating the byte value itself as a
character. this was not only semantically unjustified, but turned out
to be dangerous on archs where plain char is signed: bytes in the
range 252-255 alias the internal codes -4 through -1 used for special
types of literal nodes in the AST.
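
the aliasing described above can be reproduced in isolation. the
following is a minimal sketch, not code from regcomp.c: the helper
names are hypothetical, and signed char stands in for plain char on an
arch where plain char is signed. only the reserved range -4 through -1
is taken from the commit message.

```c
#include <assert.h>
#include <string.h>

/* hypothetical helper: the value a raw byte takes when read through a
   signed char, which is how plain char behaves on the affected archs */
static int byte_as_signed_char(unsigned char byte)
{
	signed char c;
	memcpy(&c, &byte, 1); /* reinterpret the bits, no value conversion */
	return c;
}

/* hypothetical helper: the same byte read as an unsigned value */
static int byte_as_unsigned_char(unsigned char byte)
{
	return byte;
}

/* per the commit message, -4 through -1 are reserved for special
   literal node types in the AST */
static int aliases_special_code(int code)
{
	return code >= -4 && code <= -1;
}

/* hypothetical self-check: bytes 252-255 collide only when signed */
static void check_aliasing(void)
{
	for (int b = 252; b <= 255; b++) {
		assert(aliases_special_code(byte_as_signed_char((unsigned char)b)));
		assert(!aliases_special_code(byte_as_unsigned_char((unsigned char)b)));
	}
	/* byte 251 reads as -5, just outside the reserved range */
	assert(!aliases_special_code(byte_as_signed_char(251)));
}
```

calling check_aliasing() passes on two's complement targets. it shows
why passing *s directly as a literal code is unsafe when plain char is
signed, while a properly decoded character value cannot collide with
the reserved codes.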

src/regex/regcomp.c

index 4cdaa1eabb847724f88171bd07e495ef176be9bc..bce6bc1593e0fb291e4d2a6811d311d611139640 100644 (file)
@@ -847,7 +847,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
                        } else {
                                /* extension: accept unknown escaped char
                                   as a literal */
-                               node = tre_ast_new_literal(ctx->mem, *s, *s, ctx->position);
+                               goto parse_literal;
                        }
                        ctx->position++;
                }