Files
heimdal/lib/asn1/lex.l
Nicolas Williams db7763ca7b asn1: X.681/682/683 magic handling of open types
Status:

 - And it works!

 - We have an extensive test based on decoding a rich EK certficate.

   This test exercises all of:

    - decoding
    - encoding with and without decoded open types
    - copying of decoded values with decoded open types
    - freeing of decoded values with decoded open types

   Valgrind finds no memory errors.

 - Added a manual page for the compiler.

 - rfc2459.asn1 now has all three primary PKIX types that we care about
   defined as in RFC5912, with IOS constraints and parameterization:

    - `Extension`       (embeds open type in an `OCTET STRING`)
    - `OtherName`       (embeds open type in an        `ANY`-like type)
    - `SingleAttribute` (embeds open type in an        `ANY`-like type)
    - `AttributeSet`    (embeds open type in a  `SET OF ANY`-like type)

   All of these use OIDs as the open type type ID field, but integer
   open type type ID fields are also supported (and needed, for
   Kerberos).

   That will cover every typed hole pattern in all our ASN.1 modules.

   With this we'll be able to automatically and recursively decode
   through all subject DN attributes even when the subject DN is a
   directoryName SAN, and subjectDirectoryAttributes, and all
   extensions, and all SANs, and all authorization-data elements, and
   PA-data, and...

   We're not really using `SingleAttribute` and `AttributeSet` yet
   because various changes are needed in `lib/hx509` for that.

 - `asn1_compile` builds and recognizes the subset of X.681/682/683 that
   we need for, and now use in, rfc2459.asn1.  It builds the necessary
   AST, generates the correct C types, and generates templating for
   object sets and open types!

 - See READMEs for details.

 - Codegen backend not tested; I won't make it implement automatic open
   type handling, but it should at least not crash by substituting
   `heim_any` for open types not embedded in `OCTET STRING`.

 - We're _really_ starting to have problems with the ITU-T ASN.1
   grammar and our version of it...

   Type names have to start with upper-case, value names with
   lower-case, but it's not enough to disambiguate.

   The fact the we've allowed value and type names to violate their
   respective start-with case rules is causing us trouble now that we're
   adding grammar from X.681/682/683, and we're going to have to undo
   that.

   In preparation for that I'm capitalizing the `heim_any` and
   `heim_any_set` types, and doing some additional cleanup, which
   requires changes to other parts of Heimdal (all in this same commit
   for now).

   Problems we have because of this:

    - We cannot IMPORT values into modules because we have no idea if a
      symbol being imported refers to a value or a type because the only
      clue we would have is the symbol's name, so we assume IMPORTed
      symbols are for types.

      This means we can't import OIDs, for example, which is super
      annoying.

      One thing we might be able to do here is mark imported symbols as
      being of an undetermined-but-not-undefined type, then coerce the
      symbol's type the first time it's used in a context where its type
      is inferred as type, value, object, object set, or class.  (Though
      since we don't generate C symbols for objects or classes, we won't
      be able to import them, especially since we need to know them at
      compile time and cannot defer their handling to link- or
      run-time.)

    - The `NULL` type name, and the `NULL` value name now cause two
      reduce/reduce conflicts via the `FieldSetting` production.

    - Various shift/reduce conflicts involving `NULL` values in
      non-top-level contexts (in constraints, for example).

 - Currently I have a bug where to disambiguate the grammar I have a
   CLASS_IDENTIFIER token that is all caps, while TYPE_IDENTIFIER must
   start with a capital but not be all caps, but this breaks Kerberos
   since all its types are all capitalized -- oof!

   To fix this I made it so class names have to be all caps and
   start with an underscore (ick).

TBD:

 - Check all the XXX comments and address them
 - Apply this treatment to Kerberos!  Automatic handling of authz-data
   sounds useful :)
 - Apply this treatment to PKCS#10 (CSRs) and other ASN.1 modules too.
 - Replace various bits of code in `lib/hx509/` with uses of this
   feature.
 - Add JER.
 - Enhance `hxtool` and `asn1_print`.

Getting there!
2021-02-28 18:13:08 -06:00

311 lines
8.0 KiB
Plaintext

%{
/*
* Copyright (c) 1997 - 2017 Kungliga Tekniska Högskolan
* (Royal Institute of Technology, Stockholm, Sweden).
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. Neither the name of the Institute nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE INSTITUTE AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE INSTITUTE OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/* $Id$ */
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#undef ECHO
#include "symbol.h"
#include "asn1parse.h"
#include "lex.h"
#include "gen_locl.h"
static unsigned lineno = 1;
#undef ECHO
static void unterminated(const char *, unsigned);
%}
/* This is for broken old lexes (solaris 10 and hpux) */
%e 2000
%p 5000
%a 5000
%n 1000
%o 10000
%%
ABSENT { return kw_ABSENT; }
ABSTRACT-SYNTAX { return kw_ABSTRACT_SYNTAX; }
ALL { return kw_ALL; }
APPLICATION { return kw_APPLICATION; }
AUTOMATIC { return kw_AUTOMATIC; }
BEGIN { return kw_BEGIN; }
BIT { return kw_BIT; }
BMPString { return kw_BMPString; }
BOOLEAN { return kw_BOOLEAN; }
BY { return kw_BY; }
CHARACTER { return kw_CHARACTER; }
CHOICE { return kw_CHOICE; }
CLASS { return kw_CLASS; }
COMPONENT { return kw_COMPONENT; }
COMPONENTS { return kw_COMPONENTS; }
CONSTRAINED { return kw_CONSTRAINED; }
CONTAINING { return kw_CONTAINING; }
DEFAULT { return kw_DEFAULT; }
DEFINITIONS { return kw_DEFINITIONS; }
EMBEDDED { return kw_EMBEDDED; }
ENCODED { return kw_ENCODED; }
END { return kw_END; }
ENUMERATED { return kw_ENUMERATED; }
EXCEPT { return kw_EXCEPT; }
EXPLICIT { return kw_EXPLICIT; }
EXPORTS { return kw_EXPORTS; }
EXTENSIBILITY { return kw_EXTENSIBILITY; }
EXTERNAL { return kw_EXTERNAL; }
FALSE { return kw_FALSE; }
FROM { return kw_FROM; }
GeneralString { return kw_GeneralString; }
GeneralizedTime { return kw_GeneralizedTime; }
GraphicString { return kw_GraphicString; }
IA5String { return kw_IA5String; }
IDENTIFIER { return kw_IDENTIFIER; }
IMPLICIT { return kw_IMPLICIT; }
IMPLIED { return kw_IMPLIED; }
IMPORTS { return kw_IMPORTS; }
INCLUDES { return kw_INCLUDES; }
INSTANCE { return kw_INSTANCE; }
INTEGER { return kw_INTEGER; }
INTERSECTION { return kw_INTERSECTION; }
ISO646String { return kw_ISO646String; }
MAX { return kw_MAX; }
MIN { return kw_MIN; }
MINUS-INFINITY { return kw_MINUS_INFINITY; }
NULL { return kw_NULL; }
NumericString { return kw_NumericString; }
OBJECT { return kw_OBJECT; }
OCTET { return kw_OCTET; }
OF { return kw_OF; }
OPTIONAL { return kw_OPTIONAL; }
ObjectDescriptor { return kw_ObjectDescriptor; }
PATTERN { return kw_PATTERN; }
PDV { return kw_PDV; }
PLUS-INFINITY { return kw_PLUS_INFINITY; }
PRESENT { return kw_PRESENT; }
PRIVATE { return kw_PRIVATE; }
PrintableString { return kw_PrintableString; }
REAL { return kw_REAL; }
RELATIVE_OID { return kw_RELATIVE_OID; }
SEQUENCE { return kw_SEQUENCE; }
SET { return kw_SET; }
SIZE { return kw_SIZE; }
STRING { return kw_STRING; }
SYNTAX { return kw_SYNTAX; }
T61String { return kw_T61String; }
TAGS { return kw_TAGS; }
TRUE { return kw_TRUE; }
TYPE-IDENTIFIER { return kw_TYPE_IDENTIFIER; }
TeletexString { return kw_TeletexString; }
UNION { return kw_UNION; }
UNIQUE { return kw_UNIQUE; }
UNIVERSAL { return kw_UNIVERSAL; }
UTCTime { return kw_UTCTime; }
UTF8String { return kw_UTF8String; }
UniversalString { return kw_UniversalString; }
VideotexString { return kw_VideotexString; }
VisibleString { return kw_VisibleString; }
WITH { return kw_WITH; }
[-,;{}()|] { return *yytext; }
"[" { return *yytext; }
"]" { return *yytext; }
"&" { return *yytext; }
"." { return *yytext; }
":" { return *yytext; }
"@" { return *yytext; }
::= { return EEQUAL; }
-- {
int c, start_lineno = lineno;
int f = 0;
while((c = input()) != EOF) {
if(f && c == '-')
break;
if(c == '-') {
f = 1;
continue;
}
if(c == '\n') {
lineno++;
break;
}
f = 0;
}
if(c == EOF)
unterminated("comment", start_lineno);
}
\/\* {
int c, start_lineno = lineno;
int level = 1;
int seen_star = 0;
int seen_slash = 0;
while((c = input()) != EOF) {
if(c == '/') {
if(seen_star) {
if(--level == 0)
break;
seen_star = 0;
continue;
}
seen_slash = 1;
continue;
}
if(seen_star && c == '/') {
if(--level == 0)
break;
seen_star = 0;
continue;
}
if(c == '*') {
if(seen_slash) {
level++;
seen_star = seen_slash = 0;
continue;
}
seen_star = 1;
continue;
}
seen_star = seen_slash = 0;
if(c == '\n') {
lineno++;
continue;
}
}
if(c == EOF)
unterminated("comment", start_lineno);
}
"\"" {
int start_lineno = lineno;
int c;
char buf[1024];
char *p = buf;
int f = 0;
int skip_ws = 0;
while((c = input()) != EOF) {
if(isspace(c) && skip_ws) {
if(c == '\n')
lineno++;
continue;
}
skip_ws = 0;
if(c == '"') {
if(f) {
*p++ = '"';
f = 0;
} else
f = 1;
continue;
}
if(f == 1) {
unput(c);
break;
}
if(c == '\n') {
lineno++;
while(p > buf && isspace((unsigned char)p[-1]))
p--;
skip_ws = 1;
continue;
}
*p++ = c;
}
if(c == EOF)
unterminated("string", start_lineno);
*p++ = '\0';
yylval.name = estrdup(buf);
return STRING;
}
-?0x[0-9A-Fa-f]+|-?[0-9]+ { char *e, *y = yytext;
yylval.constant = strtoll((const char *)yytext,
&e, 0);
if(e == y)
lex_error_message("malformed constant (%s)", yytext);
else
return NUMBER;
}
[_][-A-Z0-9]* {
yylval.name = estrdup ((const char *)yytext);
return CLASS_IDENTIFIER;
}
[A-Z][-A-Za-z0-9_]* {
yylval.name = estrdup ((const char *)yytext);
return TYPE_IDENTIFIER;
}
[a-z][-A-Za-z0-9_]* {
yylval.name = estrdup ((const char *)yytext);
return VALUE_IDENTIFIER;
}
[ \t] ;
\n { ++lineno; }
\.\.\. { return ELLIPSIS; }
\.\. { return RANGE; }
. { lex_error_message("Ignoring char(%c)\n", *yytext); }
%%
int
yywrap ()
{
return 1;
}
void
lex_error_message (const char *format, ...)
{
va_list args;
va_start (args, format);
fprintf (stderr, "%s:%d: ", get_filename(), lineno);
vfprintf (stderr, format, args);
va_end (args);
error_flag++;
}
static void
unterminated(const char *type, unsigned start_lineno)
{
lex_error_message("unterminated %s, possibly started on line %d\n", type, start_lineno);
}