asn1: Add README-X681.md (futures)

2021-01-26 22:33:47 -06:00
parent cb1ccf50fd
commit a8205cacb8
1 changed files with 353 additions and 0 deletions
--- a/lib/asn1/README-X681.md
+++ b/lib/asn1/README-X681.md
@@ -0,0 +1,353 @@
+Bringing the power of X.681 (ASN.1 Information Object System) to Heimdal
+========================================================================
+
+X.681 is an ITU-T standard in the X.680 series (ASN.1) that is incredibly
+useful and would be fantastic to implement in Heimdal.
+
+This README covers why we should want this and some ideas for implementing it.
+X.681 is also covered extensively in RFC 6025, in section 2.1.3.
+
+RFC 6025 does an excellent job of elucidating X.681, which most readers
+unfamiliar with it would otherwise find inscrutable.
+
+https://www.itu.int/rec/T-REC-X.681-201508-I/en
+
+
+Introduction
+============
+
+The reader should already be familiar with ASN.1, which in any case consists of
+two things:
+
+ - an abstract syntax for specifying schemas for data interchange
+
+ - a set of encoding rules
+
+A very common thing to see in projects that use ASN.1, as well as projects that
+use alternatives to ASN.1, is a pattern known as the "typed hole" or "open
+type".
+
+The ASN.1 Information Object System (X.681) is all about automating the
+otherwise very annoying task of dealing with "typed holes" / "open types".
+
+
+Typed Holes / Open Types
+========================
+
+A typed hole or open type is a data structure with a form like:
+
+```
+    { type_id, bytes_encoding_a_value_of_a_type_identified_by_type_id }
+```
+
+I.e., an opaque datum and an identifier of what kind of datum that is.  This
+happens because the structure with the typed hole is used in contexts where it
+can't know all the possible things that can go in it.  In many cases we do know
+all the things that can go in a typed hole, but the protocol's designers didn't
+know them when the structure was defined, or had some other reason to use a
+typed hole.
+
+These are used not only in protocols that use ASN.1, but in many protocols that
+use alternative syntaxes and encodings.
+
+In ASN.1 these generally look like:
+
+```
+    TypedHole ::= SEQUENCE { typeId INTEGER, hole OCTET STRING }
+```
+
+or
+
+```
+    TypedHole ::= SEQUENCE {
+        typeId OBJECT IDENTIFIER,
+        opaque ANY DEFINED BY typeId
+    }
+```
+
+or
+
+```
+    TypedHole ::= SEQUENCE {
+        typeId OBJECT IDENTIFIER,
+        opaque ANY -- DEFINED BY typeId
+    }
+```
+
+or any number of variations.  (Note: the `ANY` variations are no longer
+conformant to X.680, the base ASN.1 specification.)
+
+The pattern is `{ id, hole }` where the `hole` is ultimately an opaque sequence
+of bytes whose content's schema is identified by the `id` in the same data
+structure.
+
+Sometimes the "hole" is an `OCTET STRING`, sometimes it's a `BIT STRING`,
+sometimes it's an `ANY` or `ANY DEFINED BY`.
+
+An example from PKIX:
+
+```
+Extension ::= SEQUENCE {
+  extnID          OBJECT IDENTIFIER, -- <- type ID
+  critical        BOOLEAN OPTIONAL,
+  extnValue       OCTET STRING       -- <- hole
+}
+```
+
+which shows that typed holes don't always have just two fields, and that the
+type identifier isn't always an integer.
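+
+To make the hole concrete, here is a hypothetical sketch that walks the DER
+encoding of one `Extension` by hand: a critical basicConstraints extension
+(OID 2.5.29.19) whose `extnValue` carries the DER of `BasicConstraints { cA
+TRUE }`.  `read_tlv()` and `get_extn_value()` are illustrative helpers, not
+Heimdal functions, and they handle only low tag numbers and short-form lengths.
+
+```c
+    #include <assert.h>
+    #include <stddef.h>
+
+    /*
+     * DER encoding of one Extension: basicConstraints (OID 2.5.29.19),
+     * critical TRUE, whose extnValue holds the DER of BasicConstraints.
+     */
+    static const unsigned char ext_der[] = {
+        0x30, 0x0F,                   /* Extension: SEQUENCE, 15 bytes      */
+        0x06, 0x03, 0x55, 0x1D, 0x13, /*   extnID: OID 2.5.29.19            */
+        0x01, 0x01, 0xFF,             /*   critical: BOOLEAN TRUE           */
+        0x04, 0x05,                   /*   extnValue: OCTET STRING, 5 bytes */
+        0x30, 0x03, 0x01, 0x01, 0xFF  /*     hole: { cA TRUE }              */
+    };
+
+    /*
+     * Read one TLV at p.  Only low tag numbers and short-form lengths
+     * (< 128) are handled.  Returns the total TLV size, or 0 on failure.
+     */
+    static size_t read_tlv(const unsigned char *p, size_t len,
+                           unsigned char *tag, const unsigned char **val,
+                           size_t *vlen)
+    {
+        if (len < 2 || (p[0] & 0x1F) == 0x1F || (p[1] & 0x80) != 0)
+            return 0;
+        if ((size_t)p[1] > len - 2)
+            return 0;                    /* length runs past the buffer */
+        *tag = p[0];
+        *val = p + 2;
+        *vlen = p[1];
+        return 2 + (size_t)p[1];
+    }
+
+    /* Locate the extnValue OCTET STRING (the hole); 0 on success. */
+    static int get_extn_value(const unsigned char *p, size_t len,
+                              const unsigned char **hole, size_t *hole_len)
+    {
+        unsigned char tag;
+        const unsigned char *body, *v;
+        size_t blen, vlen, n;
+
+        if (read_tlv(p, len, &tag, &body, &blen) == 0 || tag != 0x30)
+            return -1;                   /* not a SEQUENCE */
+        while (blen > 0) {
+            n = read_tlv(body, blen, &tag, &v, &vlen);
+            if (n == 0)
+                return -1;
+            if (tag == 0x04) {           /* OCTET STRING: extnValue */
+                *hole = v;
+                *hole_len = vlen;
+                return 0;
+            }
+            body += n;                   /* skip extnID and critical */
+            blen -= n;
+        }
+        return -1;
+    }
+```
+
+Nothing in the outer encoding says what the five bytes in the hole are; only
+the `extnID` value does.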
+
+Now, Heimdal's ASN.1 compiler generates the obvious C data structure for PKIX's
+`Extension` type:
+
+```
+    typedef struct Extension {
+      heim_oid extnID;
+      int *critical;
+      heim_octet_string extnValue;
+    } Extension;
+```
+
+and applications using this compiler have to inspect the `extnID` field,
+comparing it to any number of OIDs, to determine the type of `extnValue`, then
+must call `decode_ThatType()` to decode whatever that octet string contains.
+
+This is very inconvenient.
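+
+For illustration, here is roughly what that looks like from the application
+side.  This is a hypothetical sketch, not code from `lib/hx509/`: the structs
+are simplified stand-ins for Heimdal's `heim_oid` and `heim_octet_string`, and
+it stops at classifying the extension, where real code would go on to call the
+matching `decode_ThatType()` on `extnValue`.
+
+```c
+    #include <assert.h>
+    #include <stddef.h>
+    #include <string.h>
+
+    /* Simplified stand-ins for heim_oid and heim_octet_string. */
+    typedef struct { size_t length; const unsigned *components; } sketch_oid;
+    typedef struct { size_t length; const void *data; } sketch_octets;
+
+    typedef struct Extension {
+        sketch_oid extnID;
+        int *critical;
+        sketch_octets extnValue;
+    } Extension;
+
+    static const unsigned id_ce_authorityKeyIdentifier[] = { 2, 5, 29, 35 };
+    static const unsigned id_ce_subjectKeyIdentifier[]   = { 2, 5, 29, 14 };
+    static const unsigned id_ce_basicConstraints[]       = { 2, 5, 29, 19 };
+
+    enum ext_kind { EXT_UNKNOWN, EXT_AKID, EXT_SKID, EXT_BC };
+
+    static int oid_is(const sketch_oid *o, const unsigned *c, size_t n)
+    {
+        return o->length == n &&
+               memcmp(o->components, c, n * sizeof(*c)) == 0;
+    }
+
+    /*
+     * The OID comparison chain each caller has to write (and keep up to
+     * date) before it can call the right decoder on extnValue.
+     */
+    static enum ext_kind classify_extension(const Extension *e)
+    {
+        if (oid_is(&e->extnID, id_ce_authorityKeyIdentifier, 4))
+            return EXT_AKID;  /* next: decode as AuthorityKeyIdentifier */
+        if (oid_is(&e->extnID, id_ce_subjectKeyIdentifier, 4))
+            return EXT_SKID;  /* next: decode as SubjectKeyIdentifier */
+        if (oid_is(&e->extnID, id_ce_basicConstraints, 4))
+            return EXT_BC;    /* next: decode as BasicConstraints */
+        return EXT_UNKNOWN;   /* unrecognized: extnValue stays opaque */
+    }
+```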
+
+Compare this to the handling of discriminated unions (what ASN.1 calls a
+`CHOICE`):
+
+```
+    /*
+     * ASN.1 definition:
+     *
+     *  DistributionPointName ::= CHOICE {
+     *    fullName                  [0] IMPLICIT SEQUENCE OF GeneralName,
+     *    nameRelativeToCRLIssuer   [1] RelativeDistinguishedName
+     *  }
+    */
+
+    /* C equivalent */
+    typedef struct DistributionPointName {
+      enum DistributionPointName_enum {
+        choice_DistributionPointName_fullName = 1,
+        choice_DistributionPointName_nameRelativeToCRLIssuer
+      } element;
+      union {
+        struct DistributionPointName_fullName {
+          unsigned int len;
+          GeneralName *val;
+        } fullName;
+        RelativeDistinguishedName nameRelativeToCRLIssuer;
+      } u;
+    } DistributionPointName;
+```
+
+The ASN.1 encoding on the wire of a `CHOICE` value, almost no matter the
+encoding rules, looks... remarkably like the encoding of a typed hole.  One
+difference is that the alternatives of a discriminated union generally all have
+to be encoded with the same encoding rules, whereas the contents of a typed hole
+could conceivably use radically different encoding rules than the structure
+containing it.
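+
+One way to see the resemblance is to look at the DER identifier octet of an
+encoded `DistributionPointName`: its context-specific tag number (`[0]` or
+`[1]`) plays exactly the role of `type_id`, and the contents that follow are
+the "hole".  The helpers below are a hypothetical sketch that handles only low
+tag numbers (0 to 30); they return the values of the generated
+`DistributionPointName_enum` shown above.
+
+```c
+    #include <assert.h>
+    #include <stddef.h>
+
+    /* Fields of a single DER identifier octet (low-tag-number form). */
+    struct der_ident {
+        unsigned cls;     /* 0 universal, 1 application, 2 context, 3 private */
+        int constructed;  /* bit 0x20 */
+        unsigned number;  /* tag number, 0..30 */
+    };
+
+    /* Returns 0 on success, -1 for empty input or high-tag-number form. */
+    static int parse_ident(const unsigned char *p, size_t len,
+                           struct der_ident *id)
+    {
+        if (len < 1 || (p[0] & 0x1F) == 0x1F)
+            return -1;
+        id->cls = p[0] >> 6;
+        id->constructed = (p[0] & 0x20) != 0;
+        id->number = p[0] & 0x1F;
+        return 0;
+    }
+
+    /* The tag alone selects the CHOICE alternative (0 = unrecognized). */
+    static int dpn_alternative(const unsigned char *p, size_t len)
+    {
+        struct der_ident id;
+
+        if (parse_ident(p, len, &id) != 0 || id.cls != 2)
+            return 0;
+        switch (id.number) {
+        case 0:
+            return 1;  /* choice_DistributionPointName_fullName */
+        case 1:
+            return 2;  /* choice_DistributionPointName_nameRelativeToCRLIssuer */
+        default:
+            return 0;
+        }
+    }
+```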
+
+In fact, our compiler handles an extensible `CHOICE` as a discriminated union
+one of whose alternatives is a typed hole, which holds any alternative the
+compiler doesn't know about:
+
+```
+    typedef struct DigestRepInner {
+      enum DigestRepInner_enum {
+        choice_DigestRepInner_asn1_ellipsis = 0, /* <--- unknown CHOICE arm */
+        choice_DigestRepInner_error,
+        choice_DigestRepInner_initReply,
+        choice_DigestRepInner_response,
+        choice_DigestRepInner_ntlmInitReply,
+        choice_DigestRepInner_ntlmResponse,
+        choice_DigestRepInner_supportedMechs
+        /* ... */
+      } element;
+      union {
+        DigestError error;
+        DigestInitReply initReply;
+        DigestResponse response;
+        NTLMInitReply ntlmInitReply;
+        NTLMResponse ntlmResponse;
+        DigestTypes supportedMechs;
+        heim_octet_string asn1_ellipsis; /* <--- unknown CHOICE arm */
+      } u;
+    } DigestRepInner;
+```
+
+The critical thing to understand is that our compiler automatically decodes
+(and encodes) `CHOICE`s' alternatives, but it does NOT do that for typed holes
+because it knows nothing about them.
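+
+On the application side, a decoded `CHOICE` is consumed with a plain `switch`
+on `element`, with no OID comparisons and no second decoding step.  The sketch
+below trims `DigestRepInner` to two known alternatives plus the ellipsis arm;
+the `code` field of `DigestError` and the `DigestTypes` layout are simplified
+stand-ins, and `digest_rep_status()` is a hypothetical helper.
+
+```c
+    #include <assert.h>
+    #include <stddef.h>
+
+    typedef struct { size_t length; void *data; } sketch_octets; /* ~heim_octet_string */
+    typedef struct { int code; } DigestError;                     /* stand-in */
+    typedef struct { unsigned int len; int *val; } DigestTypes;   /* stand-in */
+
+    typedef struct DigestRepInner {
+      enum DigestRepInner_enum {
+        choice_DigestRepInner_asn1_ellipsis = 0, /* unknown CHOICE arm */
+        choice_DigestRepInner_error,
+        choice_DigestRepInner_supportedMechs
+      } element;
+      union {
+        DigestError error;
+        DigestTypes supportedMechs;
+        sketch_octets asn1_ellipsis;             /* still-encoded unknown arm */
+      } u;
+    } DigestRepInner;
+
+    enum rep_status { REP_OK, REP_ERROR, REP_UNKNOWN };
+
+    static enum rep_status digest_rep_status(const DigestRepInner *r,
+                                             int *error_code)
+    {
+        switch (r->element) {
+        case choice_DigestRepInner_error:
+            *error_code = r->u.error.code;   /* already decoded for us */
+            return REP_ERROR;
+        case choice_DigestRepInner_supportedMechs:
+            return REP_OK;                   /* r->u.supportedMechs is ready */
+        case choice_DigestRepInner_asn1_ellipsis:
+            /* An alternative this code was not compiled to know about; its
+             * encoding is kept, undecoded, in r->u.asn1_ellipsis. */
+            return REP_UNKNOWN;
+        }
+        return REP_UNKNOWN;
+    }
+```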
+
+It would be nice if we could treat *all* typed holes like `CHOICE`s whenever
+the compiler knows the alternatives!
+
+And that's exactly what the ASN.1 IOS makes possible.  With ASN.1 IOS support,
+our compiler could automatically decode all the `Certificate` extensions, and
+all the distinguished name attributes, that it knows about.
+
+There is a fair bit of code in `lib/hx509/` that deals with encoding and
+decoding things in typed holes where the compiler could just handle that
+automatically for us, allowing us to delete a lot of code.
+
+Even more importantly, if we ever add support for textual encoding rules of
+ASN.1, such as JSON Encoding Rules (JER) [X.697] or Generic String Encoding
+Rules (GSER) [RFC3641], we could have a utility program to automatically
+display the DER (and other encodings) of certificates and many other
+interesting data structures, or to produce such encodings from text.
+
+
+ASN.1 IOS
+=========
+
+The ASN.1 IOS is additional syntax that allows ASN.1 module authors to express
+all the details about typed holes that ASN.1 compilers need to make developers'
+lives much easier.
+
+RFC5912 has lots of examples, such as this `CLASS` corresponding to the
+`Extension` type from PKIX:
+
+```
+  EXTENSION ::= CLASS {
+      &id  OBJECT IDENTIFIER UNIQUE,
+      &ExtnType,
+      &Critical    BOOLEAN DEFAULT {TRUE | FALSE }
+  } WITH SYNTAX {
+      SYNTAX &ExtnType IDENTIFIED BY &id
+      [CRITICALITY &Critical]
+  }
+
+  Extensions{EXTENSION:ExtensionSet} ::=
+      SEQUENCE SIZE (1..MAX) OF Extension{{ExtensionSet}}
+
+  Extension{EXTENSION:ExtensionSet} ::= SEQUENCE {
+      extnID      EXTENSION.&id({ExtensionSet}),
+      critical    BOOLEAN
+  --                     (EXTENSION.&Critical({ExtensionSet}{@extnID}))
+                       DEFAULT FALSE,
+      extnValue   OCTET STRING (CONTAINING
+                  EXTENSION.&ExtnType({ExtensionSet}{@extnID}))
+                  --  contains the DER encoding of the ASN.1 value
+                  --  corresponding to the extension type identified
+                  --  by extnID
+  }
+```
+
+and these uses of it in RFC 5912's modules for the PKIX base (RFC 5280):
+
+```
+   ext-AuthorityKeyIdentifier EXTENSION ::= { SYNTAX
+       AuthorityKeyIdentifier IDENTIFIED BY
+       id-ce-authorityKeyIdentifier }
+   id-ce-authorityKeyIdentifier OBJECT IDENTIFIER ::=  { id-ce 35 }
+   ...
+
+   CertExtensions EXTENSION ::= {
+           ext-AuthorityKeyIdentifier | ext-SubjectKeyIdentifier |
+           ext-KeyUsage | ext-PrivateKeyUsagePeriod |
+           ext-CertificatePolicies | ext-PolicyMappings |
+           ext-SubjectAltName | ext-IssuerAltName |
+           ext-SubjectDirectoryAttributes |
+           ext-BasicConstraints | ext-NameConstraints |
+           ext-PolicyConstraints | ext-ExtKeyUsage |
+           ext-CRLDistributionPoints | ext-InhibitAnyPolicy |
+           ext-FreshestCRL | ext-AuthorityInfoAccess |
+           ext-SubjectInfoAccessSyntax, ... }
+   ...
+
+   Certificate  ::=  SIGNED{TBSCertificate}
+
+   TBSCertificate  ::=  SEQUENCE  {
+       version         [0]  Version DEFAULT v1,
+       serialNumber         CertificateSerialNumber,
+       signature            AlgorithmIdentifier{SIGNATURE-ALGORITHM,
+                                 {SignatureAlgorithms}},
+       issuer               Name,
+       validity             Validity,
+       subject              Name,
+       subjectPublicKeyInfo SubjectPublicKeyInfo,
+       ... ,
+       [[2:               -- If present, version MUST be v2
+       issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
+       subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL
+       ]],
+       [[3:               -- If present, version MUST be v3 --
+       extensions      [3]  Extensions{{CertExtensions}} OPTIONAL
+       ]], ... }
+```
+
+Notice that the `extensions` field of `TBSCertificate` is of type `Extensions`
+parametrized by the `CertExtensions` IOS object set.
+
+This allows the compiler to know that if any of the OIDs listed in the
+`CertExtensions` object set appear as the actual value of the `extnID` member
+of an `Extension` value, then the `extnValue` member of the same `Extension`
+value must be an instance of the type associated with that OID.  For example,
+an `Extension` with `extnID == id-ce-authorityKeyIdentifier` must have an
+`extnValue` of type `AuthorityKeyIdentifier`.
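+
+Conceptually, each information object of class `EXTENSION` is a row holding its
+`&id` and `&ExtnType` fields, an object set like `CertExtensions` is a table of
+those rows, and the `{ExtensionSet}{@extnID}` constraint is a lookup keyed by
+the `extnID` value.  A minimal sketch, assuming a compiler that emits such a
+table; the names are hypothetical, and a type name stands in for whatever
+decoder or template pointer a real implementation would store.
+
+```c
+    #include <assert.h>
+    #include <stddef.h>
+    #include <string.h>
+
+    /* Simplified stand-in for heim_oid. */
+    typedef struct { size_t length; const unsigned *components; } sketch_oid;
+
+    /* One information object of class EXTENSION. */
+    struct extension_object {
+        sketch_oid id;          /* &id */
+        const char *extn_type;  /* &ExtnType (a type name, for illustration) */
+    };
+
+    static const unsigned id_ce_authorityKeyIdentifier[] = { 2, 5, 29, 35 };
+    static const unsigned id_ce_subjectKeyIdentifier[]   = { 2, 5, 29, 14 };
+    static const unsigned id_ce_basicConstraints[]       = { 2, 5, 29, 19 };
+
+    /* A truncated CertExtensions object set, as a table. */
+    static const struct extension_object CertExtensions[] = {
+        { { 4, id_ce_authorityKeyIdentifier }, "AuthorityKeyIdentifier" },
+        { { 4, id_ce_subjectKeyIdentifier },   "SubjectKeyIdentifier" },
+        { { 4, id_ce_basicConstraints },       "BasicConstraints" },
+    };
+
+    /*
+     * Resolve EXTENSION.&ExtnType({ExtensionSet}{@extnID}).  NULL means the
+     * OID is not in the set; CertExtensions is extensible ("..."), so that
+     * is not an error, and extnValue simply stays opaque.
+     */
+    static const struct extension_object *
+    lookup_extension(const struct extension_object *set, size_t n,
+                     const sketch_oid *extnID)
+    {
+        for (size_t i = 0; i < n; i++) {
+            if (set[i].id.length == extnID->length &&
+                memcmp(set[i].id.components, extnID->components,
+                       extnID->length * sizeof(unsigned)) == 0)
+                return &set[i];
+        }
+        return NULL;
+    }
+```
+
+With a table like this, the generated decoder could look up `extnID` itself and
+decode `extnValue` as the matching type, instead of every caller doing so.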
+
+
+Implementation Thoughts
+=======================
+
+ - The ASN.1 IOS is fairly large and non-trivial.  Perhaps we can just bake in
+   a few useful IOS classes without adding support for defining arbitrary
+   classes.
+
+   For dealing with PKIX, the bare minimum of IOS classes we should want are:
+
+    - ATTRIBUTE (used for DN attributes in PKIX base)
+    - EXTENSION (used for certificate extensions in PKIX base)
+
+   Then we can implement support for just declarations of information objects
+   and information object sets in `lib/asn1/asn1parse.y`, which is probably
+   not a very big deal.
+
+   Internally we can have a function for creating a class.
+
+ - We'll really want to do this mainly in the template compiler and begin
+   abandoning the original compiler: hacking on two compilers is difficult,
+   and the template compiler is superior if only because its emitted code size
+   scales as `O(N)` instead of `O(M * N)`, where `M` is the number of encoding
+   rules supported and `N` is the number of types in an ASN.1 module (or all
+   modules).
+
+ - Also, to make the transition to using IOS in-tree, we'll want to add fields
+   to the C structures generated by the compiler today, so that code that
+   hasn't been updated to use the automatic encoding/decoding can still work.
+
+   Thus `Extension` should compile to:
+
+   ```
+       typedef struct Extension {
+         heim_oid extnID;
+         int *critical;
+         heim_octet_string extnValue;
+         enum Extension_iosnum {
+           Extension_iosnumunknown = 0, /* when the extnID is unrecognized */
+           Extension_iosnum_ext_AuthorityKeyIdentifier = 1,
+           Extension_iosnum_ext_SubjectKeyIdentifier = 2,
+           /* ... */
+         } _ios_element;
+         union {
+           heim_octet_string *_value;
+           AuthorityKeyIdentifier authorityKeyIdentifier;
+           SubjectKeyIdentifier subjectKeyIdentifier;
+           /* ... */
+         } _ios_u;
+       } Extension;
+   ```
+
+   If a caller to `encode_Certificate()` passes a certificate object with
+   extensions with `_ios_element == Extension_iosnumunknown`, then the encoder
+   should use the `extnID` and `extnValue` fields, otherwise it should use the
+   `_ios_element` and `_ios_u` fields.  (In both cases, the `critical` field
+   should be used.)
+
+ - We'll need to reduce the number of bits used to encode tag values in the
+   templates.  Currently we use 20 bits, but that's far too many.  We can
+   almost certainly get away with using only 10 bits for tags.  This will allow
+   us to have more opcodes, which we'll need more of in order to handle typed
+   holes described by IOS classes and information object sets.
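+
+The encoder-side fallback for `Extension` can be sketched as follows, assuming
+the proposed struct layout.  When `_ios_element` is `Extension_iosnumunknown`,
+the caller's `extnValue` bytes are emitted unchanged; otherwise the contents of
+`extnValue` are re-encoded from `_ios_u`.  Only a `SubjectKeyIdentifier` arm is
+modeled, because its type is an `OCTET STRING` and its DER is trivial, and
+`encode_extn_value()` is a hypothetical name, not a generated function.
+
+```c
+    #include <assert.h>
+    #include <stddef.h>
+    #include <string.h>
+
+    typedef struct { size_t length; const unsigned char *data; } sketch_octets;
+    typedef sketch_octets SubjectKeyIdentifier; /* KeyIdentifier ::= OCTET STRING */
+
+    /* A trimmed version of the proposed struct; extnID and critical omitted. */
+    typedef struct Extension {
+        sketch_octets extnValue;
+        enum Extension_iosnum {
+            Extension_iosnumunknown = 0,        /* extnID is unrecognized */
+            Extension_iosnum_ext_SubjectKeyIdentifier = 2
+        } _ios_element;
+        union {
+            SubjectKeyIdentifier subjectKeyIdentifier;
+        } _ios_u;
+    } Extension;
+
+    /*
+     * Produce the contents of the extnValue OCTET STRING.  Returns 0 on
+     * success, -1 if buf is too small or the case isn't handled here.
+     */
+    static int encode_extn_value(const Extension *e, unsigned char *buf,
+                                 size_t buflen, size_t *outlen)
+    {
+        switch (e->_ios_element) {
+        case Extension_iosnumunknown:
+            /* Legacy path: emit the caller's pre-encoded bytes unchanged. */
+            if (e->extnValue.length > buflen)
+                return -1;
+            if (e->extnValue.length > 0)
+                memcpy(buf, e->extnValue.data, e->extnValue.length);
+            *outlen = e->extnValue.length;
+            return 0;
+        case Extension_iosnum_ext_SubjectKeyIdentifier: {
+            /* IOS path: DER-encode the decoded value as an OCTET STRING
+             * (short-form length only in this sketch). */
+            const sketch_octets *ki = &e->_ios_u.subjectKeyIdentifier;
+            if (ki->length > 127 || ki->length + 2 > buflen)
+                return -1;
+            buf[0] = 0x04;
+            buf[1] = (unsigned char)ki->length;
+            if (ki->length > 0)
+                memcpy(buf + 2, ki->data, ki->length);
+            *outlen = ki->length + 2;
+            return 0;
+        }
+        }
+        return -1;
+    }
+```
+
+Keeping the `extnValue` path is what lets code that predates IOS support keep
+working unchanged.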