asn1: Add a GitHub Markdown manual (moar)

This commit is contained in:
Nicolas Williams
2022-02-14 00:05:07 -06:00
parent 0929561de3
commit dda9aa2535

View File

@@ -73,7 +73,7 @@ As well, ASN.1 has [high-quality, freely-available specifications](https://www.i
## ASN.1 Example
For example, this is a `Certificate` as used in TLS and other protocols, taken
from [RFC5280]:
from [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280):
```ASN.1
Certificate ::= SEQUENCE {
@@ -96,7 +96,9 @@ from [RFC5280]:
}
```
and a more modern version from {RFC5912], using newer features of ASN.1:
and the same `Certificate` taken from a more modern version -from
[RFC5912](https://datatracker.ietf.org/doc/html/rfc5912)- using newer features
of ASN.1:
```ASN.1
Certificate ::= SIGNED{TBSCertificate}
@@ -130,16 +132,19 @@ identifiers" for the issuer and subject entities, and "extensions".
To understand more we'd have to look at the types of those fields of
`TBSCertificate`, but for now we won't do that. The point here is to show that
ASN.1 allows us to describe "types" of data in a way that resembles
"structures", "records", and "classes" in various programming languages.
"structures", "records", or "classes" in various programming languages.
To be sure, there are some "noisy" artifacts in the definition of
`TBSCertificate` which mostly have to do with the original encoding rules for
ASN.1. The original encoding rules for ASN.1 were tag-length-value (TLV)
binary encodings, meaning that for every type, the encoding of a value of that
type consisted of a tag, a length of the value's encoding, and the value's
encoding. Over time other encoding rules were added that do not require tags,
such as the octet encoding rules (OER), but also JSON encoding rules (JER), XML
encoding rules (XER), and others.
type consisted of a _tag_, a _length_ of the value's encoding, and the _actual
value's encoding_. Over time other encoding rules were added that do not
require tags, such as the octet encoding rules (OER), but also JSON encoding
rules (JER), XML encoding rules (XER), and others. There is almost no need for
tagging directives like `[1] IMPLICIT` when using OER. But in existing
protocols like PKIX and Kerberos that date back to the days when DER was king,
tagging directives are unfortunately commonplace.
## ASN.1 Crash Course
@@ -246,17 +251,20 @@ In modern ASN.1 it is possible to specify that a module uses `AUTOMATIC`
tagging so that one need never specify tags explicitly in order to fix
ambiguities.
Also, tehre are two types of tags: `IMPLICIT` and `EXPLICIT`. Implicit tags
replace the tags that the taggged type would have otherwise. Explicit tags
Also, there are two types of tags: `IMPLICIT` and `EXPLICIT`. Implicit tags
replace the tags that the tagged type would have otherwise. Explicit tags
treat the encoding of a type's value (including its tag and length) as the
value of the tagged type, thus yielding a tag-length-tag-length-value encoding.
Thus explicit tagging is unnecessarily redundant and wasteful. But implicit
tagging loses metadata that is useful for tools that can decode TLV encodings
without reference to the schema (module) corresponding to the types of values
encoded.
value of the tagged type, thus yielding a tag-length-tag-length-value encoding
-- a TLTLV encoding!
TLV encodings were probably never needed, but they exist, and becuase they are
widely used, cannot be removed.
Thus explicit tagging is more redundant and wasteful than implicit tagging.
But implicit tagging loses metadata that is useful for tools that can decode
TLV encodings without reference to the schema (module) corresponding to the
types of values encoded.
TLV encodings were probably never justified except by lack of tooling and
belief that codecs for TLV ERs can be hand-coded. But TLV RTs exist, and
because they are widely used, cannot be removed.
## Other Encoding Rules
@@ -320,7 +328,7 @@ alternative tags for disambiguation.
- The Generic String Encoding Rules are specified by IETF RFCs
[RFC3641](https://datatracker.ietf.org/doc/html/rfc3641),
[RFC3641](https://datatracker.ietf.org/doc/html/rfc3642),
[RFC3642](https://datatracker.ietf.org/doc/html/rfc3642),
[RFC4792](https://datatracker.ietf.org/doc/html/rfc4792).
Additional ERs can be added.
@@ -335,21 +343,65 @@ varying arrays), though with some extensions it could.
## Commentary
The text in this section is opinion.
The text in this section is the personal opinion of the author(s).
- ASN.1 gets a bad rap because BER/DER/CER are terrible encoding rules, as are
all TLV encoding rules.
- ASN.1 also gets a bad rap because its full syntax is not context-free, and
so parsing it requires context.
The BER family of encoding rules is a disaster, yes, but ASN.1 itself is
not. On the contrary, ASN.1 is quite rich in features and semantics -as
rich as any competitor- while also being very easy to write and understand
_as a syntax_.
The Heimdal ASN.1 compiler uses LALR(1) `yacc`/`bison`/`byacc`
- ASN.1 also gets a bad rap because its full syntax is not context-free, and
so parsing it can be tricky.
And yet the Heimdal ASN.1 compiler manages, using LALR(1) `yacc`/`bison`/`byacc`
parser-generators. For the subset of ASN.1 that this compiler handles,
there are no ambiguities. However, we understand that eventually we will
need to resort to C type punning and run-time typing to disambiguate parses.
need run into ambiguities.
For example, `ValueSet` and `ObjectSet` are ambiguous. X.680 says:
```
ValueSet ::= "{" ElementSetSpecs "}"
```
while X.681 says:
```
ObjectSet ::= "{" ObjectSetSpec "}"
```
and the set members can be just the symbolic names of members, in which case
there's no grammatical difference between those two productions. These then
cause a conflict in the `FieldSetting` production, which is used in the
`ObjectDefn` production, which is used in defining an object (which is to be
referenced from some `ObjectSet` or `FieldSetting`).
This particular conflict can be resolved by one of:
- limiting the power of object sets by disallowing recursion (object sets
containing objects that have field settings that are object sets ...),
- or by introducing additional required and disambiguating syntactic
elements that preclude full compliance with ASN.1,
- or by simply using the same production and type internally to handle
both, the `ValueSet` and `ObjectSet` productions and then internally
resolving the actual type as late as possible by either inspecting the
types of the set members or by inspecting the expected kind of field that
the `ValueSet`-or-`ObjectSet` is setting.
Clearly, only the last of these is satisfying, but it is more work for the
compiler developer.
- TLV encodings are bad because they yield unnecessary redundance in
encodings.
encodings. This is space-inefficient, but also a source of bugs in
hand-coded codecs for TLV encodings.
EXPLICIT tagging makes this worse by making the encoding a TLTLV encoding
(tag length tag length value). (The inner TLV is the V for the outer TL.)
- TLV encodings are often described as "self-describing" because one can
usually write a `dumpasn1` style of tool that attempts to decode a TLV
@@ -376,9 +428,16 @@ The text in this section is opinion.
Flat Buffers, or most encodings, though it can be done with some encodings,
such as BER and NDR (NDR has "pipes" for this).
Some clues are needed in order to produce an codec that can handle such
on-line behavior. In IDL/NDR that clue comes from the "pipe" type. In
ASN.1 there is no such clue and it would have to be provided separately to
the ASN.1 compiler (e.g., as a command-line option).
- Protocol Buffers is a TLV encoding. There was no need to make it a TLV
encoding. Public opinion tends to prefer Flat Buffers now, which is not a
TLV encoding and which is comparable to XDR/NDR/PER/OER.
encoding.
Public opinion seems to prefer Flat Buffers now, which is not a TLV encoding
and which is more comparable to XDR/NDR/PER/OER.
# Heimdal ASN.1 Compiler
@@ -391,6 +450,13 @@ The compiler currently emits:
- C types corresponding to ASN.1 modules' types
- C functions for DER (and some BER) codecs for ASN.1 modules' types
We vaguely hope to eventually move to using the JSON representation of ASN.1
modules to do code generation in a programming language like `jq` rather than
in C. The idea there is to make it much easier to target other programming
languages than C, especially Rust, so that we can start moving Heimdal to Rust
(first after this would be `lib/hx509`, then `lib/krb5`, then `lib/hdb`, then
`lib/gssapi`, then `kdc/`).
The compiler has two "backends":
- C code generation
@@ -403,6 +469,11 @@ Supported encoding rules:
- DER
- BER decoding (but not encoding)
As well, the Heimdal ASN.1 compiler can render values as JSON using an ad-hoc
metaschema that is not quite JER-compliant. A sample rendering of a complex
PKIX `Certificate` with all typed holes automatically decoded is shown in
[README.md#features](README.md#features).
The Heimdal ASN.1 compiler supports open types via X.681/X.682/X.683 syntax.
Specifically: (when using the template backend) the generated codecs can
automatically and recursively decode and encode through "typed holes".
@@ -419,13 +490,13 @@ language (e.g., English) meant only for humans to understand. Documenting open
types with formal syntax allows compilers to support them specially.
See the the [`asn1_compile(1)` manual page](#Manual-Page-for-asn1_compile)
below, and the `README` files, for more details on limitations. Excerpt from
the manual page:
below and [README.md#features](README.md#features), for more details on
limitations. Excerpt from the manual page:
```
The Information Object System support includes automatic codec support
for encoding and decoding through “open types” which are also known as
“typed holes”. See RFC 5912 for examples of how to use the ASN.1 Infor-
“typed holes”. See RFC5912 for examples of how to use the ASN.1 Infor-
mation Object System via X.681/X.682/X.683 annotations. See the com-
piler's README files for more information on ASN.1 Information Object
System support.
@@ -476,7 +547,7 @@ Caveats and ASN.1 x.680 features not supported:
also not supported.
```
### Easy-to-Use C Types
## Easy-to-Use C Types
The Heimdal ASN.1 compiler generates easy-to-use C types for ASN.1 types.
@@ -573,7 +644,7 @@ code-generators do, of course, so it's not surprising. But you can see that
- in C we use `typedef`s to make the type names usable without having to add
`struct`
### Generated APIs For Any Given Type T
## Generated APIs For Any Given Type T
The C functions generated for ASN.1 types are all of the same form, for any
type `T`:
@@ -611,7 +682,8 @@ memory resources will be released. Note that the C object _itself_ is not
freed, only its _content_.
The `print_T()` functions encode the value of a C object of type `T` in JSON
(though not in JER-compliant JSON).
(though not in JER-compliant JSON). A sample printing of a complex PKIX
`Certificate` can be seen in [README.md#features](README.md#features).
These functions are all recursive.
@@ -620,7 +692,7 @@ These functions are all recursive.
> `LIBASN1.LIB` to avoid possibly freeing memory allocated by a different
> allocator.
### Error Handling
## Error Handling
All codec functions that return errors return them as `int`.
@@ -671,7 +743,7 @@ You can use the `com_err` library to display these errors as strings:
}
```
### Using the Generated APIs
## Using the Generated APIs
Value construction is as usual in C. Use the standard C allocator for
allocating values of `OPTIONAL` fields.
@@ -770,6 +842,11 @@ or, the same code w/o the `ASN1_MALLOC_ENCODE()` macro:
free(bytes);
```
## Open Types
The handling of X.681/X.682/X.683 syntax for open types is described at length
in [README-X681.md](README-X681.md).
## Command-line Usage
The compiler takes an ASN.1 module file name and outputs a C header and C
@@ -882,7 +959,7 @@ ASN.1 usage conventions:
See the [manual page for `asn1_compile(1)`](#Manual-Page-for-asn1_compile) for
a full listing of command-line options.
## Manual Page for `asn1_compile(1)`
### Manual Page for `asn1_compile(1)`
```
ASN1_COMPILE(1) BSD General Commands Manual ASN1_COMPILE(1)
@@ -1097,3 +1174,30 @@ NOTES
HEIMDAL February 22, 2021 HEIMDAL
```
# Future Directions
The Heimdal ASN.1 compiler is focused on PKIX and Kerberos, and is almost
feature-complete for dealing with those. It could use additional support for
X.681/X.682/X.683 elements that would allow the compiler to implement
`Certificate ::= SIGNED{TBSCertificate}`, particularly the ability to
automatically validate cryptographic algorithm parameters. However, this is
not that important.
Another feature that might be nice is the ability of callers to specify smaller
information object sets when decoding values of types like `Certificate`,
mainly to avoid decoding types in typed holes that are not of interest to the
application.
For testing, a JSON reader to go with the JSON printer might be nice, and
anyways, would make for a generally useful tool.
Another feature that would be nice would to automatically generate SQL and LDAP
code for HDB based on `lib/hdb/hdb.asn1` (with certain usage conventions and/or
compiler command-line options to make it possible to map schemas usefully).
For the `hxtool` command, it would be nice if the user could input arbitrary
certificate extensions and `subjectAlternativeName` (SAN) values in JSON + an
ASN.1 module and type reference that `hxtool` could then parse and encode using
the ASN.1 compiler and library. Currently the `hx509` library and its `hxtool`
command must be taught about every SAN type.