asn1: Add a GitHub Markdown manual (moar)
@@ -73,7 +73,7 @@ As well, ASN.1 has [high-quality, freely-available specifications](https://www.i
## ASN.1 Example

For example, this is a `Certificate` as used in TLS and other protocols, taken
from [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280):

```ASN.1
Certificate ::= SEQUENCE {
@@ -96,7 +96,9 @@ from [RFC5280]:
}
```

and the same `Certificate` taken from a more modern version (from
[RFC5912](https://datatracker.ietf.org/doc/html/rfc5912)) that uses newer
features of ASN.1:

```ASN.1
Certificate ::= SIGNED{TBSCertificate}
```
@@ -130,16 +132,19 @@ identifiers" for the issuer and subject entities, and "extensions".
To understand more we'd have to look at the types of those fields of
`TBSCertificate`, but for now we won't do that. The point here is to show that
ASN.1 allows us to describe "types" of data in a way that resembles
"structures", "records", or "classes" in various programming languages.

To be sure, there are some "noisy" artifacts in the definition of
`TBSCertificate` which mostly have to do with the original encoding rules for
ASN.1. The original encoding rules for ASN.1 were tag-length-value (TLV)
binary encodings, meaning that for every type, the encoding of a value of that
type consisted of a _tag_, a _length_ of the value's encoding, and the _actual
value's encoding_. Over time, other encoding rules were added that do not
require tags, such as the octet encoding rules (OER), as well as the JSON
encoding rules (JER), the XML encoding rules (XER), and others. There is almost
no need for tagging directives like `[1] IMPLICIT` when using OER. But in
existing protocols like PKIX and Kerberos, which date back to the days when DER
was king, tagging directives are unfortunately commonplace.

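The following minimal C sketch makes the TLV structure concrete. The function names are illustrative, not Heimdal APIs. It encodes a non-negative `INTEGER` in DER as a tag, a length, and the value's content octets. For comparison, it also encodes a value of the constrained type `INTEGER (0..65535)` in OER. For the value 5, DER produces `02 01 05` while OER produces just `00 05`: the constraint fixes the size, so OER sends neither a tag nor a length. Negative integers and long-form lengths are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DER: T (0x02 = UNIVERSAL 2, INTEGER), L (short form), then V as the
 * minimal big-endian two's-complement octets. Non-negative values only.
 * out must have room for 7 octets (1 tag + 1 length + up to 5 value). */
static size_t der_encode_uint32(uint32_t v, unsigned char *out)
{
    unsigned char content[5];
    size_t n = 0;

    assert(out != NULL);
    do {                               /* collect octets, least significant first */
        content[n++] = (unsigned char)(v & 0xff);
        v >>= 8;
    } while (v != 0);
    if (content[n - 1] & 0x80)         /* keep the value positive: prepend 0x00 */
        content[n++] = 0x00;

    out[0] = 0x02;                     /* T */
    out[1] = (unsigned char)n;         /* L: n <= 5, so the short form suffices */
    for (size_t i = 0; i < n; i++)     /* V: most significant octet first */
        out[2 + i] = content[n - 1 - i];
    return 2 + n;
}

/* OER: a value of INTEGER (0..65535) is always two big-endian octets;
 * the constraint fixes the size, so no tag and no length are emitted. */
static size_t oer_encode_u16(uint16_t v, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xff);
    return 2;
}
```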
## ASN.1 Crash Course

@@ -246,17 +251,20 @@ In modern ASN.1 it is possible to specify that a module uses `AUTOMATIC`
tagging so that one need never specify tags explicitly in order to fix
ambiguities.

Also, there are two types of tags: `IMPLICIT` and `EXPLICIT`. Implicit tags
replace the tag that the tagged type would otherwise have. Explicit tags
treat the encoding of a type's value (including its tag and length) as the
value of the tagged type, thus yielding a tag-length-tag-length-value
(TLTLV) encoding.

Thus explicit tagging is more redundant and wasteful than implicit tagging.
But implicit tagging loses metadata that is useful for tools that can decode
TLV encodings without reference to the schema (module) corresponding to the
types of values encoded.

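The following C sketch encodes the `INTEGER` value 5 both as `[1] IMPLICIT INTEGER` and as `[1] EXPLICIT INTEGER`. It follows the X.690 identifier-octet layout: class in the top two bits, `0x20` for constructed, tag number in the low five bits. The function names are hypothetical, and only tag numbers below 31 and short-form lengths are handled. The implicit form is `81 01 05`. The explicit form is `A1 03 02 01 05`, which is two octets longer, but a schema-less dumper can still see that the inner value is an `INTEGER`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_CONTEXT     0x80  /* class bits 8..7 = 10: context-specific */
#define TAG_CONSTRUCTED 0x20  /* bit 6: the value is itself made of TLVs */
#define TAG_INTEGER     0x02  /* UNIVERSAL 2, primitive */

/* [n] IMPLICIT INTEGER: the context tag replaces the INTEGER tag, and the
 * encoding stays primitive. v holds the INTEGER's content octets.
 * Returns the number of octets written. */
static size_t encode_implicit_int(unsigned n, const unsigned char *v,
                                  size_t vlen, unsigned char *out)
{
    assert(n < 31 && vlen < 128);      /* low tag numbers, short lengths only */
    out[0] = (unsigned char)(TAG_CONTEXT | n);
    out[1] = (unsigned char)vlen;
    memcpy(out + 2, v, vlen);
    return 2 + vlen;
}

/* [n] EXPLICIT INTEGER: the complete INTEGER TLV becomes the value of a
 * constructed context tag, giving T L T L V. */
static size_t encode_explicit_int(unsigned n, const unsigned char *v,
                                  size_t vlen, unsigned char *out)
{
    assert(n < 31 && vlen + 2 < 128);
    out[0] = (unsigned char)(TAG_CONTEXT | TAG_CONSTRUCTED | n); /* outer T */
    out[1] = (unsigned char)(vlen + 2);                          /* outer L */
    out[2] = TAG_INTEGER;                                        /* inner T */
    out[3] = (unsigned char)vlen;                                /* inner L */
    memcpy(out + 4, v, vlen);                                    /* V */
    return 4 + vlen;
}
```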
TLV encodings were probably never justified except by a lack of tooling and a
belief that codecs for TLV ERs can be hand-coded. But TLV ERs exist, and
because they are widely used, they cannot be removed.

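Part of what makes hand-coded TLV codecs error-prone is that every length read from the input must be checked against the octets actually available. The minimal C sketch below parses one BER/DER identifier-and-length header with those checks. `struct tlv_hdr` and `parse_tlv_hdr()` are hypothetical names. Indefinite (BER) lengths are rejected, and DER's minimal-encoding rules are not enforced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tlv_hdr {
    unsigned cls;      /* 0 universal, 1 application, 2 context, 3 private */
    int constructed;   /* nonzero if the value is itself a series of TLVs */
    uint32_t tag;      /* tag number */
    size_t len;        /* length of the value, in octets */
    size_t hdr_len;    /* octets taken by the identifier and length */
};

/* Parse the T and L of one TLV at buf[0..buflen). Returns 0 on success, or
 * -1 if the input is truncated, uses an indefinite length, or claims a
 * value longer than what remains in the buffer. */
static int parse_tlv_hdr(const unsigned char *buf, size_t buflen,
                         struct tlv_hdr *h)
{
    size_t i = 1;

    assert(buf != NULL && h != NULL);
    if (buflen < 2)
        return -1;
    h->cls = buf[0] >> 6;
    h->constructed = (buf[0] & 0x20) != 0;
    h->tag = buf[0] & 0x1f;
    if (h->tag == 0x1f) {
        /* High-tag-number form: base-128 digits, high bit = "more follow". */
        h->tag = 0;
        do {
            if (i >= buflen || h->tag > (UINT32_MAX >> 7))
                return -1;
            h->tag = (h->tag << 7) | (buf[i] & 0x7f);
        } while (buf[i++] & 0x80);
    }
    if (i >= buflen)
        return -1;
    if (buf[i] < 0x80) {
        h->len = buf[i++];                /* short form */
    } else {
        size_t n = buf[i++] & 0x7f;      /* long form: n length octets */
        if (n == 0 || n > sizeof(size_t))
            return -1;                    /* n == 0 is BER's indefinite length */
        h->len = 0;
        while (n-- > 0) {
            if (i >= buflen)
                return -1;
            h->len = (h->len << 8) | buf[i++];
        }
    }
    /* The classic hand-coding bug is to skip this check and trust len. */
    if (h->len > buflen - i)
        return -1;
    h->hdr_len = i;
    return 0;
}
```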
## Other Encoding Rules

@@ -320,7 +328,7 @@ alternative tags for disambiguation.

- The Generic String Encoding Rules are specified by IETF RFCs
[RFC3641](https://datatracker.ietf.org/doc/html/rfc3641),
[RFC3642](https://datatracker.ietf.org/doc/html/rfc3642), and
[RFC4792](https://datatracker.ietf.org/doc/html/rfc4792).

Additional ERs can be added.
@@ -335,21 +343,65 @@ varying arrays), though with some extensions it could.

## Commentary

The text in this section is the personal opinion of the author(s).

- ASN.1 gets a bad rap because BER/DER/CER are terrible encoding rules, as are
all TLV encoding rules.

The BER family of encoding rules is a disaster, yes, but ASN.1 itself is
not. On the contrary, ASN.1 is quite rich in features and semantics (as
rich as any competitor) while also being very easy to write and understand
_as a syntax_.

- ASN.1 also gets a bad rap because its full syntax is not context-free, and
so parsing it can be tricky.

And yet the Heimdal ASN.1 compiler manages, using LALR(1) `yacc`/`bison`/`byacc`
parser-generators. For the subset of ASN.1 that this compiler handles,
there are no ambiguities. However, we understand that eventually we will
run into ambiguities.

For example, `ValueSet` and `ObjectSet` are ambiguous. X.680 says:

```
ValueSet ::= "{" ElementSetSpecs "}"
```

while X.681 says:

```
ObjectSet ::= "{" ObjectSetSpec "}"
```

and the set members can be just the symbolic names of members, in which case
there's no grammatical difference between those two productions. These then
cause a conflict in the `FieldSetting` production, which is used in the
`ObjectDefn` production, which is used in defining an object (which is to be
referenced from some `ObjectSet` or `FieldSetting`).

This particular conflict can be resolved by one of:

- limiting the power of object sets by disallowing recursion (object sets
containing objects that have field settings that are object sets ...),

- or by introducing additional required and disambiguating syntactic
elements that preclude full compliance with ASN.1,

- or by simply using the same production and type internally to handle
both the `ValueSet` and `ObjectSet` productions, and then internally
resolving the actual type as late as possible by inspecting either the
types of the set members or the expected kind of field that the
`ValueSet`-or-`ObjectSet` is setting.

Clearly, only the last of these is satisfying, but it is more work for the
compiler developer.

- TLV encodings are bad because they yield unnecessary redundancy in
encodings. This is space-inefficient, and it is also a source of bugs in
hand-coded codecs for TLV encodings.

EXPLICIT tagging makes this worse by making the encoding a TLTLV encoding
(tag length tag length value). (The inner TLV is the V for the outer TL.)

- TLV encodings are often described as "self-describing" because one can
usually write a `dumpasn1` style of tool that attempts to decode a TLV
@@ -376,9 +428,16 @@ The text in this section is opinion.
Flat Buffers, or most encodings, though it can be done with some encodings,
such as BER and NDR (NDR has "pipes" for this).

Some clues are needed in order to produce a codec that can handle such
on-line behavior. In IDL/NDR that clue comes from the "pipe" type. In
ASN.1 there is no such clue and it would have to be provided separately to
the ASN.1 compiler (e.g., as a command-line option).

- Protocol Buffers is a TLV encoding. There was no need to make it a TLV
encoding.

Public opinion seems to prefer Flat Buffers now, which is not a TLV encoding
and which is more comparable to XDR/NDR/PER/OER.

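The `ValueSet`/`ObjectSet` conflict discussed above has a last resolution option: use one production, and decide the kind late. The following C sketch shows roughly what that might look like. Everything in it (`struct set_node`, `resolve_set()`, and the three enums) is hypothetical and not taken from the Heimdal compiler's sources. It decides the kind from the member references when they are known, or else from the kind of field being set, and reports a conflict when the two disagree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds used by a compiler's semantic pass. */
enum set_kind   { SET_UNRESOLVED, SET_VALUES, SET_OBJECTS };
enum sym_kind   { SYM_UNKNOWN, SYM_VALUE, SYM_OBJECT };      /* member references */
enum field_kind { FIELD_UNKNOWN, FIELD_VALUE_SET, FIELD_OBJECT_SET };

/* One node for `{ a | b | ... }`: the parser builds it without deciding
 * whether it is a ValueSet or an ObjectSet. */
struct set_node {
    enum set_kind kind;
    size_t nmembers;
    const enum sym_kind *members;  /* kinds of the referenced members */
};

/* Decide the kind as late as possible. Returns 0 when resolved, 1 when
 * there is not enough information yet (retry later), -1 on a conflict. */
static int resolve_set(struct set_node *s, enum field_kind expected)
{
    enum set_kind by_members = SET_UNRESOLVED;
    enum set_kind by_field = expected == FIELD_VALUE_SET  ? SET_VALUES
                           : expected == FIELD_OBJECT_SET ? SET_OBJECTS
                           : SET_UNRESOLVED;

    assert(s != NULL);
    for (size_t i = 0; i < s->nmembers; i++) {
        enum set_kind k = s->members[i] == SYM_VALUE  ? SET_VALUES
                        : s->members[i] == SYM_OBJECT ? SET_OBJECTS
                        : SET_UNRESOLVED;
        if (k == SET_UNRESOLVED)
            continue;                  /* not defined yet: no evidence */
        if (by_members != SET_UNRESOLVED && by_members != k)
            return -1;                 /* values and objects mixed */
        by_members = k;
    }
    if (by_members != SET_UNRESOLVED && by_field != SET_UNRESOLVED &&
        by_members != by_field)
        return -1;                     /* members contradict the field */
    s->kind = by_members != SET_UNRESOLVED ? by_members : by_field;
    return s->kind == SET_UNRESOLVED ? 1 : 0;
}
```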
# Heimdal ASN.1 Compiler

@@ -391,6 +450,13 @@ The compiler currently emits:
- C types corresponding to ASN.1 modules' types
- C functions for DER (and some BER) codecs for ASN.1 modules' types

We vaguely hope to eventually move to using the JSON representation of ASN.1
modules to do code generation in a programming language like `jq` rather than
in C. The idea there is to make it much easier to target programming languages
other than C, especially Rust, so that we can start moving Heimdal to Rust
(first after this would be `lib/hx509`, then `lib/krb5`, then `lib/hdb`, then
`lib/gssapi`, then `kdc/`).

The compiler has two "backends":

- C code generation
@@ -403,6 +469,11 @@ Supported encoding rules:
- DER
- BER decoding (but not encoding)

As well, the Heimdal ASN.1 compiler can render values as JSON using an ad-hoc
metaschema that is not quite JER-compliant. A sample rendering of a complex
PKIX `Certificate` with all typed holes automatically decoded is shown in
[README.md#features](README.md#features).

The Heimdal ASN.1 compiler supports open types via X.681/X.682/X.683 syntax.
Specifically, when using the template backend, the generated codecs can
automatically and recursively decode and encode through "typed holes".
@@ -419,13 +490,13 @@ language (e.g., English) meant only for humans to understand. Documenting open
types with formal syntax allows compilers to support them specially.

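At its core, decoding through a typed hole is a table lookup. The value of an identifying field (typically an OID) selects, from an information object set, the type with which to decode the open type's octets. The C sketch below shows only that idea and is not how the template backend represents object sets. `struct hole_entry`, `decode_typed_hole()`, `decode_boolean()`, and the OID `1.2.3.4` are hypothetical, and real code would compare encoded OIDs rather than dotted strings. Unknown identifiers are reported rather than treated as errors, so the caller can keep those octets undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for decoding the octets found in a typed hole. */
typedef int (*hole_decoder)(const unsigned char *der, size_t len, void *out);

/* One object of a (hypothetical) information object set: the identifying
 * OID, written as a dotted string for readability, and the type's decoder. */
struct hole_entry {
    const char *oid;
    hole_decoder decode;
};

/* Example decoder for a DER BOOLEAN (01 01 00 or 01 01 FF). */
static int decode_boolean(const unsigned char *der, size_t len, void *out)
{
    if (len != 3 || der[0] != 0x01 || der[1] != 0x01 ||
        (der[2] != 0x00 && der[2] != 0xff))
        return -1;
    *(int *)out = der[2] != 0;
    return 0;
}

/* Use the identifying field to pick the open type's actual type. Returns 1
 * if decoded, 0 if the OID is not in the set (the caller keeps the octets
 * undecoded), or a negative error from the selected decoder. */
static int decode_typed_hole(const struct hole_entry *set, size_t nset,
                             const char *oid, const unsigned char *der,
                             size_t len, void *out)
{
    assert(oid != NULL && (set != NULL || nset == 0));
    for (size_t i = 0; i < nset; i++) {
        if (strcmp(set[i].oid, oid) == 0) {
            int ret = set[i].decode(der, len, out);
            return ret < 0 ? ret : 1;
        }
    }
    return 0;
}
```

Because the object set is just a table here, passing a shorter table is one way a caller could skip holes it has no interest in.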
See the [`asn1_compile(1)` manual page](#Manual-Page-for-asn1_compile)
below and [README.md#features](README.md#features) for more details on
limitations. Excerpt from the manual page:

```
The Information Object System support includes automatic codec support
for encoding and decoding through “open types” which are also known as
“typed holes”. See RFC5912 for examples of how to use the ASN.1 Infor-
mation Object System via X.681/X.682/X.683 annotations. See the com-
piler's README files for more information on ASN.1 Information Object
System support.
@@ -476,7 +547,7 @@ Caveats and ASN.1 x.680 features not supported:
also not supported.
```

## Easy-to-Use C Types

The Heimdal ASN.1 compiler generates easy-to-use C types for ASN.1 types.

@@ -573,7 +644,7 @@ code-generators do, of course, so it's not surprising. But you can see that
- in C we use `typedef`s to make the type names usable without having to add
`struct`

## Generated APIs For Any Given Type T

The C functions generated for ASN.1 types are all of the same form, for any
type `T`:
@@ -611,7 +682,8 @@ memory resources will be released. Note that the C object _itself_ is not
freed, only its _content_.

The `print_T()` functions encode the value of a C object of type `T` in JSON
(though not in JER-compliant JSON). A sample printing of a complex PKIX
`Certificate` can be seen in [README.md#features](README.md#features).

These functions are all recursive.

@@ -620,7 +692,7 @@ These functions are all recursive.
> `LIBASN1.LIB` to avoid possibly freeing memory allocated by a different
> allocator.

## Error Handling

All codec functions that return errors return them as `int`.

@@ -671,7 +743,7 @@ You can use the `com_err` library to display these errors as strings:
```
}
```

## Using the Generated APIs

Value construction is as usual in C. Use the standard C allocator for
allocating values of `OPTIONAL` fields.
@@ -770,6 +842,11 @@ or, the same code w/o the `ASN1_MALLOC_ENCODE()` macro:
```
free(bytes);
```

## Open Types

The handling of X.681/X.682/X.683 syntax for open types is described at length
in [README-X681.md](README-X681.md).

## Command-line Usage

The compiler takes an ASN.1 module file name and outputs a C header and C
@@ -882,7 +959,7 @@ ASN.1 usage conventions:
See the [manual page for `asn1_compile(1)`](#Manual-Page-for-asn1_compile) for
a full listing of command-line options.

### Manual Page for `asn1_compile(1)`

```
ASN1_COMPILE(1) BSD General Commands Manual ASN1_COMPILE(1)
@@ -1097,3 +1174,30 @@ NOTES

HEIMDAL February 22, 2021 HEIMDAL
```

# Future Directions

The Heimdal ASN.1 compiler is focused on PKIX and Kerberos, and is almost
feature-complete for dealing with those. It could use additional support for
X.681/X.682/X.683 elements that would allow the compiler to implement
`Certificate ::= SIGNED{TBSCertificate}`, particularly the ability to
automatically validate cryptographic algorithm parameters. However, this is
not that important.

Another feature that might be nice is the ability of callers to specify smaller
information object sets when decoding values of types like `Certificate`,
mainly to avoid decoding types in typed holes that are not of interest to the
application.

For testing, a JSON reader to go with the JSON printer might be nice, and it
would make for a generally useful tool anyway.

Another feature that would be nice would be to automatically generate SQL and
LDAP code for HDB based on `lib/hdb/hdb.asn1` (with certain usage conventions
and/or compiler command-line options to make it possible to map schemas
usefully).

For the `hxtool` command, it would be nice if the user could input arbitrary
certificate extensions and `subjectAlternativeName` (SAN) values in JSON, along
with an ASN.1 module and type reference, which `hxtool` could then parse and
encode using the ASN.1 compiler and library. Currently the `hx509` library and
its `hxtool` command must be taught about every SAN type.