SerentityData: Variable-byte, Statistically-based Entity Serialization & Field Encoding

The SerentityData Entity Serialization and Field Encoding library fulfills Blockchain Requirement 1 for the Universal UBL (UUBL) extensions to the UBL 2.2 business document schema specification:

Requirement 1. Compact and Efficient Binary Serialization

The requirement for extremely compact and efficient binary serialization of each entity (and subentity) can be fulfilled by various software serialization solutions including SerentityData Variable-byte, Statistically-based Entity Serialization. The SerentityData project can be found on Github. SerentityData is the author’s preferred solution.

Design and Implementation Strategy

Design Principles

  1. Statistically-based Encodings (STE): For a particular datatype (e.g. Signed Int 16 or Unsigned Int 64), some data values representable by this datatype will be used more frequently than others (e.g. 0 (zero), 1, small integer values, etc.) and there should be a way to encode these statistically more frequent values using as few bytes as possible (compared to larger data values).
  2. Variable-byte Encodings (VBE): Over the lifespan of a persisted field (e.g. an Unsigned Int 64 blockchain block or transaction serial number), the initial values of the field will have small values (0, 1, 2, 3, …) and over the course of a long time (years or decades) grow to have very large values. As few bytes as possible should be used to represent small values and this should be less than the number of bytes required to represent very large values.
  3. Application-adaptive Encodings (AAE): Every application or application data domain will require:
    1. A different subset of the available datatypes, and
    2. Each data type will have a different statistical distribution of application-dependent data values.
  4. Single-byte Encodings (SBE): Very common (application-dependent) data values, on a datatype-by-datatype basis, should only require 1 (one) byte of storage. For example, for specific applications, the following values would be candidates for single-byte encodings:
    1. Data value TRUE of datatype Boolean and data value False of datatype Boolean
    2. Specific data values of small whole numbers (e.g. 0, 1, 2, 3, 4, 5, … n) where, depending on the application and datatype, n might be 10, 32 (days of the month), 60 (seconds in a minute), etc. The values do not have to be contiguous.
    3. Specific data values of negative numbers (e.g. -1, -2, -3, -4, -5, … m) where, depending on the application and datatype, m might be -10, -20, etc. The values do not have to be contiguous.
    4. Specific data values of Enum datatypes. The values are usually contiguous but there is no requirement for them to be contiguous.
  5. Support All Datatypes (SAD): It should be possible to encode all possible datatypes and all possible data values for the selected data types.  The current list of supported data types includes:
    1. Signed Integer 16
    2. Signed Integer 32
    3. Signed Integer 64
    4. Unsigned Integer 16
    5. Unsigned Integer 32
    6. Unsigned Integer 64
    7. Byte
    8. Signed Byte
    9. Enum
    10. Byte Array
    11. Boolean
    12. Boolean
    13. Char
    14. String
    15. ASCIIString
    16. Address
    17. Guid
  6. Standalone, Compact, and Efficient (SCE) Implementation
    1. Standalone: The entity serialization and field encoding libraries will not rely on any external source code or callable binary libraries.
    2. Compact: The entity serialization and field encoding libraries will be compact enough to be useful and desirable to execute:
      1. Off-chain in a traditional computing environment including server, PC, tablet, and mobile device, as well as
      2. On-chain in a smart contract (virtual machine) execution environment
    3. Efficient: Highly efficient code execution to be useful and desirable to use in a fee-based, smart contract (virtual machine) execution environment
  7. Storage Compatibility (SC): Compatible with any data storage technology capable of storing variable-length arrays of bytes representing the field encoded and entity serialized data.
  8. Runtime Compatibility (RC): The current design and implementation of the entity serialization and field encoding libraries is compatible with .NET Core 2.0 or later.
  9. Versioning Support (VS): Support for versioning at the entity serialization as well as entity encoding levels; including support for multiple applications, serializations and encodings in the same library.
  10. Entity Extensibility and Backward Compatibility (EEBC): Entity declarations can be extended through the additional fields without losing backward compatibility with previously serialized entities.
  11. Support for ByteArray and String Null-Valued References (NULLS): Support for null-valued references to ByteArrays and Strings – in addition to zero-length and non-zero length ByteArrays and Strings.

Assumptions

  1. There exists a code generation tool that takes as input a description of an entity, its fields and their datatypes that will create the sequence of calls into the entity serialization and field encoding that performs serialization/deserialization and field coding/decoding for a target entity declaration (e.g. a C# class declaration).

Implementation Notes

  1. The reference implementation of the entity serialization and field encoding libraries is called SerentityData and is implemented using .NET Core 2.0, C#, and Visual Studio 2007 Community Edition.

Roadmap

Features that are unsupported in the current release:

  1. Collections and Mappings: next on the queue
  2. Nested Entity Declarations: An entity declaration that has a field whose datatype is another entity declaration.

Field Encoding Strategy

TODO

Byte 0 Mapping

In the Variable Byte Encoding scheme used to represent the value of a datatype, the first byte of an encoded field value (Byte 0) is the most important. For Single-byte Encodings, Byte 0 needs to encode both the datatype of a particular value but also the value itself.

Given there are only 256 unique values that Byte 0 (or any single byte) can assume, certain trade-offs must be made to accommodate the range of application-driven datatypes required and the number of possible single-byte, double-byte, and triple-byte optimizations that are possible. This is where statistical knowledge of the range of values that a particular datatype required by an application is important.

Byte 0 Default Mapping

Figure 1 below is an example of the current default set of Byte 0 mappings.

SerentityData1.pngFigure 1. Byte 0 Default Mapping Table

Data value 0 (zero) is currently not assigned.

Single-Byte Encodings

TODO

Two-Byte Encodings

TODO

Three-Byte Encodings

TODO

Longer Byte Encodings

TODO

Performance

TODO

Sample Use Case

An example of a distributed business application designed to be used with SerentityData is the following SerentityDapp.Perfmon for on-chain [blockchain] performance monitoring and recording.

NCP-001 SerentityDapp.Perfmon v0.7Figure 2. SerentityDapp.Perfmon Data Model – Onchain [blockchain] Performance Monitoring and Recording Example

Appendix A – SerentityData Field Encoding Details

Supported Data Types

02-datatypes

Special Use Cases

21-specialcases

Field Buffer Configurations

A. Buffered Values

  • 3 Fields = FieldEncodingType + Buffer Length + Buffer Bytes
Use Case 0. 16-bit Buffered Value

 

33-usecase0

Use Case 1. 32-bit Buffered Value

 

48-usecase1

 

Use Case 2. 64-bit Buffered Value

 

53-usecase2

B. Headered Values

  • 2 Fields = FieldEncodingType + Value (stored in the 16-bit, 32-bit, or 64-bit Buffer Length field)
Use Case 3. 16-bit Headered Value

 

59-usecase3

Use Case 4. 32-bit Headered Value

 

73-usecase4

Use Case 5. 64-bit Headered Value

 

78-usecase5

C. Value Constants:

  • 1 Field = FieldEncodingType (representing a specific constant value for a particular datatype stored represented an FieldEncodingType op code)
Use Case 6. 8-bit Value Constants

85-usecase6

D. Null ByteArrays

 

Use Case 7. Null-valued reference to a ByteArray

 

92-usecase7

E. Short Length ByteArrays

  • 0, 1, 2, 3 bytes in length
Use Case 8. 0-byte ByteArray

 

100-usecase8

Use Case 9. 1-byte ByteArray
105-usecase9
Use Case 10. 2-byte Byte Array
110-usecase10
Use Case 11. 3-byte Byte Array
115-usecase11

 

Best regards,

Michael Herman (Toronto/Calgary/Seattle)

1 Comment

Filed under Uncategorized

One response to “SerentityData: Variable-byte, Statistically-based Entity Serialization & Field Encoding

  1. Pingback: Refactoring UBL 2.2 business documents for enacting business processes on the blockchain [WIP] | hyperonomy.com - digital intelligence

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s