Internet-Draft                                                M. Stecher
Expires: April, 2003                                       webwasher.com
                                                           October, 2002

                    LateClearance Content Encoding
                  draft-stecher-lclr-encoding-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire in April, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

   This document introduces a new content encoding that can be used in
   HTTP/1.1. Its purpose is to solve the download progress indication
   problem that some proxy gateway filters like virus scanners have.

Stecher                   Expires April, 2003                   [Page 1]

Internet-Draft       LateClearance Content Encoding         October 2002

Table of Contents

   1      Introduction
   2      Terminology
   3      Basic Concept
   4      The LateClearance Content Encoding
   4.1    Indicating the Usage of LateClearance Encoding
   4.2    Content-Length or Chunked Transfer Encoding
   4.3    Usage and Implementation
   4.4    Structure of the LateClearance Encoding
   4.4.1  The Header Atom
   4.4.2  The Payload Atom
   4.4.3  The Clearance Atom
   4.4.4  The Error Atom
   4.4.5  The Progress Atom
   4.4.6  The Block Padding Atom
   4.4.7  The Byte Padding Atom
   5      Examples
   5.1    The Header Atom
   5.2.1  A Payload Atom With One Block of Data
   5.2.2  A Payload Atom With Two Blocks of Data
   5.2.3  A Payload Atom With 466 Blocks of Data
   5.3    A Clearance Atom
   5.4.1  An Error Atom Without Header and Body
   5.4.2  An Error Message With Header and Body
   5.5    Progress Atom
   5.6    Block Padding
   5.7    Byte Padding
   5.8    A Complete File
   6      Security Considerations
   7      References
          Authors' Addresses
          Full Copyright Statement

1 Introduction

   Today, antivirus tools are the most requested type of filter at the
   gateway proxy server level. Latency becomes an important factor when
   filtering algorithms are applied to HTTP data going from an Internet
   server to the requesting browser.
   Traditional virus scanners require that the complete file is seen
   before they can start scanning the HTTP data; however, this is the
   worst possible scenario for keeping the latency short.

   Problems arise because the proxy server has to download the complete
   file first, which can take a long time if the data comes from the
   origin Internet server over a slow connection. A proxy server is
   unable to forward any chunks of data until the filter decides
   whether the data is clean or infected. The time span increases even
   more when lengthy operations, such as archive extraction, have to be
   performed. The complete download and operation time can easily be
   greater than the time-out period of the requesting browser. Even if
   the browser keeps trying, the user is left without any indication as
   to the progress or status of the download.

   Currently only workarounds for this problem are known, such as
   letting some file types bypass the virus scanner, or forwarding a
   small percentage of the received data to the requesting client in
   order to provide at least some feedback and to keep the connection
   alive. However, the latter workaround has a number of disadvantages:
   the virus pattern may have already been forwarded in the first few
   percent of the file, the estimated download time calculated by the
   client is incorrect, and if a virus were to be found, no error
   message could be displayed because the download was already started.

   This document introduces a new kind of content encoding, called
   LateClearance content encoding, which aims to:

   1. Allow the downloading of large files by an HTTP client through a
      proxy server with antivirus or a similar function, while showing
      the user the (useful) progress of the download;

   2. Prevent an infected file from being executed on the HTTP client,
      should a virus be detected after the file has been completely
      received by the proxy server;

   3. Be able to display a virus alert message in this situation.
   Since browsers often can begin displaying some types of files (such
   as HTML files) while part of the file is still in transit,
   LateClearance content encoding is not an optimal solution for these
   file types. Instead, it is intended for the large number of file
   types that need to be downloaded completely before the HTTP client
   can even begin to use them.

2 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

3 Basic Concept

   The proposed content encoding achieves the goals listed above by
   allowing the gateway to forward all data immediately, so that the
   HTTP client can make use of its normal download progress indication,
   while transferring the content in an encrypted format so that it
   cannot be used without a decryption key.

   Only at the very end of the file, when all of the data has been
   collected at the gateway and the virus scanner has finished its
   work, will the decryption key be transferred to the HTTP client,
   which can then decrypt the data.

   If a virus is found, the key is not sent to the HTTP client.
   Instead, the client receives an error code and an optional message
   which can be displayed to the user.

   The content encoding can also include additional information about
   the progress of post-processing of the data at the gateway (e.g.
   extracting large archives at the gateway for virus scanning can take
   a long time).

   HTTP/1.1 allows the introduction of new content encodings that are
   not specified in RFC 2616; see section 3.5 of [1].

4 The LateClearance Content Encoding

   We are defining a new content encoding called "LateClearance", which
   is added to the content by the proxy server according to section
   14.11 of [1].
   A typical scenario is as follows: The original content is received
   by the proxy server. If the proxy server needs to receive the
   complete file before it can decide to either block or allow the
   data, it can check whether or not the HTTP client accepts
   LateClearance content encoding. If yes, it can apply the encoding to
   the content on-the-fly and simply forward the encrypted data at the
   same speed as it receives the original data.

   If the proxy server determines at the end that the file is allowed,
   it sends the decryption key; otherwise it sends an error indication
   instead of the key.

   This means that LateClearance content encoding is usually not used
   by Internet HTTP servers, but by intermediary proxy servers that add
   this content encoding to the proxied data if necessary and allowed.

   LateClearance uses the AES (Advanced Encryption Standard) Rijndael
   as the cipher algorithm (see [3] and [4]).

4.1 Indicating the Usage of LateClearance Encoding

   An HTTP client supporting LateClearance encoding SHOULD indicate
   this by adding "LateClearance" to the value of the "Accept-Encoding"
   header (see section 14.3 of [1]).

   Examples:

      Accept-Encoding: LateClearance
      Accept-Encoding: compress, gzip, LateClearance

   A proxy server adding the LateClearance Encoding MUST indicate this
   by adding "LateClearance" to the "Content-Encoding" header (see
   section 14.11 of [1]).

   Examples:

      Content-Encoding: LateClearance
      Content-Encoding: gzip, LateClearance

4.2 Content-Length or Chunked Transfer Encoding

   When using LateClearance content encoding, the amount of data will
   always be greater than the original, unencoded data, due to meta
   data such as keys or error messages. The size required is usually
   unknown initially, since the proxy server does not know whether or
   not the file contains a virus (whether a key or an error message
   will be added).

   Therefore, a proxy MAY always use chunked transfer encoding (see
   3.6.1 of [1] for details).
   In this case the original content length has to be removed from the
   HTTP header, making the HTTP client no longer capable of calculating
   an estimated download time. To overcome this disadvantage,
   LateClearance content encoding can advertise the expected payload
   length in the header atom of the body (see 4.4.1).

   An HTTP client with full support for LateClearance content encoding
   honors the payload length field in the header atom, counts the
   transferred payload bytes and shows the download progress according
   to these values. A second progress bar shows the progress of the
   data post-processing by the gateway filter according to the values
   of the progress atom (4.4.5).

   On the other hand, a basic LateClearance implementation at the HTTP
   client will not be able to display this extended dialog, but has to
   use the standard features that only know about the HTTP header's
   Content-Length field and the overall transferred body data.

   If an HTTP client is not expected to support the payload length
   field of the header atom, implementors of LateClearance encoding are
   encouraged not to use chunked transfer encoding, but to keep an
   existing Content-Length header of the original data and to add the
   maximum needed overhead size to this content length.

   LateClearance content encoding contains methods for including
   padding so that the advertised content length can always be reached.
   See the padding atoms (4.4.6 and 4.4.7) for more information about
   how to maintain the Content-Length header.

4.3 Usage and Implementation

   The data received by the proxy server has to be put into 16-byte
   blocks. Any number of 16-byte blocks can then immediately be
   encrypted using the Rijndael algorithm ([3]). The final chunk of
   data MUST be padded with zero values if it is shorter than 16 bytes.
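
   The blocking and padding rule above can be sketched as follows. This
   is only an illustration: the helper name split_into_blocks and its
   "final" flag are hypothetical and not defined by this specification,
   and the Rijndael encryption step itself is left out.

```python
BLOCK_SIZE = 16  # block size in bytes used by this encoding


def split_into_blocks(data: bytes, final: bool):
    """Split received data into 16-byte blocks.

    Returns (blocks, leftover). Leftover bytes that do not fill a whole
    block are handed back so the caller can prepend them to the next
    chunk of received data. If `final` is true, the leftover is padded
    with zero bytes and returned as the last block instead.
    """
    usable = len(data) - (len(data) % BLOCK_SIZE)
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, usable, BLOCK_SIZE)]
    leftover = data[usable:]
    if final and leftover:
        # Zero padding of the final, short block (hypothetical helper logic)
        blocks.append(leftover.ljust(BLOCK_SIZE, b"\x00"))
        leftover = b""
    return blocks, leftover
```

   Returning the leftover bytes rather than padding them early keeps
   zero padding confined to the very last block of the file.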
   The encryption key MUST be a random value which is calculated in a
   way that a possible attacker cannot reproduce. A new key has to be
   used for each file transferred. The basic concept is such that,
   should a virus be detected and the key not sent at the end of a
   file, the encrypted data cannot be decrypted with reasonable effort.

   The Rijndael encryption algorithm MUST be used in CBC mode and
   initialized with an all-zero initialization vector. The key can be
   any length allowed by Rijndael (16, 24 or 32 bytes).

   Due to the use of CBC mode, the implementing proxy server has to
   remember the last encrypted block (the CBC chaining value) as well
   as any leftover data that did not make up a 16-byte block, after
   processing the received data and before more data is received.

4.4 Structure of the LateClearance Encoding

   LateClearance encoding consists of a list of atoms. There are seven
   different atom types defined by this specification. The first byte
   of each atom defines the atom type.

   The used data types UInt8, UInt16, UInt32 and UInt64 are the
   fixed-length unsigned integer variants with the corresponding bit
   length. All values of these data types MUST be written in network
   byte order.

4.4.1 The Header Atom

   This atom starts with a byte value of 0x01. It is followed by a
   UInt32 constant value of 0x4C436C72 which can be read as the ASCII
   string "LClr".

   The next two bytes are the version number: the first byte is the
   major version number and the second byte is the minor version
   number. The current version is 1.0, so the two bytes are 0x01 and
   0x00.

   The last field of this atom is a UInt64 value, which gives the
   overall payload length of this message if known upfront. If the
   length is not known, the length field must be set to zero. The
   payload length can be easily calculated from the original advertised
   content length by rounding up its value to the next multiple of 16.
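
   As an illustration of the field layout and the rounding rule just
   described, the header atom could be packed as in the following
   sketch. The function names are hypothetical; only the field values
   and their order are taken from this section.

```python
import struct
from typing import Optional

# ">" selects network byte order without padding:
# UInt8 atomID, UInt32 constID, UInt8 major, UInt8 minor, UInt64 length
HEADER_FORMAT = ">BIBBQ"


def payload_length(content_length: int) -> int:
    """Round the original content length up to the next multiple of 16."""
    return (content_length + 15) // 16 * 16


def build_header_atom(content_length: Optional[int]) -> bytes:
    """Pack a version 1.0 header atom; the length is zero if not known."""
    length = 0 if content_length is None else payload_length(content_length)
    return struct.pack(HEADER_FORMAT, 0x01, 0x4C436C72, 0x01, 0x00, length)
```

   For a 21-byte original, build_header_atom(21) advertises a payload
   length of 32 bytes.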
   The payload length indication MAY be used by a client for displaying
   a good progress dialog even if chunked encoding is being used (see
   section 4.2).

   This atom has a fixed length of 15 bytes.

   The header atom MUST always be the first atom of a file and there
   MUST be exactly one header atom, i.e. the first five bytes can be
   used to identify a LateClearance-encoded file.

   Syntax:

      struct HeaderAtom {
         UInt8  atomID = 0x01;
         UInt32 constID = 0x4C436C72;   // 'LClr'
         UInt8  majorVersion = 0x01;
         UInt8  minorVersion = 0x00;
         UInt64 payloadLength;          // MAY be zero, if not known
      };

4.4.2 The Payload Atom

   The payload atom starts with a byte value of 0x02 followed by a
   UInt16 value, which gives the number of 16-byte blocks that follow
   immediately. These byte blocks are a part of the Rijndael-encrypted
   original content.

   A LateClearance-encoded message has one or more payload atoms.

   Syntax:

      struct PayloadAtom {
         UInt8  atomID = 0x02;
         UInt16 byteBlockCount;
         UInt8[16][byteBlockCount];     // Rijndael encrypted data
      };

4.4.3 The Clearance Atom

   At the end of all payload atoms the gateway filter decides whether
   to block or allow the file. If the data is allowed, the clearance
   atom includes the key for decoding the data of the payload atoms.

   The clearance atom is marked by its first byte value of 0x03,
   directly followed by a UInt64 value that gives the original content
   length. This information is important due to the possible padding in
   the last 16-byte block of encoded data.

   The next UInt16 value is the length of the key that has been used to
   encode the data. After this, the key itself follows in this atom. A
   LateClearance version 1.0 implementation SHOULD use a 16-byte key
   length, but an HTTP client that includes a version 1.0 LateClearance
   decoder MUST also handle keys with length 24 and 32.

   A LateClearance message contains zero or one clearance atom which
   MUST be sent after all payload atoms.
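
   On the client side, handling of the clearance atom described above
   might look like the following sketch. The function names are
   hypothetical, and the decryption step itself is omitted; the sketch
   only shows how the fields are read and how the original content
   length removes the zero padding of the last block.

```python
import struct

CLEARANCE_PREFIX = ">BQH"  # UInt8 atomID, UInt64 contentLength, UInt16 keyLength
PREFIX_SIZE = struct.calcsize(CLEARANCE_PREFIX)  # 11 bytes


def parse_clearance_atom(buf: bytes):
    """Return (content_length, key) from a clearance atom at the start of buf."""
    atom_id, content_length, key_length = struct.unpack_from(CLEARANCE_PREFIX, buf, 0)
    if atom_id != 0x03:
        raise ValueError("not a clearance atom")
    if key_length not in (16, 24, 32):
        raise ValueError("key length not allowed by Rijndael")
    key = buf[PREFIX_SIZE:PREFIX_SIZE + key_length]
    if len(key) != key_length:
        raise ValueError("truncated clearance atom")
    return content_length, key


def strip_block_padding(decrypted: bytes, content_length: int) -> bytes:
    """Drop the zero bytes that filled up the last 16-byte block."""
    return decrypted[:content_length]
```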
   Syntax:

      struct ClearanceAtom {
         UInt8  atomID = 0x03;
         UInt64 contentLength;
         UInt16 keyLength;              // SHOULD be 16
         UInt8[keyLength] key;          // Key for decryption
      };

4.4.4 The Error Atom

   In case the file contains a virus or is to be blocked for other
   reasons, an error atom indicates that no key for decoding the
   payload will be sent. The encoded data MUST be deleted and SHOULD be
   replaced by an optional error message.

   The first byte that identifies an error atom has a value of 0x04. It
   is followed by a UInt16 value that gives the HTTP error code that
   SHOULD be used (typically 403).

   The next two UInt16 values contain the length of the optional HTTP
   header fields and the length of the optional error message body. If
   these values are greater than zero, the corresponding header and
   body bytes follow. If headers are given they MUST be separated by
   CR LF and an empty line MUST be added at the end.

   The optional headers MAY include the content type header and value
   of the error message.

   An implementing client MUST do its own error indications if no error
   message body is included in this atom and it MAY decide not to show
   an included body if implementation restrictions prevent this, e.g.
   because the originally indicated content type of the encoded message
   can no longer be changed.

   A LateClearance message contains zero or one error atom which MUST
   be sent after the last payload atom. A message MUST either contain a
   clearance or an error atom.

   Syntax:

      struct ErrorAtom {
         UInt8  atomID = 0x04;
         UInt16 httpErrorCode;
         UInt16 httpHeaderLen;
         UInt16 httpBodyLen;
         UInt8[httpHeaderLen] httpHeaders;
         UInt8[httpBodyLen] httpBody;
      };

4.4.5 The Progress Atom

   The progress indicated by this atom is only for the post-processing
   of the data at the gateway. Although it is an optional atom, it
   SHOULD be used if the time between the last payload atom and the
   clearance or error atom is not very short, e.g.
   because a large archive needs to be extracted first.

   The atom starts with a byte value of 0x05, and is followed by a
   UInt16 value which gives the progress. A value of zero means 0%,
   0x7FFF stands for 50% and 0xFFFF means 100%.

   Any number of progress atoms are allowed in the message. A client
   MAY skip these atoms; their support is optional.

   Syntax:

      struct ProgressAtom {
         UInt8  atomID = 0x05;
         UInt16 progress;               // 0 = 0%, 0xFFFF = 100%
      };

4.4.6 The Block Padding Atom

   If the original data is received at the gateway with a
   Content-Length header, an implementation SHOULD try to keep it to
   achieve a good and simple progress dialog at the HTTP client. The
   usage of the progress atom with chunked encoding is more elegant,
   but its support by the HTTP client is not required.

   If a gateway wants to use the Content-Length header, it has to
   increase its value because a LateClearance-encoded message is always
   bigger than the original message.

   Unfortunately, an exact message length usually cannot be calculated
   because it is not known whether a clearance atom or an error atom
   will be used at the end, how many payload atoms (each of which has
   its own overhead) will be sent, and whether and how many progress
   atoms the gateway wants to generate.

   By limiting the number of payload and progress atoms in advance, the
   gateway can calculate an upper limit for the content length. This
   means that at the end of the message there will usually be fewer
   bytes being transferred than indicated by the Content-Length header.
   In order to keep a persistent connection between HTTP client and
   gateway alive, the missing bytes MUST be transferred.

   This can be done by using the padding atoms.

   The block padding atom allows unused data of any length to be sent,
   as long as at least 3 bytes are needed (due to its own overhead).
   It starts with the byte value of 0x06, followed by a UInt16 value
   for the number of following padding bytes, which MUST be set to all
   zero values. A length value of zero is allowed; the size of this
   atom is the length value plus 3.

   A LateClearance-encoded message MAY include any number of block
   padding atoms, not only at the end but in any order with other
   atoms. A client MUST simply skip these atoms.

   Syntax:

      struct Padding1Atom {
         UInt8  atomID = 0x06;
         UInt16 paddingLen;
         UInt8[paddingLen] paddingBytes; // MUST be all zero values
      };

4.4.7 The Byte Padding Atom

   In case fewer than 3 bytes of padding are needed, this atom can be
   used. It consists only of the byte value 0x07 (it is a single-byte
   atom).

   A LateClearance-encoded message MAY include any number of byte
   padding atoms, not only at the end but in any order with other
   atoms. A client MUST simply skip these atoms.

   Syntax:

      struct Padding2Atom {
         UInt8  atomID = 0x07;
      };

5 Examples

   The examples are given in hex dump and ASCII form. A non-printable
   ASCII character will be written as a dot.

5.1 The Header Atom

   The advertised content length is 5876 bytes in this example, which
   is rounded up to a payload length of 5888 bytes (0x1700).

   01 4C 43 6C 72 01 00 00 00 00   .LClr.....
   00 00 00 17 00                  .....

5.2.1 A Payload Atom With One Block of Data

   02 00 01 41 42 43 44 45 46 47   ...ABCDEFG
   48 49 4A 4B 4C 4D 4E 4F 50      HIJKLMNOP

5.2.2 A Payload Atom With Two Blocks of Data

   02 00 02 41 42 43 44 45 46 47   ...ABCDEFG
   48 49 4A 4B 4C 4D 4E 4F 50 61   HIJKLMNOPa
   62 63 64 65 66 67 68 69 6A 6B   bcdefghijk
   6C 6D 6E 6F 70                  lmnop

5.2.3 A Payload Atom With 466 Blocks of Data

   02 01 D2 [7456 bytes following] ..........

5.3 A Clearance Atom

   Original content length is 7514 bytes, key length is 16 bytes.

   03 00 00 00 00 00 00 1D 5A 00   ........Z.
   10 54 68 69 73 49 73 41 53 74   .ThisIsASt
   75 70 69 64 4B 65 79            upidKey

5.4.1 An Error Atom Without Header and Body

   Error code is 403.

   04 01 93 00 00 00 00            .......
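
   The error atom layout from section 4.4.4 could be read back as in
   the following sketch. The function name is hypothetical; the field
   order and sizes follow the ErrorAtom syntax.

```python
import struct

ERROR_PREFIX = ">BHHH"  # UInt8 atomID, UInt16 code, UInt16 headerLen, UInt16 bodyLen
PREFIX_SIZE = struct.calcsize(ERROR_PREFIX)  # 7 bytes


def parse_error_atom(buf: bytes):
    """Return (http_error_code, headers, body) from an error atom."""
    atom_id, code, header_len, body_len = struct.unpack_from(ERROR_PREFIX, buf, 0)
    if atom_id != 0x04:
        raise ValueError("not an error atom")
    headers_end = PREFIX_SIZE + header_len
    headers = buf[PREFIX_SIZE:headers_end]
    body = buf[headers_end:headers_end + body_len]
    if len(headers) != header_len or len(body) != body_len:
        raise ValueError("truncated error atom")
    return code, headers, body
```

   Applied to the seven bytes of example 5.4.1, this yields the error
   code 403 with empty headers and an empty body.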
5.4.2 An Error Message With Header and Body

   This message includes a short HTML error message. The header length
   is 47 bytes (0x2F) and the body length is 24 bytes (0x18).

   04 01 93 00 2F 00 18 43 6F 6E   ..../..Con
   74 65 6E 74 2D 54 79 70 65 3A   tent-Type:
   20 74 65 78 74 2F 68 74 6D 6C    text/html
   0D 0A 43 6F 6E 74 65 6E 74 2D   ..Content-
   4C 65 6E 67 74 68 3A 20 32 34   Length: 24
   0D 0A 0D 0A 3C 68 74 6D 6C 3E   ....<html>
   56 69 72 75 73 20 66 6F 75 6E   Virus foun
   64 3C 2F 68 74 6D 6C 3E         d</html>

5.5 Progress Atom

   42% of the post-processing has been completed.

   05 6B 84                        .k.

5.6 Block Padding

   11 zero bytes will be transferred, i.e. the atom length is 14 bytes.

   06 00 0B 00 00 00 00 00 00 00   ..........
   00 00 00 00                     ....

5.7 Byte Padding

   Three atoms in a row, ignore all.

   07 07 07                        ...

5.8 A Complete File

   This example encodes the plain text file "This is a sample text"
   which is of course virus-free and therefore comes with a clearance
   atom. The key used for encryption is "ABCDEFGHIJKLMNOP" (which is a
   bad key in terms of security). The content length is 21, which
   requires 2 blocks of 16 encoded bytes and results in an advertised
   payload length of 32. At the end the message has some padding to
   reach the content length of 90 bytes that was promised in the
   Content-Length header in this example.

   01 4C 43 6C 72 01 00 00 00 00   .LClr.....   Header atom
   00 00 00 00 20 02 00 02 71 99   .... ...q.   Payload atom start
   9A C1 DB 63 C3 0A 1C C0 53 42   ...c....SB   two blocks of
   10 D8 B5 23 EA A2 D2 EB 22 A3   ...#....".   encrypted data
   49 E5 37 3D 99 5E 4C C3 E0 76   I.7=.^L..v   till end of this line
   03 00 00 00 00 00 00 00 15 00   ..........   Clearance, 21 bytes
   10 41 42 43 44 45 46 47 48 49   .ABCDEFGHI   content length, 16 bytes
   4A 4B 4C 4D 4E 4F 50 06 00 0A   JKLMNOP...   key. Padding atom with
   00 00 00 00 00 00 00 00 00 00   ..........
                                                10 zeros at the end

6 Security Considerations

   LateClearance encoding has been designed to prevent accidental
   download of infected files that a gateway virus scanner is able to
   detect. If an attacker already has malicious code on a machine and
   wants it to continue downloading more infected content, there are
   many other ways to attempt this.

   Breaking a LateClearance-encrypted message that includes an error
   atom but not the clearance atom (so that the key is not available
   and has to be recovered to decrypt the content) would not be an easy
   task. Using the AES Rijndael [3] and preparing for different key
   lengths ensures up-to-date security with a cipher algorithm.

   The weak part of this encoding approach is the random encryption
   key. If the key has been calculated with a poorly designed random
   generator, an attacker could know the key even if it is not
   transferred. If an attacker downloads a lot of clean files through a
   gateway, the attacker receives a list of keys which can help find
   out the random algorithm if it is poorly designed. The design of
   this key generator is beyond the scope of this document.

7 References

   [1]  Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
        Request for Comments 2616, June 1999.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", Request for Comments 2119, Harvard University, March
        1997.

   [3]  Daemen, J. and Rijmen, V., "The Design of Rijndael",
        Springer-Verlag, ISBN 3-540-42580-2, 2001.

   [4]  The AES home page "http://csrc.nist.gov/encryption/aes/"

Authors' Addresses

   Martin Stecher
   martin.stecher@webwasher.com

   Soeren Mueller
   soeren.mueller@webwasher.com

   webwasher.com AG
   Vattmannstr. 3
   33100 Paderborn
   Germany

Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.