
  • 001 Priority
    000 Routine

    The use of the Delay, Throughput, and Reliability indications may increase the cost (in some sense) of the service. In many networks better performance for one of these parameters is coupled with worse performance on another. Except for very unusual cases, at most two of these three indications should be set.

    The type of service is used to specify the treatment of the datagram during its transmission through the internet system. Example mappings of the internet type of service to the actual service provided on networks such as AUTODIN II, ARPANET, SATNET, and PRNET are given in "Service Mappings" [8].

    Internet Protocol Specification, September 1981

    The Network Control precedence designation is intended to be used within a network only. The actual use and control of that designation is up to each network. The Internetwork Control designation is intended for use by gateway control originators only. If the actual use of these precedence designations is of concern to a particular network, it is the responsibility of that network to control the access to, and use of, those precedence designations.

    Total Length: 16 bits

      Total Length is the length of the datagram, measured in octets, including internet header and data. This field allows the length of a datagram to be up to 65,535 octets. Such long datagrams are impractical for most hosts and networks. All hosts must be prepared to accept datagrams of up to 576 octets (whether they arrive whole or in fragments). It is recommended that hosts only send datagrams larger than 576 octets if they have assurance that the destination is prepared to accept the larger datagrams.

      The number 576 is selected to allow a reasonable sized data block to be transmitted in addition to the required header information. For example, this size allows a data block of 512 octets plus 64 header octets to fit in a datagram. The maximal internet header is 60 octets, and a typical internet header is 20 octets, allowing a margin for headers of higher level protocols.

    Identification: 16 bits

      An identifying value assigned by the sender to aid in assembling the fragments of a datagram.

    Flags: 3 bits

      Various Control Flags.

      Bit 0: reserved, must be zero
      Bit 1: (DF) 0 = May Fragment, 1 = Don't Fragment
      Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments

          0   1   2
        +---+---+---+
        |   | D | M |
        | 0 | F | F |
        +---+---+---+

    Fragment Offset: 13 bits

      This field indicates where in the datagram this fragment belongs. The fragment offset is measured in units of 8 octets (64 bits). The first fragment has offset zero.

    Time to Live: 8 bits

      This field indicates the maximum time the datagram is allowed to remain in the internet system. If this field contains the value zero, then the datagram must be destroyed. This field is modified in internet header processing. The time is measured in units of seconds, but since every module that processes a datagram must decrease the TTL by at least one even if it processes the datagram in less than a second, the TTL must be thought of only as an upper bound on the time a datagram may exist. The intention is to cause undeliverable datagrams to be discarded, and to bound the maximum datagram lifetime.

    Protocol: 8 bits

      This field indicates the next level protocol used in the data portion of the internet datagram. The values for various protocols are specified in "Assigned Numbers" [9].

    Header Checksum: 16 bits

      A checksum on the header only. Since some header fields change (e.g., time to live), this is recomputed and verified at each point that the internet header is processed.

      The checksum algorithm is:

        The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header. For purposes of computing the checksum, the value of the checksum field is zero.

      This is a simple to compute checksum, and experimental evidence indicates it is adequate, but it is provisional and may be replaced by a CRC procedure, depending on further experience.

    Source Address: 32 bits

      The source address. See section 3.2.

    Destination Address: 32 bits

      The destination address. See section 3.2.

    Options: variable

      The options may appear or not in datagrams. They must be implemented by all IP modules (host and gateways). What is optional is their transmission in any particular datagram, not their implementation. In some environments the security option may be required in all datagrams.

      The option field is variable in length. There may be zero or more options. There are two cases for the format of an option:

        Case 1: A single octet of option-type.

        Case 2: An option-type octet, an option-length octet, and the actual option-data octets. The option-length octet counts the option-type octet and the option-length octet as well as the option-data octets.

      The option-type octet is viewed as having 3 fields:

        1 bit   copied flag
        2 bits  option class
        5 bits  option number

      The copied flag indicates that this option is copied into all fragments on fragmentation:

        0 = not copied
        1 = copied

      The option classes are:

        0 = control
        1 = reserved for future use
        2 = debugging and measurement
        3 = reserved for future use

      The following internet options are defined:

        CLASS  NUMBER  LENGTH  DESCRIPTION
          0      0       -     End of Option list.  This option occupies only 1 octet; it has no length octet.
          0      1       -     No Operation.  This option occupies only 1 octet; it has no length octet.
          0      2      11     Security.  Used to carry Security, Compartmentation, User Group (TCC), and Handling Restriction Codes compatible with DOD requirements.
          0      3     var.    Loose Source Routing.  Used to route the internet datagram based on information supplied by the source.
          0      9     var.    Strict Source Routing.  Used to route the internet datagram based on information supplied by the source.
          0      7     var.    Record Route.  Used to trace the route an internet datagram takes.
          0      8       4     Stream ID.  Used to carry the stream identifier.
          2      4     var.    Internet Timestamp.

      Specific Option Definitions

      End of Option List

        +--------+
        |00000000|
        +--------+
          Type=0

        This option indicates the end of the option list. This might not coincide with the end of the internet header according to the internet header length. This is used at the end of all options, not the end of each option, and need only be used if the end of the options would not otherwise coincide with the end of the internet header.

        May be copied, introduced, or deleted on fragmentation, or for any other reason.

      No Operation

        +--------+
        |00000001|
        +--------+
          Type=1

        This option may be used between options, for example, to align the beginning of a subsequent option on a 32 bit boundary.

        May be copied, introduced, or deleted on fragmentation, or for any other reason.

      Security

        This option provides a way for hosts to send security, compartmentation, handling restrictions, and TCC (closed user group) parameters. The format for this option is as follows:

          +--------+--------+---//---+---//---+---//---+---//---+
          |10000010|00001011|SSS  SSS|CCC  CCC|HHH  HHH|  TCC   |
          +--------+--------+---//---+---//---+---//---+---//---+
           Type=130 Length=11

        Security (S field): 16 bits

          Specifies one of 16 levels of security (eight of which are reserved for future use).

            00000000 00000000 - Unclassified
            11110001 00110101 - Confidential
            01111000 10011010 - EFTO
            10111100 01001101 - MMMM
            01011110 00100110 - PROG
            10101111 00010011 - Restricted
            11010111 10001000 - Secret
            01101011 11000101 - Top Secret
            00110101 11100010 - (Reserved for future use)
            10011010 11110001 - (Reserved for future use)
            01001101 01111000 - (Reserved for future use)
            00100100 10111101 - (Reserved for future use)
            00010011 01011110 - (Reserved for future use)
            10001001 10101111 - (Reserved for future use)
            11000100 11010110 - (Reserved for future use)
            11100010 01101011 - (Reserved for future use)

        Compartments (C field): 16 bits

          An all zero value is used when the information transmitted is not compartmented. Other values for the compartments field may be obtained from the Defense Intelligence Agency.

        Handling Restrictions (H field): 16 bits

          The values for the control and release markings are alphanumeric digraphs, and are defined in the Defense Intelligence Agency Manual DIAM 65-19, "Standard Security Markings".

        Transmission Control Code (TCC field): 24 bits

          Provides a means to segregate traffic and define controlled communities of interest among subscribers. The TCC values are trigraphs, and are available from HQ DCA Code 530.

        Must be copied on fragmentation. This option appears at most once in a datagram.

      Loose Source and Record Route

        +--------+--------+--------+---------//--------+
        |10000011| length | pointer|     route data    |
        +--------+--------+--------+---------//--------+
          Type=131

        The loose source and record route (LSRR) option provides a means for the source of an internet datagram to supply routing information to be used by the gateways in forwarding the datagram to the destination, and to record the route information.

        The option begins with the option type code. The second octet is the option length, which includes the option type code and the length octet, the pointer octet, and length-3 octets of route data. The third octet is the pointer into the route data indicating the octet which begins the next source address to be processed. The pointer is relative to this option, and the smallest legal value for the pointer is 4.

        A route data is composed of a series of internet addresses. Each internet address is 32 bits or 4 octets. If the pointer is greater than the length, the source route is empty (and the recorded route full) and the routing is to be based on the destination address field.

        If the address in the destination address field has been reached and the pointer is not greater than the length, the next address in the source route replaces the address in the destination address field, the recorded route address replaces the source address just used, and the pointer is increased by four. The recorded route address is the internet module's own internet address as known in the environment into which this datagram is being forwarded.

        This procedure of replacing the source route with the recorded route (though it is in the reverse of the order it must be in to be used as a source route) means the option (and the IP header as a whole) remains a constant length as the datagram progresses through the internet.

        This option is a loose source route because the gateway or host IP is allowed to use any route of any number of other intermediate gateways to reach the next address in the route.

        Must be copied on fragmentation. Appears at most once in a datagram.

      Strict Source and Record Route

        +--------+--------+--------+---------//--------+
        |10001001| length | pointer|     route data    |
        +--------+--------+--------+---------//--------+
          Type=137

        The strict source and record route (SSRR) option provides a means for the source of an internet datagram to supply routing information to be used by the gateways in forwarding the datagram to the destination, and to record the route information.

        The option begins with the option type code. The second octet is the option length, which includes the option type code and the length octet, the pointer octet, and length-3 octets of route data. The third octet is the pointer into the route data indicating the octet which begins the next source address to be processed. The pointer is relative to this option, and the smallest legal value for the pointer is 4.

        A route data is composed of a series of internet addresses. Each internet address is 32 bits or 4 octets. If the pointer is greater than the length, the source route is empty (and the recorded route full) and the routing is to be based on the destination address field.

        If the address in the destination address field has been reached and the pointer is not greater than the length, the next address in the source route replaces the address in the destination address field, the recorded route address replaces the source address just used, and the pointer is increased by four. The recorded route address is the internet module's own internet address as known in the environment into which this datagram is being forwarded.

        This procedure of replacing the source route with the recorded route (though it is in the reverse of the order it must be in to be used as a source route) means the option (and the IP header as a whole) remains a constant length as the datagram progresses through the internet.

        This option is a strict source route because the gateway or host IP must send the datagram directly to the next address in the source route, through only the directly connected network indicated in the next address, to reach the next gateway or host specified in the route.

        Must be copied on fragmentation. Appears at most once in a datagram.

      Record Route

        +--------+--------+--------+---------//--------+
        |00000111| length | pointer|     route data    |
        +--------+--------+--------+---------//--------+
          Type=7

        The record route option provides a means to record the route of an internet datagram.

        The option begins with the option type code. The second octet is the option length, which includes the option type code and the length octet, the pointer octet, and length-3 octets of route data. The third octet is the pointer into the route data indicating the octet which begins the next area to store a route address. The pointer is relative to this option, and the smallest legal value for the pointer is 4.

        A recorded route is composed of a series of internet addresses. Each internet address is 32 bits or 4 octets. If the pointer is greater than the length, the recorded route data area is full. The originating host must compose this option with a large enough route data area to hold all the addresses expected. The size of the option does not change due to adding addresses. The initial contents of the route data area must be zero.

        When an internet module routes a datagram, it checks to see if the record route option is present. If it is, it inserts its own internet address (as known in the environment into which this datagram is being forwarded) into the recorded route, beginning at the octet indicated by the pointer, and increments the pointer by four.

        If the route data area is already full (the pointer exceeds the length), the datagram is forwarded without inserting the address into the recorded route. If there is some room but not enough room for a full address to be inserted, the original datagram is considered to be in error and is discarded. In either case, an ICMP parameter problem message may be sent to the source host [3].

        Not copied on fragmentation; goes in the first fragment only. Appears at most once in a datagram.

      Stream Identifier

        +--------+--------+--------+--------+
        |10001000|00000010|    Stream ID    |
        +--------+--------+--------+--------+
         Type=136 Length=4

        This option provides a way for the 16-bit SATNET stream identifier to be carried through networks that do not support the stream concept.

        Must be copied on fragmentation. Appears at most once in a datagram.

      Internet Timestamp

        +--------+--------+--------+--------+
        |01000100| length | pointer|oflw|flg|
        +--------+--------+--------+--------+
        |         internet address          |
        +--------+--------+--------+--------+
        |             timestamp             |
        +--------+--------+--------+--------+
        |                 .                 |
                          .
          Type = 68

        The Option Length is the number of octets in the option, counting the type, length, pointer, and overflow/flag octets (maximum length 40).

        The Pointer is the number of octets from the beginning of this option to the end of timestamps plus one (i.e., it points to the octet beginning the space for the next timestamp). The smallest legal value is 5. The timestamp area is full when the pointer is greater than the length.

        The Overflow (oflw) [4 bits] is the number of IP modules that cannot register timestamps due to lack of space.

        The Flag (flg) [4 bits] values are:

          0 -- time stamps only, stored in consecutive 32 bit words
          1 -- each timestamp is preceded with the internet address of the registering entity
          3 -- the internet address fields are prespecified; an IP module only registers its timestamp if it matches its own address with the next specified internet address

        The Timestamp is a right-justified, 32 bit timestamp in milliseconds since midnight UT. If the time is not available in milliseconds, or cannot be provided with respect to midnight UT, then any time may be inserted as a timestamp, provided the high order bit of the timestamp field is set to one to indicate the use of a non-standard value.

        The originating host must compose this option with a large enough timestamp data area to hold all the timestamp information expected. The size of the option does not change due to adding timestamps. The initial contents of the timestamp data area must be zero, or internet address/zero pairs.

        If the timestamp data area is already full (the pointer exceeds the length), the datagram is forwarded without inserting the timestamp, but the overflow count is incremented by one. If there is some room but not enough room for a full timestamp to be inserted, or the overflow count itself overflows, the original datagram is considered to be in error and is discarded. In either case, an ICMP parameter problem message may be sent to the source host [3].

        The timestamp option is not copied upon fragmentation. It is carried in the first fragment. Appears at most once in a datagram.

    Padding: variable

      The internet header padding is used to ensure that the internet header ends on a 32 bit boundary. The padding is zero.

    3.2 Discussion

      The implementation of a protocol must be robust. Each implementation must expect to interoperate with others created by different individuals. While the goal of this specification is to be explicit about the protocol, there is the possibility of differing interpretations. In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior. That is, it must be careful to send well-formed datagrams, but must accept
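    The header checksum algorithm quoted in this excerpt (the 16-bit one's complement of the one's complement sum of all 16-bit header words, with the checksum field taken as zero) can be sketched in a few lines of Python. The sample header bytes below are hypothetical values chosen only to exercise the routine; they are not part of the specification.

    ```python
    def ip_checksum(header: bytes) -> int:
        """16-bit one's complement of the one's complement sum of all
        16-bit words in the header; the checksum field itself is taken
        as zero while computing."""
        total = 0
        for i in range(0, len(header), 2):
            total += (header[i] << 8) | header[i + 1]
        while total > 0xFFFF:                       # fold carry bits back in
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF

    # Hypothetical 20-octet header, checksum field (octets 10-11) zeroed.
    header = bytes.fromhex("4500003c1c46400040060000ac100a63ac100a0c")
    cksum = ip_checksum(header)
    filled = header[:10] + cksum.to_bytes(2, "big") + header[12:]
    # A receiver verifies by summing the header *including* the checksum
    # field: the folded sum comes out 0xFFFF, so ip_checksum() returns 0.
    assert ip_checksum(filled) == 0
    ```

    Because a one's complement sum plus its own complement is all ones, verification at each hop is the same computation as generation, which is part of why this checksum is "simple to compute".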

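    The option-type octet layout described in the RFC 791 excerpt above (a 1-bit copied flag, a 2-bit class, and a 5-bit number) can be decoded with simple shifts and masks. This is an illustrative sketch, not part of the specification text:

    ```python
    def decode_option_type(octet: int) -> tuple:
        """Split an IP option-type octet into its three RFC 791 fields."""
        copied = (octet >> 7) & 0x01      # 1 bit:  copied into all fragments?
        opt_class = (octet >> 5) & 0x03   # 2 bits: 0=control, 2=debugging
        number = octet & 0x1F             # 5 bits: option number
        return copied, opt_class, number

    # Type 131 (Loose Source Routing): copied, control class, number 3.
    assert decode_option_type(131) == (1, 0, 3)
    # Type 68 (Internet Timestamp): not copied, debugging class, number 4.
    assert decode_option_type(68) == (0, 2, 4)
    ```

    The decoded fields match the options table: for example, type 7 (Record Route) yields class 0, number 7, with the copied flag clear, which is why it travels only in the first fragment.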
    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc0791.txt (2016-02-14)



  • the receiver will simply see two back-to-back END characters, which will generate a bad IP packet. If the SLIP implementation does not throw away the zero-length IP packet, the IP implementation certainly will. If there was line noise, the data received due to it will be discarded without affecting the following packet.

    Because there is no standard SLIP specification, there is no real defined maximum packet size for SLIP. It is probably best to accept the maximum packet size used by the Berkeley UNIX SLIP drivers: 1006 bytes including the IP and transport protocol headers (not including the framing characters). Therefore any new SLIP implementations should be prepared to accept 1006 byte datagrams and should not send more than 1006 bytes in a datagram.

    DEFICIENCIES

    There are several features that many users would like SLIP to provide which it doesn't. In all fairness, SLIP is just a very simple protocol designed quite a long time ago when these problems were not really important issues. The following are commonly perceived shortcomings in the existing SLIP protocol:

    - addressing: both computers in a SLIP link need to know each other's IP addresses for routing purposes. Also, when using SLIP for hosts to dial up a router, the addressing scheme may be quite dynamic, and the router may need to inform the dialing host of the host's IP address. SLIP currently provides no mechanism for hosts to communicate addressing information over a SLIP connection.

      Romkey, RFC 1055, Serial Line IP, June 1988

    - type identification: SLIP has no type field. Thus, only one protocol can be run over a SLIP connection, so in a configuration of two DEC computers running both TCP/IP and DECnet, there is no hope of having TCP/IP and DECnet share one serial line between them while using SLIP. While SLIP is "Serial Line IP", if a serial line connects two multi-protocol computers, those computers should be able to use more than one protocol over the line.

    - error detection/correction: noisy phone lines will corrupt packets in transit. Because the line speed is probably quite low (likely 2400 baud), retransmitting a packet is very expensive. Error detection is not absolutely necessary at the SLIP level, because any IP application should detect damaged packets (IP header and UDP and TCP checksums should suffice), although some common applications (like NFS) usually ignore the checksum and depend on the network media to detect damaged packets. Because it takes so long to retransmit a packet which was corrupted by line noise, it would be efficient if SLIP could provide some sort of simple error correction mechanism of its own.

    - compression: because dial-in lines are so slow (usually 2400 bps), packet compression would cause large improvements in packet throughput. Usually, streams of packets in a single TCP connection have few changed fields in the IP and TCP headers, so a simple compression algorithm might just send the changed parts of the headers instead of the complete
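    The framing behavior this excerpt opens with, including the back-to-back END characters that produce a zero-length packet, can be sketched using the SLIP character codes RFC 1055 defines earlier in the document (END = 0xC0, ESC = 0xDB, ESC_END = 0xDC, ESC_ESC = 0xDD); those constants come from the full RFC, not from this excerpt:

    ```python
    END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

    def slip_encode(packet: bytes) -> bytes:
        """Frame one IP packet for a SLIP line.  An END is also sent
        first, to flush any line noise accumulated between packets."""
        out = bytearray([END])
        for b in packet:
            if b == END:
                out += bytes([ESC, ESC_END])   # escape a data END byte
            elif b == ESC:
                out += bytes([ESC, ESC_ESC])   # escape a data ESC byte
            else:
                out.append(b)
        out.append(END)
        return bytes(out)

    def slip_decode(stream: bytes):
        """Yield the packets in a SLIP byte stream, discarding the
        zero-length packets produced by back-to-back END characters."""
        pkt, esc = bytearray(), False
        for b in stream:
            if esc:
                pkt.append(END if b == ESC_END else ESC if b == ESC_ESC else b)
                esc = False
            elif b == ESC:
                esc = True
            elif b == END:
                if pkt:                        # drop the empty "packet"
                    yield bytes(pkt)
                    pkt = bytearray()
            else:
                pkt.append(b)

    # Two framed packets put back-to-back ENDs on the wire; the decoder
    # silently discards the empty packet between them.
    wire = slip_encode(b"\x45\xc0\x01") + slip_encode(b"\xdb\x02")
    assert list(slip_decode(wire)) == [b"\x45\xc0\x01", b"\xdb\x02"]
    ```

    Throwing away the empty frame at the SLIP layer, as here, is exactly the behavior the excerpt recommends; otherwise the IP layer discards it anyway.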

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc1055.txt (2016-02-14)


  • send datagram locally to GatewayTo(IP-destination)

    If the sending host is itself a member of the destination group, a copy of the outgoing datagram must be looped back for local delivery, unless inhibited by the sender. (Level 2 implementations only.)

    A host group address should not be placed in the source address field, or anywhere in a source routing option, of an outgoing IP datagram.

    6.3. Extensions to the Local Network Service Interface

    No change to the local network service interface is required to support the sending of multicast IP datagrams. The IP module merely specifies an IP host group destination, rather than an individual IP destination, when it invokes the existing "Send Local" operation.

    6.4. Extensions to an Ethernet Local Network Module

    The Ethernet directly supports the sending of local multicast packets by allowing multicast addresses in the destination field of Ethernet packets. All that is needed to support the sending of multicast IP datagrams is a procedure for mapping IP host group addresses to Ethernet multicast addresses. An IP host group address is mapped to an Ethernet multicast address by placing the low-order 23 bits of the IP address into the low-order 23 bits of the Ethernet multicast address 01-00-5E-00-00-00 (hex). Because there are 28 significant bits in an IP host group address, more than one host group address may map to the same Ethernet multicast address.

    6.5. Extensions to Local Network Modules other than Ethernet

    Other networks that directly support multicasting, such as rings or buses conforming to the IEEE 802.2 standard, may be handled the same way as Ethernet for the purpose of sending multicast IP datagrams. For a network that supports broadcast but not multicast, such as the Experimental Ethernet, all IP host group addresses may be mapped to a single local broadcast address (at the cost of increased overhead on all local hosts). For a point-to-point link joining two hosts (or a host and a multicast router), multicasts should be transmitted exactly like unicasts.

    Deering, RFC 1054, Host Extensions for IP Multicasting, May 1988

    For a store-and-forward network, like the ARPANET or a public X.25 network, all IP host group addresses might be mapped to the well-known local address of an IP multicast router; a router on such a network would take responsibility for completing multicast delivery within the network, as well as among networks.

    7. RECEIVING MULTICAST IP DATAGRAMS

    7.1. Extensions to the IP Service Interface

    Incoming multicast IP datagrams are received by upper-layer protocol modules using the same "Receive IP" operation as normal, unicast datagrams. Selection of a destination upper-layer protocol is based on the protocol field in the IP header, regardless of the destination IP address. However, before any datagrams destined to a particular group can be received, an upper-layer protocol must ask the IP module to join that group. Thus, the IP service interface must be extended to provide two new operations:

        JoinHostGroup  ( group-address, interface )
        LeaveHostGroup ( group-address, interface )

    The JoinHostGroup operation requests that this host become a member of the host group identified by "group-address" on the given network interface. The LeaveHostGroup operation requests that this host give up its membership in the host group identified by "group-address" on the given network interface. The "interface" argument may be omitted on hosts that may be attached to only one network. For hosts that may be attached to more than one network, the upper-layer protocol may choose to leave the interface unspecified, in which case the request will apply to the default interface for sending multicast datagrams (see section 6.1). It is permissible to join the same group on more than one interface, in which case duplicate multicast datagrams may be received. It is also permissible for more than one upper-layer protocol to request membership in the same group.

    Both operations should return immediately (i.e., they are non-blocking operations), indicating success or failure. Either operation may fail due to an invalid group address or interface identifier. JoinHostGroup may fail due to lack of local resources. LeaveHostGroup may fail because the host does not belong to the given group on the given interface. LeaveHostGroup may succeed, but the membership persist, if more than one upper-layer protocol has requested membership in the same group.

    7.2. Extensions to the IP Module

    To support the reception of multicast IP datagrams, the IP module must be extended to maintain a list of host group memberships associated with each network interface. An incoming datagram destined to one of those groups is processed exactly the same way as datagrams destined to one of the host's individual addresses. Incoming datagrams destined to groups to which the host does not belong are discarded without generating any error report. On hosts attached to more than one network, if a datagram arrives via one network interface, destined for a group to which the host belongs only on a different interface, the datagram is quietly discarded. (These cases should occur only as a result of inadequate multicast address filtering in a local network module.)

    An incoming datagram is not rejected for having an IP time-to-live of 1 (i.e., the time-to-live should not automatically be decremented on arriving datagrams that are not being forwarded). An incoming datagram is not rejected for having an IP host group address in its source address field or anywhere in a source routing option. An ICMP error message (Destination Unreachable, Time Exceeded, Parameter Problem, Source Quench, or Redirect) is never generated in response to a datagram destined to an IP host group.

    The list of host group memberships is updated in response to JoinHostGroup and LeaveHostGroup requests from upper-layer protocols. Each membership should have an associated reference count or similar mechanism to handle multiple requests to join and leave the same group. On the first request to join, and the last request to leave, a group on a given interface, the local network module for that interface is notified, so that it may update its multicast reception filter (see section 7.3).

    The IP module must also be extended to implement the IGMP protocol, specified in Appendix I. IGMP is used to keep neighboring multicast routers informed of the host group memberships present on a particular local network. To support IGMP, every level 2 host must join the "all-hosts" group (address 224.0.0.1) on each network interface at initialization time, and must remain a member for as long as the host is active.

    (Datagrams addressed to the all-hosts group are recognized as a special case by the multicast routers and are never forwarded beyond a single network, regardless of their time-to-live. Thus, the all-hosts address may not be used as an internet-wide broadcast address. For the purpose of IGMP, membership in the all-hosts group is really necessary only while the host belongs to at least one other group. However, it is specified that the host shall remain a member of the all-hosts group at all times because (1) it is simpler, (2) the frequency of reception of unnecessary IGMP queries should be low enough that overhead is negligible, and (3) the all-hosts address may serve other routing-oriented purposes, such as advertising the presence of gateways or resolving local addresses.)

    7.3. Extensions to the Local Network Service Interface

    Incoming local network multicast packets are delivered to the IP module using the same "Receive Local" operation as local network unicast packets. To allow the IP module to tell the local network module which multicast packets to accept, the local network service interface is extended to provide two new operations:

        JoinLocalGroup  ( group-address )
        LeaveLocalGroup ( group-address )

    where "group-address" is an IP host group address. The JoinLocalGroup operation requests the local network module to accept and deliver up subsequently arriving packets destined to the given IP host group address. The LeaveLocalGroup operation requests the local network module to stop delivering up packets destined to the given IP host group address. The local network module is expected to map the IP host group addresses to local network addresses as required to update its multicast reception filter. Any local network module is free to ignore LeaveLocalGroup requests, and may deliver up packets destined to more addresses than just those specified in JoinLocalGroup requests, if it is unable to filter incoming packets adequately.

    The local network module must not deliver up any multicast packets that were transmitted from that module; loopback of multicasts is handled at the IP layer or higher.

    7.4. Extensions to an Ethernet Local Network Module

    To support the reception of multicast IP datagrams, an Ethernet module must be able to receive packets addressed to the Ethernet multicast addresses that correspond to the host's IP host group addresses. It is highly desirable to take advantage of any address filtering capabilities that the Ethernet hardware interface may have, so that the host receives only those packets that are destined to it.

    Unfortunately, many current Ethernet interfaces have a small limit on the number of addresses that the hardware can be configured to recognize. Nevertheless, an implementation must be capable of listening on an arbitrary number of Ethernet multicast addresses, which may mean "opening up" the address filter to accept all multicast packets during those periods when the number of addresses exceeds the limit of the filter.

    For interfaces with inadequate hardware address filtering, it may be desirable (for performance reasons) to perform Ethernet address filtering within the software of the Ethernet module. This is not mandatory, however, because the IP module performs its own filtering based on IP destination addresses.

    7.5. Extensions to Local Network Modules other than Ethernet

    Other multicast networks, such as IEEE 802.2 networks, can be handled the same way as Ethernet for the purpose of receiving multicast IP datagrams. For pure broadcast networks, such as the Experimental Ethernet, all incoming broadcast packets can be accepted and passed to the IP module for IP-level filtering. On point-to-point or store-and-forward networks, multicast IP datagrams will arrive as local network unicasts, so no change to the local network module should be necessary.

    APPENDIX I. INTERNET GROUP MANAGEMENT PROTOCOL (IGMP)

    The Internet Group Management Protocol (IGMP) is used by IP hosts to report their host group memberships to any immediately-neighboring multicast routers. IGMP is an asymmetric protocol and is specified here from the point of view of a host, rather than a multicast router. (IGMP may also be used, symmetrically or asymmetrically, between multicast routers. Such use is not specified here.)

    Like ICMP, IGMP is an integral part of IP. It is required to be implemented by all hosts conforming to level 2 of the IP multicasting specification. IGMP messages are encapsulated in IP datagrams, with an IP protocol number of 2. All IGMP messages of concern to hosts have the following format:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Version| Type  |    Unused     |           Checksum            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         Group Address                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Version: This memo specifies version 1 of IGMP. Version 0 is specified in RFC 988 and is now obsolete.

    Type: There are two types of IGMP message of concern to hosts:

        1 = Host Membership Query
        2 = Host Membership Report

    Unused: Unused field, zeroed when sent, ignored when received.

    Checksum: The checksum is the 16-bit one's complement of the one's complement sum of the 8-octet IGMP message. For computing the checksum, the checksum field is zeroed.

    Group Address: In a Host Membership Query message, the group address field is zeroed when sent, ignored when received. In a Host Membership Report message, the group address field holds the IP host group address of the group being reported.

    Informal Protocol Description

    Multicast routers send Host Membership Query messages (hereinafter called Queries) to discover which host groups have members on their attached local networks. Queries are addressed to the all-hosts group (address 224.0.0.1), and carry an IP time-to-live of 1.

    Hosts respond to a Query by generating Host Membership Reports (hereinafter called Reports), reporting each host group to which they belong on the network interface from which the Query was received. In order to avoid an "implosion" of concurrent Reports, and to reduce the total number of Reports transmitted, two techniques are used:

    1. When a host receives a Query, rather than sending Reports immediately, it starts a report delay timer for each of its group memberships on the network interface of the incoming Query. Each timer is set to a different, randomly-chosen value between zero and D seconds. When a timer expires, a Report is generated for the corresponding host group. Thus, Reports are spread out over a D second interval instead of all occurring at once.

    2. A Report is sent with an IP destination address equal to the host group address being reported, and with an IP time-to-live of 1, so that other members of the same group on the same network can overhear the Report. If a host hears a Report for a group to which it belongs on that network, the host stops its own timer for that group and does not generate a Report for that group. Thus, in the normal case, only one Report will be generated for each group present on the network, by the member host whose delay timer expires first. Note that the multicast routers receive all IP multicast datagrams, and therefore need not be addressed explicitly. Further note that the routers need not know which hosts belong to a group, only that at least one host belongs to a group on a particular network.

    There are two exceptions to the behavior described above. First, if a report delay timer is already running for a group membership when a Query is received, that timer is not reset to a new random value, but rather allowed to continue running with its current value. Second, a report delay timer is never set for a host's membership in the all-hosts group (224.0.0.1), and that membership is never reported.

    If a host uses a pseudo-random number generator to compute the reporting delays, one of the host's own individual IP addresses should be used as part of the seed for the generator, to reduce the chance of multiple hosts generating the same sequence of delays.

    A host should confirm that a received Report has the same IP host group address in its IP destination field and its IGMP group address field, to ensure that the host's own Report is not cancelled by an erroneous received Report. A host should quietly discard any IGMP message of type other than Host Membership Query or Host Membership Report.

    Multicast routers send Queries periodically to refresh their knowledge of memberships present on a particular network. If no Reports are received for a particular group after some number of Queries, the routers assume that that group has no local members, and that they need not forward remotely-originated multicasts for that group onto the local network. Queries are normally sent infrequently (no more than once a minute) so as to keep the IGMP overhead on hosts and networks very low. However, when a multicast router starts up, it may issue several closely spaced Queries in order to quickly build up its knowledge of local memberships.

    When a host joins a new group, it should immediately transmit a Report for that group, rather than waiting for a Query, in case it is the first member of that group on the network. To cover the possibility of the initial Report being lost or damaged, it is recommended that it be
repeated once or twice after short delays A simple way to accomplish this is to act as if a Query had been received for that group only setting the group s random report delay timer The state transition diagram below illustrates this approach Note that on a network with no multicast routers present the only IGMP traffic is the one or more Reports sent whenever a host joins a new group State Transition Diagram IGMP behavior is more formally specified by the state transition diagram below A host may be in one of three possible states with respect to any single IP host group on any single network interface Non Member state when the host does not belong to the group on the interface This is the initial state for all memberships on all network interfaces it requires no storage in the host Delaying Member state when the host belongs to the group on the interface and has a report delay timer running for that membership Idle Member state when the host belongs to the group on the interface and does not have a report delay timer running for that membership There are five significant events that can cause IGMP state transitions join group occurs when the host decides to join the group on the interface It may occur only in the Non Member state leave group occurs when the host decides to leave the group on the interface It may occur only in the Delaying Member and Idle Member states query received occurs when the host
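The fixed 8-octet message layout and one's complement checksum described in this excerpt can be sketched in Python. This is an illustrative reconstruction, not code from the RFC; the function names (`ones_complement_checksum`, `build_report`) and the example group address are invented:

```python
import struct

def ones_complement_checksum(data: bytes) -> int:
    # 16-bit one's complement of the one's complement sum.
    # Assumes an even-length message (IGMP messages are 8 octets).
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_report(group_address: str) -> bytes:
    # Version 1 (high nibble), Type 2 = Host Membership Report (low
    # nibble), one unused octet, 16-bit checksum, 4-octet group address.
    version_type = (1 << 4) | 2
    addr = bytes(int(b) for b in group_address.split("."))
    msg = struct.pack("!BBH", version_type, 0, 0) + addr
    csum = ones_complement_checksum(msg)    # checksum field zeroed
    return struct.pack("!BBH", version_type, 0, csum) + addr

msg = build_report("224.1.2.3")
assert len(msg) == 8
# Re-summing a message whose checksum field is filled in yields zero,
# which is how a receiver verifies it.
assert ones_complement_checksum(msg) == 0
```

A receiver would apply the same `ones_complement_checksum` over the received 8 octets and discard the message if the result is nonzero.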

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc1054.txt (2016-02-14)


  • data in the vendor information area. The rest of this field should be filled with PAD (zero) octets.

Variable Length Data

The variable length data has a single format: it consists of one tag octet, one length octet, and length octets of data.

Gateway Field (Tag 3)
   Data: N address bytes. Specifies the IP addresses of N/4 gateways for this subnet. If one of many gateways is preferred, that should be first.

Time Server Field (Tag 4)
   Data: N address bytes. Specifies the IP addresses of N/4 time servers (RFC 868).

IEN-116 Name Server Field (Tag 5)
   Data: N address bytes. Specifies the IP addresses of N/4 name servers (IEN 116).

Domain Name Server Field (Tag 6)
   Data: N address bytes. Specifies the IP addresses of N/4 domain name servers (RFC 883).

Log Server Field (Tag 7)
   Data: N address bytes. Specifies the IP addresses of N/4 MIT-LCS UDP log servers (LOGGING).

Prindeville                                                     [Page 3]
RFC 1048                   BOOTP Extensions                February 1988

Cookie/Quote Server Field (Tag 8)
   Data: N address bytes. Specifies the IP addresses of N/4 Quote of the Day servers (RFC 865).

LPR Server Field (Tag 9)
   Data: N address bytes. Specifies the IP addresses of N/4 Berkeley 4BSD printer servers (LPD).

Impress Server Field (Tag 10)
   Data: N address bytes. Specifies the IP addresses of N/4 Impress network image servers (IMAGEN).

RLP Server Field (Tag 11)
   Data: N address bytes. Specifies the IP addresses of N/4 Resource Location Protocol (RLP) servers (RFC 887).

Hostname (Tag 12)
   Data: N bytes of hostname. Specifies the name of the client. The name may or may not be domain-qualified; this is a site-specific issue.

Reserved Fields (Tags 128-254)
   Data: N bytes of undefined content. Specifies additional site-specific information, to be interpreted on an implementation-specific basis. This should follow all data with the preceding generic tags (0-127).

Extensions

Additional generic data fields may be registered by contacting Joyce K. Reynolds, USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, California 90292-6695, or by E-mail as JKREYNOLDS@ISI.EDU (NIC handle JKR1). Implementation-specific use of undefined generic types (those in the range 12-127) may conflict with other implementations, and registration is required.

When selecting information to put into the vendor specific area, care should be taken to not exceed the 64-byte length restriction. Nonessential information (such as host name and quote of the day server) may be excluded, which may later be located with a more appropriate service protocol, such as RLP or the WKS resource type of the domain name system. Indeed, even RLP servers may be discovered using a broadcast request to locate a local RLP server.

Comparison to Alternative Approaches

Extending BOOTP to provide more configuration information than the minimum required by boot PROMs may not be necessary. Rather than having each module in a host (e.g., the
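The tag/length/data layout described above lends itself to a very small parser. The sketch below is an illustration only, not code from the RFC; the function name is invented, and the treatment of tag 0 as a PAD octet and tag 255 as an end-of-list marker follows the conventions of the full BOOTP vendor-extensions format:

```python
def parse_vendor_fields(data: bytes) -> dict:
    # Walk the vendor area: one tag octet, one length octet, then
    # `length` octets of data.  Tag 0 is a PAD octet with no length
    # byte; tag 255 terminates the list.
    fields = {}
    i = 0
    while i < len(data):
        tag = data[i]
        if tag == 0:          # PAD octet, skip
            i += 1
            continue
        if tag == 255:        # END octet
            break
        length = data[i + 1]
        fields[tag] = bytes(data[i + 2:i + 2 + length])
        i += 2 + length
    return fields

# A Gateway field (tag 3) holding one example IP address, padded out.
area = bytes([3, 4, 192, 168, 1, 254, 0, 0, 255])
assert parse_vendor_fields(area) == {3: bytes([192, 168, 1, 254])}
```

An address-list field (tags 3 through 11) would then be split into N/4 four-octet addresses by the caller.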

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc1048.txt (2016-02-14)


  • itself is sufficient to prevent SWS, and thus protect a host from a foreign implementation which has failed to deal properly with this problem. The two algorithms taken together produce an additional reduction in CPU consumption, observed in practice to be as high as a factor of four.

4. Improved Window Algorithms

The receiver of data can take a very simple step to eliminate SWS. When it disposes of a small amount of data, it can artificially reduce the offered window in subsequent acknowledgements, so that the useable window computed by the sender does not permit the sending of any further data. At some later time, when the receiver has processed a substantially larger amount of incoming data, the artificial limitation on the offered window can be removed all at once, so that the sender computes a sudden large jump, rather than a sequence of small jumps, in the useable window.

At this level, the algorithm is quite simple, but in order to determine exactly when the window should be opened up again, it is necessary to look at some of the other details of the implementation. Depending on whether the window is held artificially closed for a short or long time, two problems will develop. The one we have already discussed, never closing the window artificially, will lead to SWS. On the other hand, if the window is only opened infrequently, the pipeline of data in the network between the sender and the receiver may have emptied out while the sender was being held off, so that a delay is introduced before additional data arrives from the sender. This delay does reduce throughput, but it does not consume network resources or CPU resources in the process, as does SWS. Thus, it is in this direction that one ought to overcompensate. For a simple implementation, a rule of thumb that seems to work in practice is to artificially reduce the offered window until the reduction constitutes one half of the available space, at which point increase the window to advertise the entire space again. In any event, one ought to make the chunk by which the window is opened at least permit one reasonably large segment. (If the receiver is so short of buffers that it can never advertise a large enough buffer to permit at least one large segment, it is hopeless to expect any sort of high throughput.)

There is an algorithm that the sender can use to achieve the same effect described above: a very simple and elegant rule first described by Michael Greenwald at MIT. The sender of the data uses the offered window to compute a useable window, then compares the useable window to the offered window, and refrains from sending anything if the ratio of useable to offered is less than a certain fraction. Clearly, if the computed useable window is small compared to the offered window, this means that a substantial amount of previously sent information is still in the pipeline from the sender to the receiver, which in turn means that the sender can count on being granted a larger useable window in the future. Until the useable window reaches a certain amount, the sender should simply refuse to send anything.

Simple experiments suggest that the exact value of the ratio is not very important, but that a value of about 25 percent is sufficient to avoid SWS and achieve reasonable throughput, even for machines with a small offered window. An additional enhancement which might help throughput would be to attempt to hold off sending until one can send a maximum size segment. Another enhancement would be to send anyway, even if the ratio is small, if the useable window is sufficient to hold the data available up to the next "push" point.

This algorithm at the sender end is very simple. Notice that it is not necessary to set a timer to protect against protocol lockup when postponing the send operation. Further acknowledgements, as they arrive, will inevitably change the ratio of offered to useable window. (To see this, note that when all the data in the catenet pipeline has arrived at the receiver, the resulting acknowledgement must yield an offered window and useable window that equal each other.) If the expected acknowledgements do not arrive, the retransmission mechanism will come into play to assure that something finally happens. Thus, to add this algorithm to an existing TCP implementation usually requires one line of code. As part of the send algorithm, it is already necessary to compute the useable window from the offered window. It is a simple matter to add a line of code which, if the ratio is less than a certain percent, sets the useable window to zero. The results of SWS are so devastating that no sender should be without this simple piece of insurance.

5. Improved Acknowledgement Algorithms

In the beginning of this paper, an overly simplistic implementation of TCP was described, which led to SWS. One of the characteristics of this implementation was that the recipient of data sent a separate acknowledgement for every segment that it received. This compulsive acknowledgement was one of the causes of SWS, because each acknowledgement provided some new useable window. But even if one of the algorithms described above is used to eliminate SWS, overly frequent acknowledgement still has a substantial problem, which is that it greatly increases the processing time at the sender's end. Measurement of TCP implementations, especially on large operating systems, indicates that most of the overhead of dealing with a segment is not in the processing at the TCP or IP level, but simply in the scheduling of the handler which is required to deal with the segment. A steady dribble of acknowledgements causes a high overhead in scheduling, with very little to show for it. This waste is to be avoided if possible.

There are two reasons for prompt acknowledgement. One is to prevent retransmission. (We will discuss later how to determine whether unnecessary retransmission is occurring.) The other reason one acknowledges promptly is to permit further data to be sent. However, the previous section makes quite clear that it is not always desirable to send a little bit of data, even though the receiver may have room for it. Therefore, one can state a general rule that, under normal operation, the receiver of data need not, and for efficiency reasons should not, acknowledge the data unless either the acknowledgement is intended to produce an increased useable window, is necessary in order to prevent retransmission, or is being sent as part of a reverse direction segment being sent for some other reason. We will consider an algorithm to achieve these goals.

Only the recipient of the data can control the generation of acknowledgements. Once an acknowledgement has been sent from the receiver back to the sender, the sender must process it. Although the extra overhead is incurred at the sender's end, it is entirely under the receiver's control. Therefore, we must now describe an algorithm which occurs at the receiver's end. Obviously, the algorithm must have the following general form: sometimes the receiver of data, upon processing a segment, decides not to send an acknowledgement now, but to postpone the acknowledgement until some time in the future, perhaps by setting a timer. The peril of this approach is that on many large operating systems it is extremely costly to respond to a timer event, almost as costly as to respond to an incoming segment. Clearly, if the receiver of the data, in order to avoid extra overhead at the sender end, spends a great deal of time responding to timer interrupts, no overall benefit has been achieved, for efficiency at the sender end is achieved by great thrashing at the receiver end. We must find an algorithm that avoids both of these perils.

The following scheme seems a good compromise. The receiver of data will refrain from sending an acknowledgement under certain circumstances, in which case it must set a timer which will cause the acknowledgement to be sent later. However, the receiver should do this only where it is a reasonable guess that some other event will intervene and prevent the necessity of the timer interrupt. The most obvious event on which to depend is the arrival of another segment. So, if a segment arrives, postpone sending an acknowledgement if both of the following conditions hold. First, the push bit is not set in the segment, since it is a reasonable assumption that there is more data coming in a subsequent segment. Second, there is no revised window information to be sent back. This algorithm will insure that the timer, although set, is seldom used. The interval of the timer is related to the expected inter-segment delay, which is in turn a function of the particular network through which the data is flowing. For the Arpanet, a reasonable interval seems to be 200 to 300 milliseconds. Appendix A describes an adaptive algorithm for measuring this delay.

The section on improved window algorithms described both a receiver algorithm and a sender algorithm, and suggested that both should be used. The reason for this is now clear. While the sender algorithm is extremely simple, and useful as insurance, the receiver algorithm is required in order that this improved acknowledgement strategy work. If the receipt of every segment causes a new window value to be returned, then of necessity an acknowledgement will be sent for every data segment. When, according to the strategy of the previous section, the receiver determines to artificially reduce the offered window, that is precisely the circumstance under which an acknowledgement need not be sent. When the receiver window algorithm and the receiver acknowledgement algorithm are used together, it will be seen that sending an acknowledgement will be triggered by one of the following events. First, a push bit has been received. Second, a temporary pause in the data stream is detected. Third, the offered window has been artificially reduced to one half its actual value.

In the beginning of this section, it was pointed out that there are two reasons why one must acknowledge data. Our consideration at this point has been concerned only with the first: that an acknowledgement must be returned as part of triggering the sending of new data. It is also necessary to acknowledge whenever the failure to do so would trigger retransmission by the sender. Since the retransmission interval is selected by the sender, the receiver of the data cannot make a precise determination of when the acknowledgement must be sent. However, there is a rough rule the sender can use to avoid retransmission, provided that the receiver is reasonably well behaved.

We will assume that the sender of the data uses the optional algorithm described in the TCP specification, in which the roundtrip delay is measured using an exponential decay smoothing algorithm. Retransmission of a segment occurs if the measured delay for that segment exceeds the smoothed average by some factor. To see how retransmission might be triggered, one must consider the pattern of segment arrivals at the receiver. The goal of our strategy was that the sender should send off a number of segments in close sequence, and receive one acknowledgement for the whole burst. The acknowledgement will be generated by the receiver at the time that the last segment in the burst arrives at the receiver. (To ensure the prompt return of the acknowledgement, the sender could turn on the push bit in the last segment of the burst.) The delay observed at the sender between the initial transmission of a segment and the receipt of the acknowledgement will include both the network transit time, plus the holding time at the receiver. The holding time will be greatest for the first segments in the burst, and smallest for the last segments in the burst. Thus, the smoothing algorithm will measure a delay which is roughly proportional to the average roundtrip delay for all the segments in the burst. Problems will arise if the average delay is substantially smaller than the maximum delay, and the smoothing algorithm used has a very small threshold for triggering retransmission. The widest variation between average and maximum delay will occur when network transit time is negligible and all delay is processing time. In this case, the maximum will be twice the average (by simple algebra), so the threshold that controls retransmission should be somewhat more than a factor of two.

In practice, retransmission of the first segments of a burst has not been a problem, because the delay measured consists of the network roundtrip delay as well as the delay due to withholding the acknowledgement, and the roundtrip tends to dominate, except in very low roundtrip time situations (such as when sending to one's self for test purposes). This low roundtrip situation can be covered very simply by including a minimum value below which the roundtrip estimate is not permitted to drop.

In our experiments with this algorithm, retransmission due to faulty calculation of the roundtrip delay occurred only once, when the parameters of the exponential smoothing algorithm had been misadjusted so that they were only taking into account the last two or three segments sent. Clearly, this will cause trouble, since the last two or three segments of any burst are the ones whose holding time at the receiver is minimal, so the resulting total estimate was much lower than appropriate. Once the parameters of the algorithm had been adjusted so that the number of segments taken into account was approximately twice the number of segments in a burst of average size, with a threshold factor of 1.5, no further retransmission has ever been identified due to this problem, including when sending to ourselves and when sending over high delay nets.

6. Conservative vs. Optimistic Windows

According to the TCP specification, the offered window is presumed to have some relationship to the amount of data which the receiver is actually prepared to receive. However, it is not necessarily an exact correspondence. We will use the term "conservative window" to describe the case where the offered window is precisely no larger than the actual buffering available. The drawback to conservative window algorithms is that they can produce very low throughput in long delay situations. It is easy to see that the maximum input of a conservative window algorithm is one bufferfull every roundtrip delay in the net, since the next bufferfull cannot be launched until the updated window (acknowledgement) information from the previous transmission has made the roundtrip.

In certain cases, it may be possible to increase the overall throughput of the transmission by increasing the offered window over the actual buffer available at the receiver. Such a strategy we will call an "optimistic window" strategy. The optimistic strategy works if the network delivers the data to the recipient sufficiently slowly that it can process the data fast enough to keep the buffer from overflowing. If the receiver is faster than the sender, one could, with luck, permit an infinitely optimistic window, in which the sender is simply permitted to send full speed. If the sender is faster than the receiver, however, and the window is too optimistic, then some segments will cause a buffer overflow, and will be discarded. Therefore, the correct strategy to implement an optimistic window is to increase the window size until segments start to be lost. This only works if it is possible to detect that the segment has been lost. In some cases, it is easy to do, because the segment is partially processed inside the receiving host before it is thrown away. In other cases, overflows may actually cause the network interface to be clogged, which will cause the segments to be lost elsewhere in the net. It is inadvisable to attempt an optimistic window strategy unless one is certain that the algorithm can detect the resulting lost segments. However, the increase in throughput which is possible from optimistic windows is quite substantial. Any systems with small buffer space should seriously consider the merit of optimistic windows.

The selection of an appropriate window algorithm is actually more complicated than even the above discussion suggests. The following considerations are not presented with the intention that they be incorporated in current implementations of TCP, but as background for the sophisticated designer who is attempting to understand how his TCP will respond to a variety of networks, with different speed and delay characteristics. The particular pattern of windows and acknowledgements sent from receiver to sender influences two characteristics of the data being sent. First, they control the average data rate. Clearly, the average rate of the sender cannot exceed the average rate of the receiver, or long-term buffer overflow will occur. Second, they influence the burstiness of the data coming from the sender. Burstiness has both advantages and disadvantages. The advantage of burstiness is that it reduces the CPU processing necessary to send the data. This follows from the observed fact, especially on large machines, that most of the cost of sending a segment is not the TCP or IP processing, but the scheduling overhead of getting started. On the other hand, the disadvantage of burstiness is that it may cause buffers to overflow, either in the eventual recipient, which was discussed above, or in an intermediate gateway, a problem ignored in this paper. The algorithms described above attempt to strike a balance between excessive burstiness, which in the extreme cases can cause delays because a burst is not requested soon enough, and excessive fragmentation of the data stream into small segments, which we identified as Silly Window Syndrome.

Under conditions of extreme delay in the network, none of the algorithms described above will achieve adequate throughput. Conservative window algorithms have a predictable throughput limit, which is one windowfull per roundtrip delay. Attempts to solve
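The sender-side rule attributed above to Michael Greenwald amounts to a single comparison against the ratio of useable to offered window. A minimal sketch, with invented names and the roughly 25 percent threshold taken from the discussion:

```python
def may_send(offered_window: int, bytes_in_pipeline: int,
             threshold: float = 0.25) -> bool:
    # Useable window = offered window minus data already sent but not
    # yet acknowledged.  Refrain from sending while the ratio of
    # useable to offered window is below the threshold; incoming
    # acknowledgements will inevitably raise the ratio, so no timer
    # is needed to avoid lockup.
    if offered_window <= 0:
        return False
    useable = offered_window - bytes_in_pipeline
    return useable / offered_window >= threshold

# 900 of 1000 offered octets are still in the pipeline: hold off
# rather than dribble out a 100-octet segment.
assert may_send(1000, 900) is False
# Half the window has drained: sending is permitted again.
assert may_send(1000, 500) is True
```

In an existing TCP this reduces to the one line described above: if the ratio is below the threshold, treat the useable window as zero.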

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc0813.txt (2016-02-14)


  • fragment.first, the first octet of the fragment, and fragment.last, the last octet of the fragment.

1. Select the next hole descriptor from the hole descriptor list. If there are no more entries, go to step eight.

2. If fragment.first is greater than hole.last, go to step one.

3. If fragment.last is less than hole.first, go to step one.

   (If either step two or step three is true, then the newly arrived fragment does not overlap with the hole in any way, so we need pay no further attention to this hole. We return to the beginning of the algorithm where we select the next hole for examination.)

4. Delete the current entry from the hole descriptor list.

   (Since neither step two nor step three was true, the newly arrived fragment does interact with this hole in some way. Therefore, the current descriptor will no longer be valid. We will destroy it, and in the next two steps we will determine whether or not it is necessary to create any new hole descriptors.)

5. If fragment.first is greater than hole.first, then create a new hole descriptor "new_hole", with new_hole.first equal to hole.first, and new_hole.last equal to fragment.first minus one.

   (If the test in step five is true, then the first part of the original hole is not filled by this fragment. We create a new descriptor for this smaller hole.)

6. If fragment.last is less than hole.last, and fragment.more_fragments is true, then create a new hole descriptor "new_hole", with new_hole.first equal to fragment.last plus one and new_hole.last equal to hole.last.

   (This test is the mirror of step five, with one additional feature. Initially, we did not know how long the reassembled datagram would be, and therefore we created a hole reaching from zero to infinity. Eventually, we will receive the last fragment of the datagram. At this point, that hole descriptor which reaches from the last octet of the buffer to infinity can be discarded. The fragment which contains the last fragment indicates this fact by a flag in the internet header called "more fragments". The test of this bit in this statement prevents us from creating a descriptor for the unneeded hole which describes the space from the end of the datagram to infinity.)

7. Go to step one.

8. If the hole descriptor list is now empty, the datagram is now complete. Pass it on to the higher level protocol processor for further handling. Otherwise, return.

4. Part Two: Managing the Hole Descriptor List

The main complexity in the eight step algorithm above is not performing the arithmetical tests, but in adding and deleting entries from the hole descriptor list. One could imagine an implementation in which the storage management package was many times more complicated than the rest of the algorithm, since there is no specified upper limit on the number
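Steps one through eight condense to a few lines when the hole descriptor list is kept as a list of (first, last) pairs. This is an illustrative sketch, not the RFC's own code; the function name and the use of a large integer to stand in for "infinity" are assumptions:

```python
INFINITY = 1 << 32  # stand-in for the unbounded initial hole

def update_holes(holes, frag_first, frag_last, more_fragments):
    # One pass of the hole-descriptor algorithm for a single arriving
    # fragment.  `holes` is a list of (hole_first, hole_last) pairs.
    new_holes = []
    for hole_first, hole_last in holes:
        # Steps 2-3: the fragment does not touch this hole; keep it.
        if frag_first > hole_last or frag_last < hole_first:
            new_holes.append((hole_first, hole_last))
            continue
        # Step 4: the hole interacts with the fragment; discard it
        # and create up to two smaller holes (steps 5 and 6).
        if frag_first > hole_first:
            new_holes.append((hole_first, frag_first - 1))
        if frag_last < hole_last and more_fragments:
            new_holes.append((frag_last + 1, hole_last))
    return new_holes

holes = [(0, INFINITY)]
holes = update_holes(holes, 0, 511, True)       # first fragment
holes = update_holes(holes, 1024, 1535, False)  # last fragment
holes = update_holes(holes, 512, 1023, True)    # middle fragment
assert holes == []  # empty list: the datagram is complete (step 8)
```

Note how the last fragment (more_fragments false) silently discards the tail hole reaching to infinity, exactly as step six prescribes.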

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc0815.txt (2016-02-14)


  • mechanisms by which this table can be filled in For example if the net is a broadcast net such as an ethernet or a ringnet every gateway may simply broadcast such a table from time to time and the host need do nothing but listen to obtain the required information Alternatively the network may provide the mechanism of logical addressing by which a whole set of machines can be provided with a single group address to which a request can be sent for assistance Failing those two schemes the host can build up its table of neighbor gateways by remembering all the gateways from which it has ever received a message Finally in certain cases it may be necessary for this table or at least the initial entries in the table to be constructed manually by a manager or operator at the site In cases where the network in question provides absolutely no support for this kind of host query at least some manual intervention will be required to get started so that the host can find out about at least one gateway 4 Host Algorithms for Fault Isolation We now return to the question raised above What strategy should the host use to detect that it is talking to a dead gateway so that it can know to switch to some other gateway in the list In fact there are several algorithms which can be used All are reasonably simple to implement but they have very different implications for the overhead on the host the gateway and the network Thus to a certain extent the algorithm picked must depend on the details of the network and of the host 6 1 NETWORK LEVEL DETECTION Many networks particularly the Arpanet perform precisely the required function internal to the network If a host sends a datagram to a dead gateway on the Arpanet the network will return a host dead message which is precisely the information the host needs to know in order to switch to another gateway Some early implementations of Internet on the Arpanet threw these messages away That is an exceedingly poor idea 2 CONTINUOUS POLLING The 
The ICMP protocol provides an echo mechanism by which a host may solicit a response from a gateway. A host could simply send this message at a reasonable rate, to assure itself continuously that the gateway was still up. This works, but, since the message must be sent fairly often to detect a fault in a reasonable time, it can imply an unbearable overhead on the host itself, the network, and the gateway. This strategy is prohibited except where a specific analysis has indicated that the overhead is tolerable.

3. TRIGGERED POLLING

If the use of polling could be restricted to only those times when something seemed to be wrong, then the overhead would be bearable. Provided that one can get the proper advice from one's higher level protocols, it is possible to implement such a strategy. For example, one could program the TCP level so that, whenever it retransmitted a segment more than once, it sent a hint down to the IP layer which triggered polling. This strategy does not have excessive overhead, but does have the problem that the host may be somewhat slow to respond to an error, since only after polling has started will the host be able to confirm that something has gone wrong, and by then the TCP above may have already timed out.

Both forms of polling suffer from a minor flaw: hosts, as well as gateways, respond to ICMP echo messages. Thus, polling cannot be used to detect the error that a foreign address thought to be a gateway is actually a host. Such a confusion can arise if the physical addresses of machines are rearranged.

4. TRIGGERED RESELECTION

There is a strategy which makes use of a hint from a higher level, as did the previous strategy, but which avoids polling altogether. Whenever a higher level complains that the service seems to be defective, the Internet layer can pick the next gateway from the list of available gateways and switch to it. Assuming that this gateway is up, no real harm can come of this decision, even if it was wrong, for the worst that will happen is a redirect message which instructs the host to return to the gateway originally being used. If, on the other hand, the original gateway was indeed down, then this immediately provides a new route, so the period of time until recovery is shortened. This last strategy seems particularly clever, and is probably the most generally suitable for those cases where the network itself does not provide fault isolation. (Regrettably, I have forgotten who suggested this idea to me. It is not my invention.)

5. Higher Level Fault Detection

The previous discussion has concentrated on fault detection and recovery at the IP layer. This section considers what the higher layers, such as TCP, should do. TCP has a single fault recovery action: it repeatedly retransmits a segment until either it gets an acknowledgement or its connection timer expires. As discussed above, it may use retransmission as an event to trigger a request for fault recovery to the IP layer. In the other direction, information may flow up from IP, reporting such things as ICMP Destination Unreachable, or error messages from the attached network.

The only subtle question about TCP and faults is what TCP should do when such an error message arrives or its connection timer expires. The TCP specification discusses the timer. In the description of the open call, the timeout is described as an optional value that the client of TCP may specify; if any segment remains unacknowledged for this period, TCP should abort the connection. The default for the timeout is 30 seconds. Early TCPs were often implemented with a fixed timeout interval, but this did not work well in practice, as the following discussion may suggest. Clients of TCP can
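The triggered-reselection strategy above amounts to keeping a list of candidate gateways and rotating to the next one whenever a higher layer hints that service seems defective. A minimal sketch, with invented gateway addresses and method names (the RFC describes the strategy but no interface):

```python
class GatewaySelector:
    """Sketch of "triggered reselection": switch first-hop gateways
    on a hint from a higher layer, and obey ICMP redirects."""

    def __init__(self, gateways):
        self.gateways = list(gateways)  # candidate first-hop gateways
        self.current = 0                # index of the gateway in use

    def gateway(self):
        return self.gateways[self.current]

    def service_defective(self):
        """Hint from a higher layer (e.g. TCP after repeated
        retransmission): move to the next candidate gateway.  If the
        old gateway was in fact up, an ICMP redirect will simply
        steer us back, so a wrong guess is harmless."""
        self.current = (self.current + 1) % len(self.gateways)
        return self.gateway()

    def redirect(self, gateway):
        """ICMP redirect received: return to the named gateway."""
        self.current = self.gateways.index(gateway)


sel = GatewaySelector(["10.0.0.1", "10.0.0.2"])
print(sel.service_defective())  # switches away from 10.0.0.1
sel.redirect("10.0.0.1")        # a redirect sends us back
print(sel.gateway())
```

Note that, as the text observes, this recovers quickly precisely because a wrong switch costs only one redirect message.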

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc0816.txt (2016-02-14)


the machine entirely on to a separate processor dedicated to this kind of task. Such a machine is often described as a communications processor, or a front-end processor. There are several advantages to this approach. First, the operating system on the communications processor can be tailored for precisely this kind of task. This makes the job of implementation much easier. Second, one does not need to redo the task for every machine to which the protocol is to be added. It may be possible to reuse the same front-end machine on different host computers. Since the task need not be done as many times, one might hope that more attention could be paid to doing it right. Given a careful implementation in an environment which is optimized for this kind of task, the resulting package should turn out to be very efficient. Unfortunately, there are also problems with this approach. There is, of course, a financial problem associated with buying an additional computer. In many cases, this is not a problem at all, since the cost is negligible compared to what the programmer would cost to do the job in the mainframe itself. More fundamentally, the communications processor approach does not completely sidestep any of the problems raised above. The reason is that the communications processor, since it is a separate machine, must be attached to the mainframe by some mechanism. Whatever that mechanism, code is required in the mainframe to deal with it. It can be argued that the program to deal with the communications processor is simpler than the program to implement the entire protocol package. Even if that is so, the communications processor interface package is still a protocol in nature, with all of the same structural problems. Thus, all of the issues raised above must still be faced. In addition to those problems, there are some other, more subtle problems associated with an outboard implementation of a protocol. We will return to these problems later.

There is a way of attaching a communications processor to a mainframe host which sidesteps all of the mainframe implementation problems, which is to use some preexisting interface on the host machine as the port by which the communications processor is attached. This strategy is often used as a last stage of desperation, when the software on the host computer is so intractable that it cannot be changed in any way. Unfortunately, it is almost inevitably the case that all of the available interfaces are totally unsuitable for this purpose, so the result is unsatisfactory at best. The most common way in which this form of attachment occurs is when a network connection is being used to mimic local teletypes. In this case, the front-end processor can be attached to the mainframe by simply providing a number of wires out of the front-end processor, each corresponding to a connection, which are plugged into teletype ports on the mainframe computer. Because of the appearance of the physical configuration which results from this arrangement, Michael Padlipsky has described this as the "milking machine" approach to computer networking. This strategy solves the immediate problem of providing remote access to a host, but it is extremely inflexible. The channels being provided to the host are restricted by the host software to one purpose only: remote login. It is impossible to use them for any other purpose, such as file transfer or sending mail, so the host is integrated into the network environment in an extremely limited and inflexible manner. If this is the best that can be done, then it should be tolerated. Otherwise, implementors should be strongly encouraged to take a more flexible approach.

4. Protocol Layering

The previous discussion suggested that there was a decision to be made as to where a protocol ought to be implemented. In fact, the decision is much more complicated than that, for the goal is not to implement a single protocol, but to implement a whole family of protocol layers, starting with a device driver or local network driver at the bottom, then IP and TCP, and eventually reaching the application specific protocols, such as Telnet, FTP and SMTP, on the top. Clearly, the bottommost of these layers is somewhere within the kernel, since the physical device driver for the net is almost inevitably located there. Equally clearly, the top layers of this package, which provide the user his ability to perform the remote login function or to send mail, are not entirely contained within the kernel. Thus, the question is not whether the protocol family shall be inside or outside the kernel, but how it shall be sliced in two between that part inside and that part outside.

Since protocols come nicely layered, an obvious proposal is that one of the layer interfaces should be the point at which the inside and outside components are sliced apart. Most systems have been implemented in this way, and many have been made to work quite effectively. One obvious place to slice is at the upper interface of TCP. Since TCP provides a bidirectional byte stream, which is somewhat similar to the I/O facility provided by most operating systems, it is possible to make the interface to TCP almost mimic the interface to other existing devices. Except in the matter of opening a connection, and dealing with peculiar failures, the software using TCP need not know that it is a network connection, rather than a local I/O stream, that is providing the communications function. This approach does put TCP inside the kernel, which raises all the problems addressed above. It also raises the problem that the interface to the IP layer can, if the programmer is not careful, become excessively buried inside the kernel. It must be remembered that things other than TCP are expected to run on top of IP. The IP interface must be made accessible, even if TCP sits on top of it inside the kernel.

Another obvious place to slice is above Telnet. The advantage of slicing above Telnet is that it solves the problem of having remote login channels emulate local teletype channels. The disadvantage of putting Telnet into the kernel is that the amount of code which has now been included there is getting remarkably large. In some early implementations, the size of the network package, when one includes protocols at the level of Telnet, rivals the size of the rest of the supervisor. This leads to vague feelings that all is not right.

Any attempt to slice through a lower layer boundary, for example between internet and TCP, reveals one fundamental problem. The TCP layer, as well as the IP layer, performs a demultiplexing function on incoming datagrams. Until the TCP header has been examined, it is not possible to know for which user the packet is ultimately destined. Therefore, if TCP as a whole is moved outside the kernel, it is necessary to create one separate process, called the TCP process, which performs the TCP multiplexing function, and probably all of the rest of TCP processing as well. This means that incoming data destined for a user process involves not just a scheduling of the user process, but scheduling the TCP process first.

This suggests an alternative structuring strategy which slices through the protocols, not along an established layer boundary, but along a functional boundary having to do with demultiplexing. In this approach, certain parts of IP and certain parts of TCP are placed in the kernel. The amount of code placed there is sufficient so that, when an incoming datagram arrives, it is possible to know for which process that datagram is ultimately destined. The datagram is then routed directly to the final process, where additional IP and TCP processing is performed on it. This removes from the kernel any requirement for timer based actions, since they can be done by the process provided by the user. This structure has the additional advantage of reducing the amount of code required in the kernel, so that it is suitable for systems where kernel space is at a premium. RFC 814, titled "Names, Addresses, Ports, and Routes," discusses this rather orthogonal slicing strategy in more detail. A related discussion of protocol layering and multiplexing can be found in Cohen and Postel [1].

5. Breaking Down the Barriers

In fact, the implementor should be sensitive to the possibility of even more peculiar slicing strategies in dividing up the various protocol layers between the kernel and the one or more user processes. The result of the strategy proposed above was that part of TCP should execute in the process of the user. In other words, instead of having one TCP process for the system, there is one TCP process per connection. Given this architecture, it is no longer necessary to imagine that all of the TCPs are identical. One TCP could be optimized for high throughput applications, such as file transfer. Another TCP could be optimized for small low delay applications, such as Telnet. In fact, it would be possible to produce a TCP which was somewhat integrated with the Telnet or FTP on top of it. Such an integration is extremely important, for it can lead to a kind of efficiency which more traditional structures are incapable of producing. Earlier, this paper pointed out that one of the important rules to achieving efficiency was to send the minimum number of packets for a given amount of data. The idea of protocol layering interacts very strongly, and poorly, with this goal, because independent layers have independent ideas about when packets should be sent, and unless these layers can somehow be brought into cooperation, additional packets will flow. The best example of this is the operation of server telnet in a character at a time remote echo mode on top of TCP. When a packet containing a character arrives at a server host, each layer has a different response to that packet. TCP has an obligation to acknowledge the packet. Either server telnet or the application layer above has an obligation to echo the character received in the packet. If the character is a Telnet control sequence, then Telnet has additional actions which it must perform in response to the packet.
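The functional-boundary slice described earlier can be made concrete: the kernel-resident fragment needs only to read far enough into the IP and TCP headers to recover the connection identifiers and hand the datagram to the owning process. A sketch, assuming IPv4/TCP header layouts; the connection table and process names are invented for illustration:

```python
import struct

def demux_key(datagram):
    """Extract (src addr, src port, dst addr, dst port) from a raw
    IPv4 datagram carrying TCP -- the minimum work the kernel-resident
    fragment must do before handing the packet to a user process."""
    ihl = (datagram[0] & 0x0F) * 4          # IP header length, octets
    src, dst = struct.unpack_from("!4s4s", datagram, 12)
    sport, dport = struct.unpack_from("!HH", datagram, ihl)
    return (src, sport, dst, dport)

# Hypothetical per-connection table mapping a 4-tuple to the process
# (here just a name) in which the rest of IP and TCP will run.
connections = {}

def deliver(datagram):
    """Route the datagram directly to its final process, or None if
    no connection matches (a real stack would then send a reset)."""
    return connections.get(demux_key(datagram))
```

Everything else, reassembly, acknowledgement, retransmission timers, can then live in the destination process, which is exactly what removes timer-based actions from the kernel.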
The result of this, in most implementations, is that several packets are sent back in response to the one arriving packet. Combining all of these return messages into one packet is important for several reasons. First, of course, it reduces the number of packets being sent over the net, which directly reduces the charges incurred under many common carrier tariff structures. Second, it reduces the number of scheduling actions which will occur inside both hosts, which, as was discussed above, is extremely important in improving throughput.

The way to achieve this goal of packet sharing is to break down the barrier between the layers of the protocols, in a very restrained and careful manner, so that a limited amount of information can leak across the barrier to enable one layer to optimize its behavior with respect to the desires of the layers above and below it. For example, it would represent an improvement if TCP, when it received a packet, could ask the layer above whether or not it would be worth pausing for a few milliseconds before sending an acknowledgement, in order to see if the upper layer would have any outgoing data to send. Dallying before sending the acknowledgement produces precisely the right sort of optimization if the client of TCP is server Telnet. However, dallying before sending an acknowledgement is absolutely unacceptable if TCP is being used for file transfer, for in file transfer there is almost never data flowing in the reverse direction, and the delay in sending the acknowledgement probably translates directly into a delay in obtaining the next packets. Thus, TCP must know a little about the layers above it, to adjust its performance as needed.

It would be possible to imagine a general purpose TCP which was equipped with all sorts of special mechanisms by which it would query the layer above and modify its behavior accordingly. In the structures suggested above, in which there is not one but several TCPs, the TCP can simply be modified so that it produces the correct behavior as a matter of course. This structure has the disadvantage that there will be several implementations of TCP existing on a single machine, which can mean more maintenance headaches if a problem is found where TCP needs to be changed. However, it is probably the case that each of the TCPs will be substantially simpler than the general purpose TCP which would otherwise have been built. There are some experimental projects currently under way which suggest that this approach may make the designing of a TCP, or almost any other layer, substantially easier, so that the total effort involved in bringing up a complete package is actually less if this approach is followed. This approach is by no means generally accepted, but deserves some consideration.

The general conclusion to be drawn from this sort of consideration is that a layer boundary has both a benefit and a penalty. A visible layer boundary, with a well specified interface, provides a form of isolation between two layers, which allows one to be changed with the confidence that the other one will not stop working as a result. However, a firm layer boundary almost inevitably leads to inefficient operation. This can easily be seen by analogy with other aspects of operating systems. Consider, for example, file systems. A typical operating system provides a file system, which is a highly abstracted representation of a disk. The interface is highly formalized, and presumed to be highly stable. This makes it very easy for naive users to have access to disks without having to write a great deal of software. The existence of a file system is clearly beneficial. On the other hand, it is clear that the restricted interface to a file system almost inevitably leads to inefficiency. If the interface is organized as a sequential read and write of bytes, then there will be people who wish to do high throughput transfers who cannot achieve their goal. If the interface is a virtual memory interface, then other users will regret the necessity of building a byte stream interface on top of the memory mapped file. The most objectionable inefficiency results when a highly sophisticated package, such as a data base management package, must be built on top of an existing operating system. Almost inevitably, the implementors of the data base system attempt to reject the file system and obtain direct access to the disks. They have sacrificed modularity for efficiency.

The same conflict appears in networking, in a rather extreme form. The concept of a protocol is still unknown and frightening to most naive programmers. The idea that they might have to implement a protocol, or even part of a protocol, as part of some application package, is a dreadful thought. And thus there is great pressure to hide the function of the net behind a very hard barrier. On the other hand, the kind of inefficiency which results from this is a particularly undesirable sort of inefficiency, for it shows up, among other things, in increasing the cost of the communications resource used up to achieve the application goal. In cases where one must pay for one's communications costs, they usually turn out to be the dominant cost within the system. Thus, doing an excessively good job of packaging up the protocols in an inflexible manner has a direct impact on increasing the cost of the critical resource within the system. This is a dilemma which will probably only be solved when programmers become somewhat less alarmed about protocols, so that they are willing to weave a certain amount of protocol structure into their application program, much as application programs today weave parts of data base management systems into the structure of their application program.

An extreme example of putting the protocol package behind a firm layer boundary occurs when the protocol package is relegated to a front-end processor. In this case, the interface to the protocol is some other protocol. It is difficult to imagine how to build close cooperation between layers when they are that far separated. Realistically, one of the prices which must be associated with an implementation so physically modularized is that the performance will suffer as a result. Of course, a separate processor for protocols could be very closely integrated into the mainframe architecture, with interprocessor co-ordination signals, shared memory, and similar features. Such a physical modularity might work very well, but there is little documented experience with this closely coupled architecture for protocol support.

6. Efficiency of Protocol Processing

To this point, this document has considered how a protocol package should be broken into modules, and how those modules should be distributed between free standing machines, the operating system kernel, and one or more user processes. It is now time to consider the other half of the efficiency question, which is what can be done to speed the execution of those programs that actually implement the protocols. We will make some specific observations about TCP and IP, and then conclude with a few generalities.

IP is a simple protocol, especially with respect to the processing of normal packets, so it should be easy to get it to perform efficiently. The only area of any complexity related to actual packet processing has to do with fragmentation and reassembly. The reader is referred to RFC 815, titled "IP Datagram Reassembly Algorithms," for specific consideration of this point.

Most costs in the IP layer come from table look up functions, as opposed to packet processing functions. An outgoing packet requires two translation functions to be performed. The internet address must be translated to a target gateway
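The address-to-gateway table look up just mentioned is a natural place to put a small cache in front of the full table, since consecutive packets usually go to the same destination. A sketch under assumed names; the network numbers, gateways, and one-entry cache are invented for illustration, with the leading octet standing in for the network number of that era's addressing:

```python
# Hypothetical routing table: network number -> first-hop gateway.
routes = {
    "10": "gw-a",
    "18": "gw-b",
}

# Tiny one-entry cache: [last destination, its gateway].
last = [None, None]

def gateway_for(dest):
    """Translate an internet address to a target gateway, consulting
    the full table only on a cache miss."""
    if dest == last[0]:            # cache hit: no table search needed
        return last[1]
    net = dest.split(".")[0]       # crude "network number" extraction
    gw = routes[net]               # the (potentially costly) look up
    last[0], last[1] = dest, gw
    return gw
```

The point of the sketch is only that the common-case cost becomes one comparison rather than a table search, which is where the text says most IP-layer costs lie.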

    Original URL path: http://web.teipir.gr/new/ecs/pelab_1/RFC/rfc0817.txt (2016-02-14)

