Interprocess communication (local)

These functions will appear first in the new Vertex-Amoeba kernel. They provide fast and efficient communication between privileged system processes such as device drivers. They are definitely not intended as a replacement for Amoeba's generic RPC interface. Most servers on top of the kernel will continue to use the generic RPC interface for their default interprocess communication, to ensure location independence for processes, one main goal of Amoeba. But there are cases where the RPC interface is too slow or not flexible enough, and for device drivers running in user space, for example, it is the wrong choice. Device drivers are servers too, but they always run locally on a specific machine. The new IPC interface was introduced to support moving device drivers, protocol stacks and other less time-critical servers out of the kernel. All servers inside the kernel and various new parts will support both interfaces: RPC and IPC. The IPC stub functions and data structures were specified to be very similar to the RPC ones. To make the transfer of complex data efficient, data can be transferred as vector objects, which gives a significant performance improvement: several buffers of different location and length can be transferred with one transaction or reply.
Furthermore, it is possible to share data segment memory objects with other processes. All processes that want to share memory must map the previously created shared segment. The IPC stubs are used in this case, too.

Programming Interface

Data structures

struct ipc_message {
    portid_t portid;    /* unique destination portid */

    /*
    ** User parts
    */
    long command;       /* request command */
    long offset;        /* user 1 */
    long size;          /* user 2 */
    long extra;         /* user 3 */
    long status;        /* user 4 */
};

struct ipc_iovec {
    char  *base;        /* start address of vm buffer */
    ulong  flen;        /* length and flags of vm buffer */
};

IOV macros

IPC_IOVLEN(iov)

Get the length of the iovec entry

IPC_IOVFLAGS(iov)

Get the flags of the iovec entry

IPC_IOVBUF(iov)


Get the buffer address of the iovec entry

IPC_SETIOVLEN(iov,len)

Set the length field of the iovec entry

IPC_SETIOVFLAGS(iov,flags)

Set the flags field of the iovec entry

IPC_SETIOV(iov,buf,len,flags)

Set the buffer start address, the length and the flags of the iovec entry.

IOV flags

IPC_COPY

Copy data buffers between IPC partners

IPC_SHARED

Translate shared data segment addresses between IPC partners
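The flen field holds both the length and the flags of a buffer, so the IOV macros have to pack and unpack two values in one word. The following is only a sketch of how such macros could work. The bit layout (28 length bits, flags above them), the X_ prefixed names and the flag values are assumptions made for illustration and are not taken from the Vertex-Amoeba headers.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: low 28 bits = length, bits above = flags (hypothetical layout) */
#define X_LEN_BITS  28
#define X_LEN_MASK  ((1UL << X_LEN_BITS) - 1UL)
#define X_COPY      0x1UL   /* stand-in for IPC_COPY */
#define X_SHARED    0x2UL   /* stand-in for IPC_SHARED */

struct x_iovec {
    char *base;             /* start address of buffer */
    unsigned long flen;     /* length and flags packed together */
};

#define X_IOVLEN(iov)    ((iov).flen & X_LEN_MASK)
#define X_IOVFLAGS(iov)  ((iov).flen >> X_LEN_BITS)
#define X_IOVBUF(iov)    ((iov).base)

/* Replace only the length bits, keep the flags */
#define X_SETIOVLEN(iov, len) \
    ((iov).flen = ((iov).flen & ~X_LEN_MASK) | \
                  ((unsigned long)(len) & X_LEN_MASK))

/* Replace only the flag bits, keep the length */
#define X_SETIOVFLAGS(iov, flags) \
    ((iov).flen = ((iov).flen & X_LEN_MASK) | \
                  ((unsigned long)(flags) << X_LEN_BITS))

/* Set buffer address, length and flags in one step */
#define X_SETIOV(iov, buf, len, flags) \
    ((iov).base = (char *)(buf), \
     (iov).flen = ((unsigned long)(flags) << X_LEN_BITS) | \
                  ((unsigned long)(len) & X_LEN_MASK))

/* Build an entry and check that the length fits into its bit field */
static struct x_iovec x_iov_make(char *buf, unsigned long len,
                                 unsigned long flags)
{
    struct x_iovec iov;

    assert(len <= X_LEN_MASK);
    X_SETIOV(iov, buf, len, flags);
    return iov;
}
```

With a packed field like this, changing the length of an entry (for example to report the reply size) leaves the flags untouched.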

Fundamentals

1. All stub functions always return an error code, similar to the error conditions found in
the stderr.h header. This error code shows the success of the IPC operation itself, and is
not the status given by the server. The server-side status is returned in msg.status.

2. All request and reply buffers must currently be allocated by the user. For vector
objects in the IPC_COPY case, each vector element consists of a user-allocated
memory buffer (stack, malloc, ...). On the request side the length specifies the
maximal size of the buffer, and on the reply side the length of the data actually
returned. In the IPC_SHARED case, both processes wanting to share virtual memory
must first set up a shared segment. The client gives the virtual base address of its
iovec on the request side, and the kernel IPC module translates the virtual address
from the client to the server side, and vice versa. The server receives a translated
iovec array.

3. If the IPC returns reply data, the length field of the iovec structure contains the
total size of the reply data in the buffer.

4. Each IPC port is protected by a private key capability, which is given to
the IPC module with ipc_register. Only processes that know the public keyport
capability can look up the IPC port; otherwise the lookup is denied by the IPC module.

5. Port ids are protected with a randomly generated private field. This security field
can only be obtained with ipc_lookup() (and the right key cap). Each time a new port is
registered, the private field is newly generated at random.
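Point 1 means that a caller has two results to check after a transaction: the error code of the stub and the status set by the server. A minimal sketch of a helper that combines both is shown here. The x_errstat type, X_STD_OK and struct x_msg are simplified stand-ins for errstat, STD_OK and the real message header, and the helper itself is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: stand-ins for errstat, STD_OK and the IPC message header */
typedef int x_errstat;
#define X_STD_OK 0

struct x_msg {
    long command;
    long status;    /* status set by the server */
};

/*
 * Combine both error levels. The stub's return value takes precedence;
 * the server status only counts if the transfer itself succeeded.
 */
x_errstat x_ipc_result(x_errstat transfer, const struct x_msg *rep)
{
    assert(rep != NULL);
    if (transfer != X_STD_OK)
        return transfer;
    return (x_errstat)rep->status;
}
```

A client would call such a helper with the return value of its transaction stub and the reply message, and treat any result other than the success code as a failure.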

Each server thread of a process wanting to receive messages from a specific port must register the ipc port with:

errstat
ipc_register(portid, portname, namelen, keycap)
portid_p    portid;
char       *portname;
int         namelen;
capability *keycap;

The IPC module returns in portid a unique port identification number associated with the port name given in portname, whose length is namelen. The keycap capability protects the port name against unauthorized use. A client process must hold the same key capability (not merely a restricted version of it) to look up this port. If the port was already registered, the same port id number is always returned.
A registered port is unregistered either implicitly when the server thread exits, or explicitly with:

errstat
ipc_unregister(portid)
portid_p portid;

A client can translate a port name into a port id number with:

errstat
ipc_lookup(portname, len, portid, protcap)
char       *portname;
int         len;
portid_p    portid;
capability *protcap;

The len field contains the length of the portname string. The protection capability protcap must be the same as registered by the server (see above).
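The interplay of registering and looking up a port can be illustrated with a small user-space model of a port name table. This is only a sketch: the table layout, the X_ prefixed names, the error codes and the fixed capability size are assumptions, and the real IPC module additionally protects the port id with a random private field.

```c
#include <assert.h>
#include <string.h>

#define X_MAXPORTS 8
#define X_NAMELEN  32
#define X_CAPLEN   16   /* ASSUMPTION: stand-in capability size */

enum { X_OK = 0, X_DENIED = -1, X_NOTFOUND = -2, X_NOSPACE = -3 };

struct x_port {
    int  used;
    char name[X_NAMELEN];
    unsigned char key[X_CAPLEN];    /* key capability given at registration */
    long id;                        /* port id handed out to clients */
};

static struct x_port x_table[X_MAXPORTS];
static long x_next_id = 1;

static struct x_port *x_find(const char *name)
{
    int i;

    for (i = 0; i < X_MAXPORTS; i++)
        if (x_table[i].used && strcmp(x_table[i].name, name) == 0)
            return &x_table[i];
    return NULL;
}

/* Register a port name; an already registered name yields the same id */
int x_register(long *id, const char *name, const unsigned char *key)
{
    struct x_port *p = x_find(name);
    int i;

    assert(strlen(name) < X_NAMELEN);
    if (p != NULL) {
        /* ASSUMPTION: re-registering requires the same key */
        if (memcmp(p->key, key, X_CAPLEN) != 0)
            return X_DENIED;
        *id = p->id;
        return X_OK;
    }
    for (i = 0; i < X_MAXPORTS; i++) {
        if (!x_table[i].used) {
            x_table[i].used = 1;
            strcpy(x_table[i].name, name);
            memcpy(x_table[i].key, key, X_CAPLEN);
            x_table[i].id = x_next_id++;
            *id = x_table[i].id;
            return X_OK;
        }
    }
    return X_NOSPACE;
}

/* Translate a port name into its id; denied without the matching key */
int x_lookup(long *id, const char *name, const unsigned char *key)
{
    struct x_port *p = x_find(name);

    if (p == NULL)
        return X_NOTFOUND;
    if (memcmp(p->key, key, X_CAPLEN) != 0)
        return X_DENIED;
    *id = p->id;
    return X_OK;
}
```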

The message passing can take place in two ways:

1. bi-directional in RPC request-reply semantic

Client: ipc_trans
Server: ipc_getreq , ipc_putrep

2. uni-directional only from client to server

Client: ipc_send
Server: ipc_recv

REQUEST-REPLY IPC

A client thread can send a request-reply message with:

errstat
ipc_trans(reqmsg, reqiov, reqiovlen, repmsg, repiov, repiovlen)
ipc_msg_p   reqmsg;
ipc_iovec_p reqiov;
int         reqiovlen;
ipc_msg_p   repmsg;
ipc_iovec_p repiov;
int         repiovlen;

A message header consists of 2 parts:

1. portid

2. private data


Both the request message header reqmsg and the reply message header repmsg are always required, but they can (and for performance reasons should) be the same data structure. The request and reply data are packed into the I/O vector array iov in the following way:

ipc_iovec_t iov[n];

IPC_SETIOV(iov[0], buf_1, buflen_1, [IPC_COPY] or [IPC_SHARED]);
...
IPC_SETIOV(iov[n-1], buf_n, buflen_n, [IPC_COPY] or [IPC_SHARED]);

The reqiovlen and repiovlen fields contain the length of the iovec array iov, in this case n. For the case of non existing request and/or reply data, the length field must be zero.
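What a vector transfer with IPC_COPY amounts to can be sketched as a gather operation: all buffers of the array are copied back to back in one pass. The code below is a user-space illustration only; struct x_iovec and x_gather are hypothetical names, and the real kernel copy path is not described in this document.

```c
#include <assert.h>
#include <string.h>

/* ASSUMPTION: simplified iovec with a plain length field */
struct x_iovec {
    char *base;
    unsigned long len;
};

/*
 * Copy the n buffers described by iov back to back into dst.
 * Returns the number of bytes copied, or -1 if dst is too small.
 */
long x_gather(char *dst, unsigned long dstsize,
              const struct x_iovec *iov, int n)
{
    unsigned long off = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (iov[i].len > dstsize - off)
            return -1;
        memcpy(dst + off, iov[i].base, iov[i].len);
        off += iov[i].len;
    }
    return (long)off;
}
```

An array length of zero copies nothing, which corresponds to a transaction without request or reply data.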

On the server side, there are two functions, one for the request and one for the reply:

errstat
ipc_getreq(reqmsg, reqiov, reqlen)
ipc_msg_p   reqmsg;
ipc_iovec_p reqiov;
int         reqlen;

errstat
ipc_putrep(repmsg, repiov, replen)
ipc_msg_p   repmsg;
ipc_iovec_p repiov;
int         replen;

Both functions must always be used to service a client request. The request and reply message and the iovec array can be the same data. Several macros extract the buffer address, the length and, if needed, the flags from the iovec structure:

long len;
char *buf;
long flags;

/*
** Length and buf start address of the i-th iovec
*/
len = IPC_IOVLEN(iov[i]);
buf = (char *) IPC_IOVBUF(iov[i]);

/*
** optionally the iov flags
*/
flags = IPC_IOVFLAGS(iov[i]);

RECEIVE-SEND IPC

The simpler case of sending and receiving an IPC message:

errstat
ipc_send(msg, iov, len)
ipc_msg_p   msg;
ipc_iovec_p iov;
int         len;

errstat
ipc_recv(msg, iov, len)
ipc_msg_p   msg;
ipc_iovec_p iov;
int         len;


The client sends a message with ipc_send, and the server waits for client messages with ipc_recv. The server must have previously registered its IPC port, or the receive will fail. The client must set the portid field in its msg structure to select the server IPC port. The server portid must be retrieved with ipc_lookup.

Shared Data segments

To make IPC transfers much faster, client and server processes can share data segments. First, the server process creates a shareable data segment with

errstat
ipc_create_shared(segcap, vaddr, vsize)
capability *segcap;
long       *vaddr;
long        vsize;

This function allocates a data segment of size vsize, and returns the virtual start address in vaddr and the segment capability in segcap. The data segment is mapped into the server process address space with the MAP_SHARED flag.
Client processes must obtain the segment capability in order to map the shared data segment into their own virtual address space with:

errstat
ipc_map_shared(origcap, vaddr, vsize)
capability *origcap;
long       *vaddr;
long        vsize;

The start address of the mapped data segment is returned in vaddr. Usually, client processes send a request to the server to get the segment capability. All IPC transactions can use these shared data segments. All processes must set the IPC_SHARED flag in the iovec entry. However, each process uses its own virtual addresses for the shared data. The kernel IPC module is responsible for translating the virtual addresses between the processes during an IPC transaction.
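Because the same segment is mapped at different virtual addresses in each process, an IPC_SHARED buffer address has to be rebased from one mapping to the other. The following sketch shows the arithmetic under the assumption that the translation is a simple offset within the segment. struct x_mapping and x_translate are hypothetical names; the actual kernel code is not shown in this document.

```c
#include <assert.h>

/* ASSUMPTION: one mapping of the shared segment in one process */
struct x_mapping {
    unsigned long vaddr;    /* virtual start address in that process */
    unsigned long size;     /* segment size in bytes */
};

/*
 * Translate the buffer [addr, addr+len), valid in mapping `from`,
 * into the address space of mapping `to`. Returns 0 on success and
 * -1 if the buffer does not lie completely inside the segment.
 */
int x_translate(const struct x_mapping *from, const struct x_mapping *to,
                unsigned long addr, unsigned long len, unsigned long *out)
{
    unsigned long off;

    assert(from->size == to->size);     /* same segment on both sides */
    if (addr < from->vaddr)
        return -1;
    off = addr - from->vaddr;
    if (off > from->size || len > from->size - off)
        return -1;
    *out = to->vaddr + off;
    return 0;
}
```

Applying the translation from client to server and then back from server to client yields the original client address, which is what allows a server to return a shared buffer in its reply.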

Administration

On thread exit, the IPC module will check if the dying thread was a registered ipc server thread. The IPC module will do the necessary cleanup and unregister automatically the allocated ipc port slot. If this thread was the last one owning a registered port, the port name and id number will vanish from the port name table.

Warnings

Things that might unpleasantly surprise the user.


Examples

REQUEST-REPLY IPC (IPC_COPY)

A simple client-server example.

Server code

#include <amoeba.h>
#include <stderr.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>

#include <module/rnd.h>
#include <module/name.h>

#define MYSERVER_PORTNAME "myserver"
#define SERVERPATH "/tmp/myserver"
#define CMD_HELLO 100

capability myportcap;

void server()
{
    portid_t portid;
    ipc_msg_t msg;
    ipc_iovec_t iov[1];
    errstat err;
    char buf[100];

    /*
    ** Create port protection capability
    */
    rnd_getrandom((char *)&myportcap, sizeof(capability));

    /*
    ** publish it in the file system
    */
    name_append(SERVERPATH, &myportcap);

    /*
    ** Okay, now register our port name
    */
    err = ipc_register(&portid, MYSERVER_PORTNAME,
                       sizeof(MYSERVER_PORTNAME), &myportcap);
    if (err != STD_OK)
        return;

    /*
    ** We're ready for incoming client requests.
    */
    for (;;) {
        int replylen;

        /*
        ** Setup iovec
        */
        IPC_SETIOV(iov[0], buf, sizeof(buf), IPC_COPY);
        msg.portid = portid;

        err = ipc_getreq(&msg, iov, 1);
        if (err != STD_OK)
            return;

        replylen = 0;

        switch (msg.command) {
        case CMD_HELLO:
            if (IPC_IOVLEN(iov[0]) > 0)
                printf("client msg: %s\n", buf);
            msg.status = STD_OK;
            strcpy(buf, "I've got the data.\n");
            IPC_SETIOV(iov[0], buf, sizeof(buf), IPC_COPY);
            replylen = 1;
            break;

        default:
            msg.status = STD_ARGBAD;
            break;
        }

        err = ipc_putrep(&msg, iov, replylen);
    }
}

Client code

#include <amoeba.h>
#include <stderr.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>

#include <module/name.h>

#define MYSERVER_PORTNAME "myserver"
#define SERVERPATH "/tmp/myserver"
#define CMD_HELLO 100

capability myportcap;
portid_t portid;

void client()
{
    errstat err;
    ipc_msg_t msg;
    ipc_iovec_t iov[1];
    char buf[100];
    char *p;
    int len;

    /*
    ** Send a message to the server
    */

    /*
    ** First look up the server's port id. We need the server port
    ** capability.
    */
    name_lookup(SERVERPATH, &myportcap);

    err = ipc_lookup(&portid, MYSERVER_PORTNAME,
                     sizeof(MYSERVER_PORTNAME), &myportcap);
    if (err != STD_OK) {
        failwith("client port id lookup failed (%s)", err_why(err));
    }

    strcpy(buf, "Hello, I'm there\n");

    /*
    ** Setup the iovec data (including the terminating null byte)
    */
    IPC_SETIOV(iov[0], buf, strlen(buf) + 1, IPC_COPY);

    msg.portid = portid;
    msg.command = CMD_HELLO;

    /*
    ** Request and reply message and iovec are both the same.
    */
    err = ipc_trans(&msg, iov, 1, &msg, iov, 1);
    if (err != STD_OK || msg.status != STD_OK) {
        failwith("something went wrong");
    }

    /*
    ** Extract the reply data
    */
    p = (char *) IPC_IOVBUF(iov[0]);
    len = (int) IPC_IOVLEN(iov[0]);

    if (len > 0)
        printf("Reply: %s (length=%d)\n", p, len);
}

RECEIVE-SEND IPC

This code sample was taken from the ipc test suite program.

Server code

myserver.h:

#define PNAME "MYSERVER"
#define PCMD 100

#include "myserver.h"

capability servercap;

void server()
{
    portid_t pid;
    ipc_msg_t req_msg;
    ipc_iov_t req_iov[4];
    errstat err;
    char *str[3], *stra, str4[300];
    int i, n;
    char path[200];

    /*
    ** First create port protection capability. We are not allowed
    ** to use the NULL capability!
    */
    rnd_getrandom((char *)&servercap, sizeof(capability));

    /*
    ** publish the cap in the file system
    */
    sprintf(path, "/tmp/%s", PNAME);
    name_delete(path);
    err = name_append(path, &servercap);
    if (err != STD_OK) {
        printf("ipc_server: can't append port prot cap %s: %s\n",
               path, err_why(err));
        exit(1);
    }

    /*
    ** The request and reply buffers.
    */
    str[0] = (char *)malloc(100);
    str[1] = (char *)malloc(100);
    str[2] = (char *)malloc(100);
    if (str[0] == NULL || str[1] == NULL || str[2] == NULL) {
        printf("ipc_server: can't allocate buffers\n");
        exit(1);
    }

    /*
    ** Register the IPC port
    */
    err = ipc_register(&pid, PNAME, sizeof(PNAME), &servercap);
    if (err != STD_OK) {
        printf("ipc_server: can't register IPC port: %s\n", err_why(err));
        exit(1);
    }

    /*
    ** The server loop
    */
    for (n = 0; n < 100; n++) {
        /*
        ** Setup the iovec fields for the request part
        */
        *str[0] = '\0';
        *str[1] = '\0';
        *str[2] = '\0';

        IPC_SETIOV(req_iov[0], str[0], 100, IPC_COPY);
        IPC_SETIOV(req_iov[1], str[1], 100, IPC_COPY);
        IPC_SETIOV(req_iov[2], str[2], 100, IPC_COPY);
        IPC_SETIOV(req_iov[3], str4, 300, IPC_COPY);

        stra = (char *)malloc(300);
        if (stra == NULL) {
            printf("ipc_server: can't allocate stra buffer\n");
            exit(1);
        }

        /*
        ** Setup the request message structure
        */
        req_msg.portid = pid;

        /*
        ** Wait for client requests
        */
        err = ipc_recv(&req_msg, req_iov, 4);
        if (err != STD_OK) {
            printf("ipc_server: ipc_recv failed: %s\n", err_why(err));
            exit(1);
        }

        /*
        ** The right command ?
        */
        if (req_msg.command != PCMD) {
            printf("ipc_server: invalid command (%ld)\n", req_msg.command);
            exit(1);
        }

        /*
        ** Concatenate all strings, loop through the iovec list:
        */
        *stra = '\0';
        for (i = 0; i < 3; i++) {
            if (IPC_IOVLEN(req_iov[i]) == 0) {
                printf("ipc_server: request iovec field [%d] has zero length\n", i);
                exit(1);
            }
            if (IPC_IOVLEN(req_iov[i]) > 0 && IPC_IOVLEN(req_iov[i]) < 100)
                strcat(stra, str[i]);
        }

        if (strlen(stra) != req_msg.size) {
            printf("ipc_server: wrong string length sum: %d , expected %ld\n",
                   (int)strlen(stra), req_msg.size);
            exit(1);
        }

        if (strlen(str4) != req_msg.size) {
            printf("ipc_server: wrong str4 length: %d , expected %ld\n",
                   (int)strlen(str4), req_msg.size);
            exit(1);
        }

        if (strcmp(str4, stra) != 0) {
            printf("ipc_server: cat of sum strings differs from stra\n");
            exit(1);
        }

        free(stra);
    }
}

Client code

#include "myserver.h"

void client()
{
    portid_t pid;
    ipc_msg_t req_msg;
    ipc_iov_t req_iov[4];
    capability testcap;
    capability servercap;
    errstat err;
    char path[256];
    char str1[100], *str2, *str3, *stra, strmya[300];
    int i, n;

    /*
    ** The request and reply buffers, the first from the stack (str1),
    ** the rest from the heap
    */
    str2 = (char *)malloc(100);
    str3 = (char *)malloc(100);
    if (str2 == NULL || str3 == NULL) {
        printf("ipc_client: can't allocate buffers\n");
        exit(1);
    }

    /*
    ** Lookup the prot cap in the file system
    */
    sprintf(path, "/tmp/%s", PNAME);
    err = name_lookup(path, &servercap);
    if (err != STD_OK) {
        printf("ipc_client: can't lookup port prot cap %s: %s\n",
               path, err_why(err));
        exit(1);
    }

    /*
    ** Lookup the IPC port
    */
    err = ipc_lookup(&pid, PNAME, sizeof(PNAME), &servercap);
    if (err != STD_OK) {
        printf("ipc_client: can't lookup IPC port: %s\n", err_why(err));
        exit(1);
    }

    /*
    ** The client loop
    */
    for (n = 0; n < 100; n++) {
        /*
        ** Setup the iovec fields for the request part
        */
        sprintf(str1, "This is part %d.", n*n*n);
        sprintf(str2, "One more sentence inpart %d.", n*n*n);
        sprintf(str3, "And ->%x<-the last one.", n*n*n);

        stra = (char *)malloc(300);
        if (stra == NULL) {
            printf("ipc_client:can't allocate stra buffer\n");
            exit(1);
        }

        sprintf(stra, "%s%s%s", str1, str2, str3);

        IPC_SETIOV(req_iov[0], str1, strlen(str1)+1, IPC_COPY);
        IPC_SETIOV(req_iov[1], str2, strlen(str2)+1, IPC_COPY);
        IPC_SETIOV(req_iov[2], str3, strlen(str3)+1, IPC_COPY);
        IPC_SETIOV(req_iov[3], stra, strlen(stra)+1, IPC_COPY);

        /*
        ** Setup the request message structure
        */
        req_msg.portid = pid;
        req_msg.command = PCMD;
        req_msg.size = strlen(stra);

        /*
        ** Send the request
        */
        err = ipc_send(&req_msg, req_iov, 4);
        if (err != STD_OK) {
            printf("ipc_client: ipc_send failed: %s\n", err_why(err));
            exit(1);
        }

        free(stra);
    }
}

REQUEST-REPLY IPC (IPC_SHARED)

Server code

myserver.h:

#define PNAME "Myserver"
#define CMD_CAP 100
#define CMD_BUF 101
#define SEG_SIZE 100000

#include "myserver.h"

capability servercap;

void server()
{
    portid_t pid;
    ipc_msg_t req_msg, rep_msg;
    ipc_iov_t req_iov[2], rep_iov[1];
    errstat err;
    char str[300];
    int i, n, rep_len;
    char path[200];
    capability segcap;
    char *segbuf;
    char *bufa;

    /*
    ** First create port protection capability. We are not allowed
    ** to use the NULL capability!
    */
    rnd_getrandom((char *)&servercap, sizeof(capability));

    /*
    ** publish the cap in the file system
    */
    sprintf(path, "/tmp/%s", PNAME);
    name_delete(path);
    err = name_append(path, &servercap);
    if (err != STD_OK) {
        printf("ipc_server: can't append port prot cap %s: %s\n",
               path, err_why(err));
        exit(1);
    }

    /*
    ** Register the IPC port
    */
    err = ipc_register(&pid, PNAME, sizeof(PNAME), &servercap);
    if (err != STD_OK) {
        printf("ipc_server: can't register IPC port: %s\n", err_why(err));
        exit(1);
    }

    /*
    ** Create shareable data segment
    */
    err = ipc_create_shared(&segcap, (long *)&segbuf, SEG_SIZE);
    if (err != STD_OK) {
        printf("ipc_server: can't create shared segment: %s\n", err_why(err));
        exit(1);
    }

    /*
    ** The server loop
    */
    for (n = 0; n < 11; n++) {
        rep_len = 0;

        /*
        ** Setup the iovec fields for the request part
        */
        bufa = (char *)malloc(300);
        if (bufa == NULL) {
            printf("ipc_server #5: can't allocate bufa buffer\n");
            exit(1);
        }

        IPC_SETIOV(req_iov[0], bufa, 300, IPC_COPY);
        IPC_SETIOV(req_iov[1], 0, 0, IPC_SHARED);

        /*
        ** Setup the request message structure
        */
        req_msg.portid = pid;

        /*
        ** Wait for client requests
        */
        err = ipc_getreq(&req_msg, req_iov, 2);
        if (err != STD_OK) {
            printf("ipc_server #5: ipc_getreq failed: %s\n", err_why(err));
            exit(1);
        }

        /*
        ** The right command ?
        */
        if (req_msg.command == CMD_CAP) {
            IPC_SETIOV(rep_iov[0], &segcap, sizeof(capability), IPC_COPY);
            rep_len = 1;
            rep_msg.size = SEG_SIZE;
        } else if (req_msg.command == CMD_BUF) {
            /*
            ** req_iov[1] now contains the buffer address within
            ** the shared data segment
            */
            if (IPC_IOVLEN(req_iov[1]) != req_msg.size) {
                printf("ipc_server: IPC_SHARED mismatched len: %ld, expected %ld\n",
                       (long)IPC_IOVLEN(req_iov[1]), req_msg.size);
                exit(1);
            }

            if (strcmp((char *)IPC_IOVBUF(req_iov[1]), bufa) != 0) {
                printf("ipc_server: shared data content mismatch\n");
                exit(1);
            }

            /* send the received shared data region back again */
            IPC_SETIOV(rep_iov[0], IPC_IOVBUF(req_iov[1]),
                       strlen(bufa)+1, IPC_SHARED);
            rep_msg.size = strlen(bufa)+1;
            rep_len = 1;
        } else {
            printf("ipc_server: invalid command (%ld)\n", req_msg.command);
            exit(1);
        }

        rep_msg.status = STD_OK;

        /*
        ** Send the reply
        */
        err = ipc_putrep(&rep_msg, rep_iov, rep_len);
        if (err != STD_OK) {
            printf("ipc_server: ipc_putrep failed: %s\n", err_why(err));
            exit(1);
        }

        free(bufa);
    }

    err = ipc_unregister(&pid);
    if (err != STD_OK) {
        printf("ipc_server: ipc_unregister failed: %s\n", err_why(err));
        exit(1);
    }
}

Client code

#include "myserver.h"

void client()
{
    portid_t pid;
    ipc_msg_t req_msg, rep_msg;
    ipc_iov_t req_iov[2], rep_iov[1];
    errstat err;
    char path[200], stra[300];
    capability segcap;
    capability servercap;
    char *segbuf;
    int i, n;
    long bufsize;
    char *str, *dummy;

    dummy = (char *)malloc(1000000);

    /*
    ** Lookup the prot cap in the file system
    */
    sprintf(path, "/tmp/%s", PNAME);
    err = name_lookup(path, &servercap);
    if (err != STD_OK) {
        printf("ipc_client: can't lookup port prot cap %s: %s\n",
               path, err_why(err));
        exit(1);
    }

    /*
    ** Lookup the IPC port
    */
    err = ipc_lookup(&pid, PNAME, sizeof(PNAME), &servercap);
    if (err != STD_OK) {
        printf("ipc_client: can't lookup IPC port: %s\n", err_why(err));
        exit(1);
    }

    /*
    ** Setup reply iovec (cap)
    */
    IPC_SETIOV(rep_iov[0], &segcap, sizeof(capability), IPC_COPY);

    /*
    ** Get the shared seg cap from the server
    */
    req_msg.portid = pid;
    req_msg.command = CMD_CAP;

    err = ipc_trans(&req_msg, NULL, 0, &rep_msg, rep_iov, 1);
    if (err != STD_OK) {
        printf("ipc_client: 1. ipc_trans failed: %s\n", err_why(err));
        exit(1);
    }

    if (rep_msg.status != STD_OK) {
        printf("ipc_client: 1. ipc_trans: server reply: %s\n",
               err_why((errstat)rep_msg.status));
        exit(1);
    }

    if (IPC_IOVLEN(rep_iov[0]) != sizeof(capability)) {
        printf("ipc_client: server returned invalid cap\n");
        exit(1);
    }

    bufsize = rep_msg.size;

    /*
    ** Map in the shared segment
    */
    err = ipc_map_shared(&segcap, (long *)&segbuf, bufsize);
    if (err != STD_OK) {
        printf("ipc_client: can't map shared data segment: %s\n", err_why(err));
        exit(1);
    }

    /*
    ** The client loop
    */
    for (n = 0; n < 10; n++) {
        /* offset within the shared data segment */
        str = &segbuf[(n+1)*10];

        sprintf(str, "Hallo %d %x \n", n*n*n, rand());

        IPC_SETIOV(req_iov[0], str, strlen(str)+1, IPC_COPY);
        IPC_SETIOV(req_iov[1], str, strlen(str)+1, IPC_SHARED);
        IPC_SETIOV(rep_iov[0], 0, 0, IPC_SHARED);

        /*
        ** Setup the request message structure
        */
        req_msg.portid = pid;
        req_msg.command = CMD_BUF;
        req_msg.size = strlen(str)+1;

        /*
        ** Send the request
        */
        err = ipc_trans(&req_msg, req_iov, 2, &rep_msg, rep_iov, 1);
        if (err != STD_OK) {
            printf("ipc_client: ipc_trans failed: %s\n", err_why(err));
            exit(1);
        }

        if (rep_msg.status != STD_OK) {
            printf("ipc_client: server status: %s\n",
                   err_why((errstat)rep_msg.status));
            exit(1);
        }

        /*
        ** Extract the reply
        */
        if (IPC_IOVLEN(rep_iov[0]) != strlen(str)+1) {
            printf("ipc_client #1: invalid reply [1] len: %ld, expected %ld\n",
                   (long)IPC_IOVLEN(rep_iov[0]), (long)(strlen(str)+1));
            exit(1);
        }

        if (strcmp(str, (char *)IPC_IOVBUF(rep_iov[0])) != 0) {
            printf("ipc_client: reply and original string not equal:\n%s\n%s\n",
                   str, (char *)IPC_IOVBUF(rep_iov[0]));
            exit(1);
        }
    }

    free(dummy);
}

Name

1. Kernel I/O port and management routines.

2. User process I/O port and mapping routines

Machine dependency: i386, ISA, PCI, VLB

Synopsis

#include <i386_proto.h>
#include <ioport.h>
#include <sys/iomap.h>

void out_byte(int _port, int _val);
void out_word(int _port, int _val);
void out_long(int _port, long _val);

int in_byte(int _port);
int in_word(int _port);
long in_long(int _port);

void outs_byte(int _port, char * _ptr, int _bytecnt);
void outs_word(int _port, char * _ptr, int _bytecnt);
void outs_word_l(int _port, char * _ptr, int _wordcnt);
void outs_long_l(int _port, char * _ptr, int _longcnt);
void ins_byte(int _port, char * _ptr, int _bytecnt);
void ins_word(int _port, char * _ptr, int _bytecnt);
void ins_word_l(int _port, char * _ptr, int _wordcnt);
void ins_long_l(int _port, char * _ptr, int _longcnt);

Kernel:

void request_region(unsigned int from, unsigned int extent, const char *name);
int check_region(unsigned int from, unsigned int extent);
int release_region(unsigned int from, unsigned int extent);

User process:

errstat io_check_region(unsigned int start, unsigned int size, capability *syscap);
errstat io_map_region(unsigned int start, unsigned int size, char *devname, capability *syscap);
errstat io_unmap_region(unsigned int start, unsigned int size, capability *syscap);
errstat io_vtop(long vaddr, long vlen, long *paddr, capability *syscap);

Description

Low level input and output routines for hardware port access. Different types are supported. See
generic i386 processor and assembler documents for more details.

Programming Interface

I/O port access

GCC: all I/O port instructions are compiled inline.

Write a byte, word or long value _val to the hardware I/O port with address _port:

void out_byte(int _port, int _val);
void out_word(int _port, int _val);
void out_long(int _port, long _val);

Read a byte, word or long value from the hardware I/O port with address _port:

int in_byte(int _port);
int in_word(int _port);
long in_long(int _port);

There are some additional routines like the above, but with different I/O timing (pausing/short delay) for old buggy buses and interface cards:

out_##_p() in_##_p()

Output a string (memory area) of bytes, words or longs, starting at address _ptr, to the I/O port with address _port. The length is given as _bytecnt (in byte units), _wordcnt (in word units) or _longcnt (in long units):

void outs_byte(int _port, char * _ptr, int _bytecnt);
void outs_word(int _port, char * _ptr, int _bytecnt);
void outs_word_l(int _port, char * _ptr, int _wordcnt);
void outs_long_l(int _port, char * _ptr, int _longcnt);

Read a memory area (bytes, words or longs) from the I/O port with address _port into memory starting at address _ptr, counted in bytes, words or longs:

void ins_byte(int _port, char * _ptr, int _bytecnt);
void ins_word(int _port, char * _ptr, int _bytecnt);
void ins_word_l(int _port, char * _ptr, int _wordcnt);
void ins_long_l(int _port, char * _ptr, int _longcnt);

I/O port management

Kernel


To avoid overlaps of used I/O ports, allocate a region in I/O space for a device driver with the request_region function:

void request_region(unsigned int from, unsigned int extent, const char *name)

from determines the start address, extent determines the size of the region, and name is the (short) driver name.

Call check_region() before probing for your hardware. The return value is < 0 if the port region is already allocated, else >= 0. Once you have found your hardware, register it with request_region().

int check_region(unsigned int from, unsigned int extent)

If you unload/close the driver (?), or you don't need the port access anymore, use release_region to free the port address range (usually not necessary).

int release_region(unsigned int from, unsigned int extent)
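The bookkeeping behind these three calls can be pictured as a small table of allocated port ranges with an overlap test. The sketch below is a user-space model under that assumption. The x_ prefixed functions are hypothetical stand-ins, the table size is arbitrary, and unlike the kernel's request_region the model returns a status so that it can be checked.

```c
#include <assert.h>
#include <stddef.h>

#define X_MAXREGIONS 16     /* ASSUMPTION: arbitrary table size */

struct x_region {
    unsigned int from, extent;
    const char *name;
    int used;
};

static struct x_region x_regions[X_MAXREGIONS];

/* Returns < 0 if [from, from+extent) overlaps an allocated region */
int x_check_region(unsigned int from, unsigned int extent)
{
    int i;

    for (i = 0; i < X_MAXREGIONS; i++) {
        const struct x_region *r = &x_regions[i];

        if (r->used && from < r->from + r->extent &&
            r->from < from + extent)
            return -1;
    }
    return 0;
}

/* Allocate the region for driver `name`; < 0 if busy or table full */
int x_request_region(unsigned int from, unsigned int extent,
                     const char *name)
{
    int i;

    assert(extent > 0);
    if (x_check_region(from, extent) < 0)
        return -1;
    for (i = 0; i < X_MAXREGIONS; i++) {
        if (!x_regions[i].used) {
            x_regions[i].from = from;
            x_regions[i].extent = extent;
            x_regions[i].name = name;
            x_regions[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/* Free a region that was allocated with exactly these bounds */
int x_release_region(unsigned int from, unsigned int extent)
{
    int i;

    for (i = 0; i < X_MAXREGIONS; i++) {
        if (x_regions[i].used && x_regions[i].from == from &&
            x_regions[i].extent == extent) {
            x_regions[i].used = 0;
            return 0;
        }
    }
    return -1;
}
```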

User process

#include <sys/iomap.h>

A user process can use the I/O port access functions if either the I/O ports are mapped into the process, or the I/O privilege level (IOPL) of the process is set to level 0.

To probe a port region first, there is a library stub function:

errstat
io_check_region(start, size, syscap)
unsigned int start;
unsigned int size;
capability  *syscap;

This function returns STD_EXISTS if the I/O port region in the range [start ... start+size] is already used by another module, or STD_OK if the port region is unused. The syscap capability must be the kernel host (root dir) capability, or the request will be denied.

To map in the port region, use

errstat
io_map_region(start, size, devname, syscap)
unsigned int start;
unsigned int size;
char        *devname;
capability  *syscap;

In addition to the arguments of io_check_region, a device name string must be supplied. To unmap a port region, use

errstat
io_unmap_region(start, size, syscap)
unsigned int start;
unsigned int size;
capability  *syscap;

A process can get full I/O access by changing the IOPL:

errstat
io_setpvl(pvl, syscap)
int         pvl;
capability *syscap;

Note: If the GNU gcc compiler is used, don't forget to enable optimization with the -O option, otherwise the I/O port instructions cannot be compiled inline!

To get the physical address for a user process virtual address, use:

errstat
io_vtop(vaddr, vsize, paddr, syscap)
long        vaddr;
long        vsize;
long       *paddr;
capability *syscap;

This function takes the virtual address vaddr and the length vsize of the memory region, and returns the translated physical address in paddr. The memory region can also be a mapped hardware segment.

Aliases

For backward compatibility.

#define outb(a,b)       out_byte(b,a)
#define outw(a,b)       out_word(b,a)
#define outl(a,b)       out_long(b,a)
#define outsb(a,b,c)    outs_byte(a,b,c)
#define outsw(a,b,c)    outs_word_l(a,b,c)
#define outsl(a,b,c)    outs_long_l(a,b,c)
#define inb(a)          in_byte(a)
#define inw(a)          in_word(a)
#define inl(a)          in_long(a)
#define insb(a,b,c)     ins_byte(a,b,c)
#define insw(a,b,c)     ins_word_l(a,b,c)
#define insl(a,b,c)     ins_long_l(a,b,c)
#define outb_p(a,b)     out_byte_p(b,a)
#define outw_p(a,b)     out_word_p(b,a)
#define outl_p(a,b)     out_long_p(b,a)
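Note that the aliases for the single-value output routines swap their arguments: outb(a,b) expands to out_byte(b,a), so the alias takes the value first and the port second. The following sketch makes this visible with a stand-in for out_byte that only records its arguments instead of touching hardware. The recording variables and x_write_status are hypothetical.

```c
#include <assert.h>

static int x_last_port, x_last_val;

/* Stand-in for the real out_byte(): records instead of doing port I/O */
static void out_byte(int port, int val)
{
    x_last_port = port;
    x_last_val = val & 0xff;
}

/* Alias as listed in the table: value first, port second */
#define outb(a,b) out_byte(b,a)

/* Write one byte to a port through the backward compatible alias */
void x_write_status(int port, int val)
{
    assert(val >= 0 && val <= 0xff);
    outb(val, port);
}
```

Calling x_write_status(0x278, 0xff) ends up as out_byte(0x278, 0xff), which is easy to get wrong when porting code that uses the native names.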

Warnings

Things that might unpleasantly surprise the user.

Examples


User Process I/O

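#include <amoeba.h>
#include <stderr.h>
#include <module/name.h>
#include <ioport.h>
#include <sys/iomap.h>

#define KERNELSYSCAP "/super/hosts/mymach"
#define MYNAME "MYSERVER"

int main()
{
    errstat err;
    capability syscap;

    /* lookup the system capability */
    err = name_lookup(KERNELSYSCAP, &syscap);
    if (err != STD_OK) {
        failwith("can't lookup system capability");
    }

    err = io_check_region(0x278, 4, &syscap);
    if (err != STD_OK) {
        failwith("IO region already used");
    }

    err = io_map_region(0x278, 4, MYNAME, &syscap);
    if (err != STD_OK) {
        failwith("can't map IO region");
    }

    out_byte(0x278, 0xff);

    ...

    io_unmap_region(0x278, 4, &syscap);
    return 0;
}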

Name

ISR - user device interrupt service routine handler management

Synopsis

#include <sys/isr.h>

typedef struct isr_handler isr_handler_t, *isr_handler_p;

errstat interrupt_register(isr_handler_p isr, capability *syscap);
errstat interrupt_unregister(isr_handler_p isr, capability *syscap);

errstat interrupt_await(int irq, event ev);
errstat interrupt_done(int irq, int status);

#define ISR_SERVER_PORT "sys::isr-server"

Description

In addition to the new interprocess communication module IPC, the ISR module allows (privileged) user processes to service hardware device interrupts. To get this service, the process must send an IPC message via ipc_trans to the system ISR server, which is registered under the port name sys::isr-server. The request data of the message holds the user ISR handler structure with the desired ISR settings. In the ipc_trans reply, the ISR handler structure is returned with some additional settings, such as the ISR id number. There are library stub functions for this purpose.

After the ISR has been registered successfully, the user ISR must call the interrupt_await function. When the hardware device triggers the irq event, the kernel ISR scheduler wakes up the ISR handler, which can now service its irq. When finished, the ISR handler must call the interrupt_done function to acknowledge the irq. Keep in mind that the irq stays locked until this function is called. Interrupt sharing is possible: if several ISR handlers, perhaps in different processes, share one irq line, they are all woken up on the irq event. The ISR server port is currently protected with the kernel root/host capability. The process can look up this host capability in the /super/hosts directory.

Programming Interface

To register a user isr handler, use the interrupt_register library stub function:

errstat
interrupt_register(isr, syscap)
isr_handler_p isr;
capability   *syscap;

The following fields in the isr handler must be set:

isr.irq = <myirq number>;
isr.flags = IRQ_NORMAL or IRQ_SHARED;
strcpy(isr.devname,"mydev name");

The system capability is the kernel host capability (see above).
To unregister the isr handler, use the interrupt_unregister function instead:

errstat
interrupt_unregister(isr, syscap)
isr_handler_p isr;
capability   *syscap;

The isr handler struct must be the same as returned by interrupt_register.
Now the isr handler can wait for irq events using the kernel system call:

errstat
interrupt_await(irq, ev)
int   irq;
event ev;

irq is the registered irq number, and ev is the event id number to wait for, returned by the register transaction in the isr.ev field.

To acknowledge an interrupt, use

errstat
interrupt_done(irq, status)
int irq;
int status;

For example:

for(;;) { err = interrupt_await(isr.irq,isr.ev);

... service interrupt ...

status = [IRQ_SERVICED] or [IRQ_UNKNOWN];

err = interrupt_done(isr.irq,status);

}

Note: The interrupt service thread itself must register the ISR, not any other thread of the process!
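The await/done protocol can be modelled as a small state machine per irq line: an interrupt wakes all registered handlers, and the line stays locked until they have acknowledged it. The sketch below is only a model of that idea. The struct, the function names and in particular the rule that a shared line is unlocked once every woken handler has called done are assumptions, not taken from the kernel sources.

```c
#include <assert.h>

enum { X_IRQ_UNKNOWN = 0, X_IRQ_SERVICED = 1 };  /* stand-in status values */

struct x_irq {
    int nhandlers;  /* handlers registered on this line */
    int pending;    /* handlers woken up but not yet done */
    int serviced;   /* at least one handler reported X_IRQ_SERVICED */
};

/* The device raised the irq: wake all handlers sharing the line */
void x_irq_raise(struct x_irq *q)
{
    q->pending = q->nhandlers;
    q->serviced = 0;
}

/* The line is locked as long as an acknowledgement is outstanding */
int x_irq_locked(const struct x_irq *q)
{
    return q->pending > 0;
}

/* One handler acknowledges the irq, as with interrupt_done */
void x_irq_done(struct x_irq *q, int status)
{
    assert(q->pending > 0);     /* done without a matching wakeup is a bug */
    if (status == X_IRQ_SERVICED)
        q->serviced = 1;
    q->pending--;
}
```

The model shows why a handler that never acknowledges blocks the line for everybody else sharing it.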

Administration

If a thread that previously registered an ISR handler exits, the ISR handler is automatically
removed and unregistered.

Warnings

Interrupt handlers, whether in kernel or user space, can corrupt the system. Your process needs full I/O privileges, and can therefore corrupt your hardware. Furthermore, wrong arguments to, or wrong usage of, the interrupt_await/interrupt_done functions can cause trouble with the irq handling.


Examples

#include <amoeba.h>
#include <stderr.h>
#include <string.h>
#include <sys/isr.h>

#define HOST_PATH "/super/hosts/myhost"

capability syscap;

void myisr()
{
    errstat err;
    int stat;
    isr_handler_t isr;

    name_lookup(HOST_PATH, &syscap);

    isr.irq = 3;
    isr.flags = IRQ_NORMAL;
    strcpy(isr.devname, "Serial Adapter 1");

    err = interrupt_register(&isr, &syscap);
    if (err != STD_OK)
        return;     /* registration failed */

    for (;;) {
        stat = interrupt_await(isr.irq, isr.ev);

        /* .. service the irq .. */

        stat = interrupt_done(isr.irq, IRQ_SERVICED);
    }
}

Name

kthread - kernel thread creation and thread memory management module

Synopsis

#include <sys/kthread.h>

int thread_newthread(func, stsize, param);
int thread_exit();
void thread_switch();

#ifdef KERNEL_GLOCALS
char *thread_alloc(index, size);
#endif

int thread_await(eventnum, timeoutval);
void thread_wakeup(eventnum);

Description

The thread module provides the programming interface to create, destroy and manage concurrent threads. Each thread can start executing a separate routine; they do not all have to execute the same function. Each thread in a multi-threaded process shares the same address space. It has its own stack and program counter but otherwise shares the text, data and bss segments of the process. Because it is sometimes useful to have data global within a thread but not accessible outside the thread, glocal data is provided. See the description of thread_alloc below for details of how to allocate and use glocal data.

The threads are currently scheduled non-preemptively by default. Preemptive scheduling must be enabled explicitly using thread_enable_preemption (see thread_scheduling(L)). It is important to protect accesses to global data with mutexes (see mutex(L)). It is possible for a thread to request that it be rescheduled using thread_switch (see thread_scheduling(L)). This can be very useful in the presence of non-preemptive scheduling.

NB. If a program is multi-threaded it is not safe to use UNIX emulation routines in more than one thread of the program. UNIX is not multi-threaded and therefore the emulation is only likely to be correct if confined to a single thread of a program. For example, a program with several threads where one is in read waiting for input from a terminal and another does a fork may well hang until the read is satisfied. The exit routine has been modified to force a close on all descriptors, even if they are held by another thread, and no guarantees are made on the correctness of any resulting input or output if exit is called in such circumstances.


Functions

thread_newthread

int thread_newthread(func, stsize, param)
void (*func)();
int stsize;
long param;

Thread_newthread spawns a new thread and starts it at function func. Thread_newthread allocates a thread stack of stsize bytes. Stsize must be at least 512 bytes or the calling program will be killed. Parameters can be passed to the new thread via the thread_newthread parameters param and psize. Param is a pointer to the data structure to pass, psize is the size of the data structure. Param must be allocated by a member of the malloc family (see malloc(L)) since the clean up when the thread exits will free this memory. Memory allocated using thread_alloc cannot be used!

When no parameters are passed, param must be a NULL-pointer and psize must be zero. Once the thread exits, the allocated stack and parameter area are freed.

The function func is called as follows:

void (*func)(param) long param;

If the called function returns, the thread exits.

Thread_newthread returns zero on failure (insufficient memory or out of threads); otherwise the thread id number of the newly created thread is returned.

Note that not all threads are created equal. When a process first starts it consists of one thread which starts the routine main(). If main returns then exit() is called which will terminate the entire process immediately. If it is desired that main terminate and the other threads continue then it must not return but call thread_exit (described below).
Typically, when a new thread is created the parent continues to execute until it blocks. However, if preemptive scheduling is enabled (see thread_scheduling(L)) then the newly created thread will have the same priority as the current thread. This means at the next event (such as an interrupt) the new thread may be scheduled.

thread_exit

int thread_exit()


Thread_exit stops the current thread. It then frees any glocal memory (allocated by thread_alloc/thread_realloc), the parameter area and the allocated stack before exiting. Thread_exit does not return. When the calling thread was a server thread, and it was still serving a client, the client will receive an RPC_FAILURE (see rpc(L)). If thread_exit is called in the last thread of a process, the process exits as well (see exitprocess(L)).

thread_switch

void thread_switch()

Thread_switch calls the scheduler to switch to another ready-to-run thread in the kernel or, if no kernel thread is runnable, to a user thread.

thread_alloc

char *thread_alloc(index, size)
int *index;
int size;

The first time thread_alloc is called (with *index == 0), thread_alloc allocates glocal data of size bytes and returns the module reference number in *index. The allocated data is initialized to zero. The value of the function is a pointer to the glocal data. Successive calls to thread_alloc with the previously assigned module reference number result in returning a pointer to the previously allocated memory.

Thread_alloc returns a NULL-pointer on insufficient memory or when a successive call to thread_alloc has a different size parameter than in the original call. In this case, the already allocated memory is not modified or freed.

Consider an example. Suppose a function in a single-threaded program uses a (static) global variable. For example,

static long sum;

long add(x)
long x;
{
    sum += x;
    return sum;
}

Now suppose this function must be used in a multi-threaded program, where the threads perform independent computations. This means a separate sum variable is required for each thread. If the number of threads is known in advance, the threads could be numbered and an array indexed by thread numbers could be maintained (assuming a thread can find out its own thread number, which is not trivial unless a parameter is added to add for this purpose). In general, however, this is too complex. A simpler solution is to use thread_alloc.

#include "thread.h"

static int ident;

long add(x)
long x;
{
    long *p_sum;

    p_sum = (long *) thread_alloc(&ident, sizeof(long));
    *p_sum += x;
    return *p_sum;
}

Because there may be several functions in a program that need a block of glocal memory for private use, thread_alloc has an ident parameter that indicates the identity of the memory block (not the calling thread!). This must be the address of a global integer variable, statically initialized to zero. This variable is used for thread_alloc's internal administration; its contents must never be touched by the calling function. Functions that need to use the same block of glocal data must use the same ident variable. For consistency, all calls using the same ident variable must pass the same block size. To change the block size, thread_realloc can be used.

thread_await

int thread_await(eventnum, timeoutval)
event eventnum;
interval timeoutval;

Thread_await is used together with the thread_wakeup routine. A thread calling thread_await waits for an event denoted by a unique event number eventnum, for example the address of a global variable of the process. Thread_await returns when either another thread triggers the event eventnum with thread_wakeup or a timeout occurs. On wakeup, thread_await returns 0, otherwise -1 (timeout, interrupted).
A zero timeout interval value indicates no timeout.

thread_wakeup

void thread_wakeup(eventnum)
event eventnum;

Thread_wakeup wakes up all threads waiting for event eventnum.

Example:

event myevent;

int mythread_A()
{
    int res;

    /*
    ** Set timeout to zero: no timeout. Only user signals
    ** will be caught.
    */
    res = thread_await((event) &myevent, (interval) 0);
    if (res < 0)
        printf("we're interrupted\n");
    ...
}

int mythread_B()
{
    ...
    thread_wakeup((event) &myevent);
    ...
}

Warnings

The thread module uses the global variable _thread_local (see sys_newthread(L)) for its administration.

Example

The following example shows thread creation and the use of glocal memory. Ref_nr is the module reference number. Each time the signal handler is activated, the glocal data is fetched which belongs to the running thread and this module.

#define NTHREADS 5
#define STKSIZE 8096
#define SIZE 100

static int ref_nr;

void worker_thread(param)
long param;
{
    char *ptr;
    int i;

    i = *(int *) param;
    printf("Thread %d started\n", i);

    /* Initially allocate memory and fetch module ref. number */
    if ((ptr = thread_alloc(&ref_nr, SIZE)) == 0) {
        fprintf(stderr, "worker_thread: cannot thread_alloc.\n");
        thread_exit();
    }
    strcpy(ptr, "Peter was here");
    ...
}

void module_init()
{
    int i;
    char *p;

    for (i = 0; i < NTHREADS; i++) {
        if ((p = malloc(sizeof (int))) == 0) {
            printf("malloc failed\n");
            exit(1);
        }
        *(int *) p = i;
        if (!thread_newthread(worker_thread, STKSIZE, (long) p)) {
            printf("thread_newthread failed\n");
            exit(1);
        }
    }

    /* Do not wait for threads to exit */
    thread_exit();
}

Machine dependent kernel functions- i386 architecture

* I/O Port routines
* Interrupt Handling
* Timers and Time
* Memory management code
* DMA operation
* PCI management

I/O Port access routines

Header-Files:

#include <i386_proto.h>
#include <ioport.h>

Output byte, word or long data to a hardware port:

void out_byte(int _port, int _val)
void out_word(int _port, int _val)
void out_long(int _port, long _val)

Input byte, word or long data from a hardware port:

int in_byte(int _port)
int in_word(int _port)
long in_long(int _port)

Modified versions with a small time delay (for noisy (ISA) buses):

out_##_p() with different I/O timing
in_##_p() with different I/O timing

Block transfer functions (string)

Output byte, word or long strings in byte count or in word/long counts (_l):

void outs_byte(int _port, char * _ptr, int _bytecnt)     -> count in bytes !!!
void outs_word(int _port, char * _ptr, int _bytecnt)     -> count in bytes !!!
void outs_word_l(int _port, char * _ptr, int _wordcnt)   -> count in words !!!
void outs_long_l(int _port, char * _ptr, int _longcnt)   -> count in longs !!!

And the byte, word or long input string versions:

void ins_byte(int _port, char * _ptr, int _bytecnt)      -> count in bytes !!!
void ins_word(int _port, char * _ptr, int _bytecnt)      -> count in bytes !!!
void ins_word_l(int _port, char * _ptr, int _wordcnt)    -> count in words !!!
void ins_long_l(int _port, char * _ptr, int _longcnt)    -> count in longs !!!

Aliases (Linux prototypes):

#define outb(a,b) out_byte(b,a)
#define outw(a,b) out_word(b,a)
#define outl(a,b) out_long(b,a)
#define outsb(a,b,c) outs_byte(a,b,c)
#define outsw(a,b,c) outs_word_l(a,b,c)
#define outsl(a,b,c) outs_long_l(a,b,c)
#define inb(a) in_byte(a)
#define inw(a) in_word(a)
#define inl(a) in_long(a)
#define insb(a,b,c) ins_byte(a,b,c)
#define insw(a,b,c) ins_word_l(a,b,c)
#define insl(a,b,c) ins_long_l(a,b,c)
#define outb_p(a,b) out_byte_p(b,a)
#define outw_p(a,b) out_word_p(b,a)
#define outl_p(a,b) out_long_p(b,a)
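Note that the Linux-style output aliases take the value first and the port second, while the native routines take the port first. A minimal sketch of this swap follows. It assumes a stand-in out_byte() that only records its arguments instead of touching hardware; the recording variables and demo_outb() are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Stand-in for the real port routine (assumption): it records its
 * arguments instead of performing port I/O. */
static int last_port = -1;
static int last_val = -1;

static void out_byte(int _port, int _val)
{
    last_port = _port;
    last_val = _val;
}

/* Alias as listed above: Linux order (value, port) is mapped to the
 * native order (port, value). */
#define outb(a,b) out_byte(b,a)

/* Hypothetical helper: writes the value 0x0A to port 0x3F8 through the
 * Linux-style alias and prints what the native routine received. */
static void demo_outb(void)
{
    outb(0x0A, 0x3F8);
    printf("port=0x%x val=0x%x\n", last_port, last_val);
}
```

Driver code ported from Linux can therefore keep its outb(value, port) calls unchanged, while code written against the native interface uses out_byte(port, value).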

If you use the GNU gcc compiler, the I/O port functions expand to inline asm macros; with the
obsolete ACK cc compiler, they are calls to external subroutines.

I/O port management

To avoid overlapping use of I/O ports, allocate a region in I/O space for a device driver with the request_region method:

void request_region(unsigned int from, unsigned int extent, const char *name)

from determines the start address, extent determines the size of the region and name is the (short) driver name.

int check_region(unsigned int from, unsigned int extent)

When you unload the driver, use release_region to free the ports (usually not necessary).

int release_region(unsigned int from, unsigned int extent)

Interrupt handling

Header-Files:

#include <i386_proto.h>
#include <irq.h>

General information on what happens when a hardware device triggers an interrupt:

* Hardware triggers IRQ ##

* The ISR (interrupt service routine) handler is called with "Interrupts disabled" and "IRQ ## blocked/masked"

* Never call functions within the ISR which are used by other kernel parts (most functions are not interrupt safe, e.g. all routines with list management, malloc() ...)

* If you need more kernel support, push a high level interrupt handler with enqueue(). After the low level ISR is finished, this routine will be called as soon as possible.

* It's allowed to enable interrupts again within the ISR (but pay attention):

save_flags(flags); sti(); ... restore_flags(flags);

Useful functions:

Disable all hardware interrupts for a limited time:

void disable(void) (obsolete)
void cli(void)

Enable hardware interrupts again:

void enable(void) (obsolete)
void sti(void)

Get the processor flag register:

int get_flags(void)
void save_flags(int _flags) /* macro */

And restore the processor flag register again:

void set_flags(int _flags)
void restore_flags(int _flags) /* macro */

Disabling and enabling of interrupts in critical code fragments should be done this way:

int flags;
...
save_flags(flags);
cli();
{
    /* CRITICAL CODE REGION */
}
restore_flags(flags);
...
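Saving and restoring the flag register, rather than pairing cli() with an unconditional sti(), keeps nested critical regions correct: an inner region does not switch interrupts back on while an outer region still expects them to be off. The following sketch illustrates this with a simulated flag register. The macros here are stand-ins written for this example, not the kernel's definitions, and bump() and outer() are hypothetical names.

```c
/* Simulated processor flag register. Bit 9 (0x200) is the x86
 * interrupt enable flag. All three macros are stand-ins (assumption). */
#define IF_BIT 0x200
static int cpu_flags = IF_BIT;

#define save_flags(f)    ((f) = cpu_flags)
#define restore_flags(f) (cpu_flags = (f))
#define cli()            (cpu_flags &= ~IF_BIT)

static int counter;

/* Inner routine with its own critical region. */
static void bump(void)
{
    int flags;

    save_flags(flags);
    cli();
    counter++;
    restore_flags(flags);   /* restores the caller's state, whatever it was */
}

/* Outer routine that calls bump() while interrupts are already off.
 * Returns nonzero if interrupts were still off after the nested call. */
static int outer(void)
{
    int flags, still_disabled;

    save_flags(flags);
    cli();
    bump();
    still_disabled = (cpu_flags & IF_BIT) == 0;
    restore_flags(flags);
    return still_disabled;
}
```

With a plain cli()/sti() pair in bump(), the outer region would continue with interrupts enabled after the call.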

Enabling and disabling of a specific interrupt level:

void enable_irq(unsigned int irq_nr)    -> enable (unmask) single irq
void disable_irq(unsigned int irq_nr)   -> disable (mask) single irq

Interrupt ISR management

The kernel keeps a registry of interrupt lines, similar to the registry of I/O ports. If a driver wants to install
its ISR function for a particular interrupt level, it must use the request_irq method:

int request_irq(unsigned int irq_nr,
                void (*handler)(int, void *, struct pt_regs *),
                unsigned long irqflags,
                const char *devname,
                void *dev_id)

With the argument list:

irq_nr:
the interrupt number {0..15}

handler:
the pointer to the handling function being installed

irqflags:
one of the supported flags:

NULL, SA_NORMAL: Normal Interrupt
SA_INTERRUPT, SA_FAST: Fast Interrupt
SA_SHIRQ: Shared Interrupt

devname:
string for information purposes (usually the (short) driver name)

dev_id:
pointer used for shared interrupt lines (commonly NULL)

The return value is 0 for success, or a negative error code if the request failed. It's not uncommon for the function to return -EBUSY to signal that another driver is already using the requested interrupt line. Interrupt lines can be shared with other devices by passing the SA_SHIRQ flag. Device drivers can only share an interrupt line if they all call request_irq with the SA_SHIRQ flag!

Remove the irq handler from the irq registry queue with free_irq(). Additionally, if no other shared interrupt handler is installed on this irq, this function disables the interrupt.

void free_irq(unsigned int irq, void *dev_id)

IRQ auto detection

The kernel offers a low-level facility for probing the interrupt number of a specific device. The facility consists of two functions:

unsigned long probe_irq_on(void)

This function returns a bitmask of unassigned interrupts. The driver must preserve the returned bitmask and pass it to probe_irq_off() later. After this call, the driver should arrange for its device to generate at least one interrupt.

int probe_irq_off(unsigned long)

After the device has requested an interrupt, the driver calls this function, passing as argument the bitmask previously returned by probe_irq_on(). probe_irq_off() returns the number of the interrupt that was issued after probe_irq_on(). If no interrupt occurred, 0 is returned.

Timers and Time

Header-Files:
#include <i386_proto.h>

Install a new timer handler with sweeper_set():

void sweeper_set(void (*sweep)(), long arg, interval period, int once)

The argument list:

sweep
		Timer handler function
arg
		Argument given to timer handler
period
		Timer period in milliseconds
once
		If set, call the timer handler only once. If zero, the timer handler is called each period.

Additional Linux implementation using sweeper_set():

Initialize the timer struct with init_timer() (not really needed), add a new timer with add_timer() (the handler is always called only once), and remove the timer with del_timer() (not really needed):

void init_timer(struct timer_list *timer)
void add_timer(struct timer_list *timer)
int del_timer(struct timer_list *timer)

And the timer data structure:

Header file:

#include <linux/timer.h>

struct timer_list {
    struct timer_list *next;            /* dummy */
    struct timer_list *prev;            /* dummy */
    unsigned long expires;              /* timeout value in jiffies: always relative to the current time! */
    unsigned long data;                 /* argument to the handler */
    void (*function)(unsigned long);    /* handler for the timeout */
};

To get the current time in milliseconds (resolution 1 ms) from the hardware timer use:

unsigned long getmilli()

There is another, global time variable:

unsigned long milli_uptime;

It delivers the current time in milliseconds, but with the time resolution PIT_INTERVAL (pit.h), usually 10 ms. This value may not be updated quickly enough for accurate time measurement (delays up to 10 ms).
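Either time source can be used to measure intervals by subtracting two readings. The sketch below shows the usual pattern for a free-running unsigned millisecond counter; because the subtraction is done in unsigned arithmetic, the result stays correct across a single wrap-around of the counter. The helper names elapsed_ms() and timed_out() are hypothetical, and in a driver the readings would come from getmilli() or milli_uptime.

```c
/* Elapsed milliseconds between two readings of a free-running unsigned
 * millisecond counter. Unsigned subtraction wraps modulo the counter
 * width, so one wrap-around between the readings is handled. */
static unsigned long elapsed_ms(unsigned long start, unsigned long now)
{
    return now - start;
}

/* Returns nonzero once at least timeout_ms have passed since start. */
static int timed_out(unsigned long start, unsigned long now,
                     unsigned long timeout_ms)
{
    return elapsed_ms(start, now) >= timeout_ms;
}
```

For short timeouts, readings taken with getmilli() are the better choice, since milli_uptime only advances in steps of PIT_INTERVAL.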


Memory management code

Hardware devices:

Header files:
#include <i386_proto.h>

Map the hardware device pages from on-board memory in the range between 640K and 1024K into the kernel page table, starting at physical address 'addr' with total size 'size' (bytes):

void page_mapin(phys_bytes addr, phys_bytes size)

Map (mainly PCI) hardware device pages (configuration space, on-board memory ...) from physical address 'addr' with total size 'size' into virtual address space; the virtual address is returned:

vir_bytes ioremap(addr, size)
phys_bytes addr;
phys_bytes size;

DMA operation

Header files:
#include <i386_proto.h>
#include <dma/dma.h>

Set up a DMA channel:

int dma_setup(
    int mode,       /* DMA operation mode */
    int channel,    /* DMA channel */
    char *addr,     /* buffer pointer */
    int count)      /* number of bytes to move */

Mode register values:

DMA_CHANMASK    channel select mask
DMA_VERIFY      verify transfer
DMA_WRITE       write transfer
DMA_READ        read transfer
DMA_ENAINIT     auto initialization enabled
DMA_DISINIT     auto initialization disabled
DMA_ADDRINC     address increment select
DMA_ADDRDECR    address decrement select
DMA_DEMAND      demand mode select
DMA_SINGLE      single mode select
DMA_BLOCK       block mode select
DMA_CASCADE     cascade mode select

Start a DMA request on a given channel:

void dma_start(int channel)

Stop DMA transfer by disabling DMA channel for all further DMA operations:

void dma_done(int channel)

PCI management

Header files:
#include <bios32.h>
#include <pci.h>

The following functions should be used by a PCI driver to look for its hardware device:

int pcibios_present(void)

int pcibios_find_device( unsigned short vendor, unsigned short id, unsigned short index, unsigned char *bus, unsigned char *function)

int pcibios_find_class( unsigned int class_code, unsigned short index, unsigned char *bus, unsigned char *function)

char *pcibios_strerror(int error)

Accessing the configuration space: After the driver has detected the device, it usually needs to read from or write to the three address spaces: memory, port and configuration. In particular, accessing the configuration space is vital to the driver because it's the only way it can find out where the device is mapped in memory and in the I/O space.

int pcibios_read_config_byte( unsigned char bus, unsigned char function, unsigned char where, unsigned char *ptr)

int pcibios_read_config_word( unsigned char bus, unsigned char function, unsigned char where, unsigned short *ptr)

int pcibios_read_config_dword( unsigned char bus, unsigned char function, unsigned char where, unsigned int *ptr)

int pcibios_write_config_byte( unsigned char bus, unsigned char function, unsigned char where, unsigned char val)

int pcibios_write_config_word( unsigned char bus, unsigned char function, unsigned char where, unsigned short val)

int pcibios_write_config_dword( unsigned char bus, unsigned char function, unsigned char where, unsigned int val)

For a more detailed explanation of the PCI stuff check out popular Linux documentation.
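The typical probe sequence (find the device, then read its I/O base and interrupt line from configuration space) can be sketched as follows. To keep the sketch self-contained, the pcibios routines are replaced by stand-ins that model a single fake device; the return convention (zero for success), the example vendor and device ids, and the name mydev_probe() are assumptions made for this illustration. The offsets 0x10 (first base address register) and 0x3C (interrupt line) are the standard PCI configuration header offsets.

```c
/* Stand-ins for the kernel routines (assumption). They model one fake
 * device with vendor 0x10EC, device 0x8029 at bus 0, function 0x18. */
static unsigned int fake_bar0 = 0x0000E001;   /* I/O BAR: bit 0 is set */
static unsigned char fake_irq = 11;

static int pcibios_present(void) { return 1; }

static int pcibios_find_device(unsigned short vendor, unsigned short id,
                               unsigned short index,
                               unsigned char *bus, unsigned char *function)
{
    if (vendor != 0x10EC || id != 0x8029 || index != 0)
        return -1;              /* assumed: nonzero means not found */
    *bus = 0;
    *function = 0x18;
    return 0;                   /* assumed: zero means success */
}

static int pcibios_read_config_byte(unsigned char bus, unsigned char function,
                                    unsigned char where, unsigned char *ptr)
{
    (void)bus; (void)function;
    if (where != 0x3C)
        return -1;
    *ptr = fake_irq;
    return 0;
}

static int pcibios_read_config_dword(unsigned char bus, unsigned char function,
                                     unsigned char where, unsigned int *ptr)
{
    (void)bus; (void)function;
    if (where != 0x10)
        return -1;
    *ptr = fake_bar0;
    return 0;
}

#define PCI_BASE_ADDRESS_0  0x10    /* standard config space offsets */
#define PCI_INTERRUPT_LINE  0x3C

/* Hypothetical probe: returns 0 on success and fills in I/O base and irq. */
static int mydev_probe(unsigned int *iobase, int *irq)
{
    unsigned char bus, fn, line;
    unsigned int bar;

    if (!pcibios_present())
        return -1;
    if (pcibios_find_device(0x10EC, 0x8029, 0, &bus, &fn) != 0)
        return -1;
    if (pcibios_read_config_dword(bus, fn, PCI_BASE_ADDRESS_0, &bar) != 0)
        return -1;
    if (pcibios_read_config_byte(bus, fn, PCI_INTERRUPT_LINE, &line) != 0)
        return -1;
    if ((bar & 1) == 0)
        return -1;              /* memory BAR; this sketch expects an I/O BAR */
    *iobase = bar & ~0x3U;      /* strip the I/O space indicator bits */
    *irq = line;
    return 0;
}
```

The resulting I/O base and irq number would then be passed on to request_region() and request_irq().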


Network drivers

Name

Network driver and ethernet interface

Synopsis

#include <amoeba.h>
#include <type.h>
#include <machdep.h>
#include <internet.h>
#include <global.h>
#include <byteorder.h>

#include <ethif.h>
#include <sys/flip/ethproto.h>
#include <sys/flip/ethpreamble.h>
#include <sys/flip/packet.h>
#include <sys/flip/flip.h>
#include <sys/proto.h>
#include <server/ip/hton.h>

#include "pktqueue.h"

Description and programming interface

This manual page describes the network driver functions and their management within the Amoeba kernel.

Initialization of network drivers

From protocols/generic/ethif.c the function eth_init is called. This function initializes all ethernet devices found in the system.

eth_init scans through all entries in the struct array

heilist [interface 0,interface 1,..]

found in the file etherconf.c, placed in the main configuration path for the specific kernel (the location where the main Amakefile for the kernel and other configuration files also reside), e.g.:

conf/i80386.ack/kernel/ibm_at/workstation

heilist[] is of type hei_t, defined in ethif.h. The first entry contains the name of the interface (e.g. "lance", or "NS8390", which stands for the many card types using the 8390 chip, ...), the second gives the number of different cards supported by this driver, and the rest point to driver functions important for higher network layers:


struct hard_ethernet_interface_info
{
    char *hei_name;
    int hei_nif;
    int (*hei_alloc)();     /* Allocate local device struct(s) */
    int (*hei_init)();      /* Probe and init the card */
    void (*hei_send)();     /* Send a packet (or, if necessary, queue it) */
    int (*hei_setmc)();     /* Set hard multicast address */
    int (*hei_stop)();      /* Stop the interface, free resources */
};

typedef struct hard_ethernet_interface_info hei_t, *hei_p;

eth_init first calls hei_alloc to allocate and initialize the driver specific device struct, and then the probing and initializing routine hei_init. If the probing was successful, hei_init returns 1, otherwise 0 or a negative value.

Commonly the driver init routine

hei_init(int hardifno, int softifno, char *ifaddr, int *nrcvpkt, int *nsndpkt)

allocates and inits the following buffers and pools:

* Allocate data buffers for the receive packet pool:

rdata = (char *) alloc((vir_bytes)(RCVPOOLSIZE*RCVETHPKTSIZE), 0);
bufs = (pkt_p) alloc((vir_bytes)(RCVPOOLSIZE*sizeof(pkt_t)), 0);

* Initialize receive packet pool:

pkt_init(&rpool, RCVETHPKTSIZE, bufs, RCVPOOLSIZE, rdata, (void (*)())0, 0L);

* RX-Queue management; necessary in almost all drivers:

Within the ISR the driver collects the packets from the card into preallocated packet buffers from the receive pool. But the card driver delivers the packets to the FLIP-Box outside the ISR, in a higher level function enqueued by the scheduler.

int rxhead, rxtail;
pkt_p *rxpkts;

rxhead = rxtail = 0;
rxpkts = (pkt_p *) alloc((vir_bytes)(RQUEUESIZE*sizeof(pkt_p)), 0);

* Fill the RX-Queue with packets from the receive pool:

for (i = 0; i < RQUEUESIZE; i++) {
    pkt_p pktin;

    PKT_GET(pktin, &rpool);
    pktin->p_admin.pa_size = RCVETHPKTSIZE;
    proto_setup(pktin, eh_t);
    pktin->p_contents.pc_totsize = pktin->p_contents.pc_dirsize = 0;
    rxpkts[i] = pktin;
}

* TX-Queue management:

If the driver can't immediately send a packet delivered with hei_send, it queues the packet and
sends it later upon a transmitter interrupt.

int txhead, txtail;
pkt_p *txpkts;

txhead = txtail = 0;
txpkts = (pkt_p *) alloc((vir_bytes)(TQUEUESIZE*sizeof(pkt_p)), 0);

Additionally, the init routine copies the ethernet address of the card into the char buffer ifaddr, sets up irq handling, fills the local device structure, and more.

Packet receiving and delivering

Non-DMA cards receive and store ethernet packets in their own on-board receiver ring buffer. Afterwards, the card triggers the interrupt and the ISR is called. Now the driver picks the next RX-Queue slot for the new packet:

int rnext;

if ((rnext = rxhead + 1) == RQUEUESIZE)
    rnext = 0;
if (rnext == rxtail)
    error(NOFREEPKTSLOT);
else {
    pkt_p pkt;

    pkt = rxpkts[rxhead];
    pkt->p_contents.pc_totsize = pkt->p_contents.pc_dirsize = pkt_len;
    getpkt(.., pkt, pkt_len, ..);
}

The local getpkt() function stores the packet at the buffer address

buf=pkt_offset(pkt);

Additionally, a high level receiver routine is enqueued. This routine delivers the collected packets in the RX-Queue to the FLIP-Box:

int rhead;
pkt_p pktin, pktout;

rhead = rxhead;
while (rhead != rxtail) {
    pktout = rxpkts[rxtail];

    /* Get a new packet from the receive pool */
    PKT_GET(pktin, &rpool);
    if (pktin != (pkt_p) 0) {
        /* Init the new packet */
        pktin->p_admin.pa_size = RCVETHPKTSIZE;
        proto_setup_input(pktin, eh_t);
        pktin->p_contents.pc_totsize = pktin->p_contents.pc_dirsize = 0;
    } else
        error(OUTOFPKT);

    /* put the new packet in the RX-Queue */
    rxpkts[rxtail] = pktin;
    if (++rxtail == RQUEUESIZE)
        rxtail = 0;

    /* deliver the extracted packet */
    eth_arrived(softifno, pktout);
}
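The receive path above is a buffer exchange: the ISR fills a preallocated buffer at the head slot, and the high level routine later swaps each filled buffer for a fresh one from the pool before handing it upwards. The following self-contained sketch models this with plain C structures. All names, sizes and the simple free-list pool are assumptions made for illustration and are not the kernel's pkt_t or pool_t. Unlike the fragment above, the sketch stops delivering when the pool is empty instead of storing a null pointer in the queue.

```c
#include <stddef.h>
#include <string.h>

#define RQUEUESIZE  4           /* slots in the RX queue (assumed value) */
#define POOLSIZE    8           /* buffers in the receive pool (assumed value) */
#define PKTSIZE     64

struct pkt { int len; char data[PKTSIZE]; };

static struct pkt pool_mem[POOLSIZE];
static struct pkt *pool_free[POOLSIZE];
static int pool_top;

static struct pkt *rxpkts[RQUEUESIZE];
static int rxhead, rxtail;
static int delivered, dropped;

static struct pkt *pool_get(void) { return pool_top > 0 ? pool_free[--pool_top] : 0; }
static void pool_put(struct pkt *p) { pool_free[pool_top++] = p; }

/* Fill every queue slot with a preallocated buffer, as the init routine does. */
static void rx_init(void)
{
    int i;

    pool_top = 0;
    for (i = 0; i < POOLSIZE; i++)
        pool_put(&pool_mem[i]);
    for (i = 0; i < RQUEUESIZE; i++)
        rxpkts[i] = pool_get();
    rxhead = rxtail = 0;
    delivered = dropped = 0;
}

/* ISR part: copy the frame into the preallocated buffer at the head slot. */
static void rx_isr(const char *frame, int len)
{
    int rnext = rxhead + 1;

    if (rnext == RQUEUESIZE)
        rnext = 0;
    if (rnext == rxtail) {      /* no free slot: drop the frame */
        dropped++;
        return;
    }
    if (len > PKTSIZE)
        len = PKTSIZE;
    memcpy(rxpkts[rxhead]->data, frame, (size_t)len);
    rxpkts[rxhead]->len = len;
    rxhead = rnext;
}

/* High level part: hand out filled buffers and refill slots from the pool. */
static void rx_deliver(void)
{
    while (rxtail != rxhead) {
        struct pkt *fresh = pool_get();
        struct pkt *out;

        if (fresh == 0)
            break;              /* pool empty: try again later */
        out = rxpkts[rxtail];
        rxpkts[rxtail] = fresh;
        if (++rxtail == RQUEUESIZE)
            rxtail = 0;
        delivered++;            /* stands in for eth_arrived() */
        pool_put(out);          /* the upper layer is done with the buffer */
    }
}
```

Because one slot always stays unused to distinguish a full queue from an empty one, a queue of RQUEUESIZE slots holds at most RQUEUESIZE - 1 pending packets.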

Packet sending

The higher network layer (FLIP-Box) calls the

hei_send ( int ifno , pkt_p pkt )

function. If the driver can send the packet immediately (the transmitter is not busy), it must take
care about the packet buffer structure. The packet data consists of two parts:

1. The administration data at address pkt_offset(pkt), of length pkt->p_contents.pc_dirsize

2. The actual data somewhere in the (virtual) user or kernel address space starting at address

pkt->p_contents.pc_virtual

and of length

pkt->p_contents.pc_totsize - pkt->p_contents.pc_dirsize

The local putpkt() routine must send these two areas separately. Sometimes it runs into trouble with data that is not aligned (word, long ...). There are two cases:

* The packet contains only direct data, no virtual: increase the byte count to the
alignment boundary

* The packet contains both direct and virtual data: Pain. Copy the packet into a
contiguous buffer and transfer this buffer. Arrggh.
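Both cases can be handled by one small helper that gathers the two parts into a contiguous buffer and rounds the length up to the transfer alignment. The sketch below assumes a simplified packet description; struct two_part_pkt, round_up() and gather() are hypothetical names, not the real pkt_t interface, and zero-filling the padding bytes is a choice made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the two-part packet layout described above
 * (field names are hypothetical, not the real pkt_t). */
struct two_part_pkt {
    const char *direct;     /* data inside the packet buffer */
    int dirsize;            /* length of the direct part */
    const char *virt;       /* data elsewhere in the address space */
    int totsize;            /* direct plus virtual length */
};

/* Round n up to the next multiple of align (align must be a power of two). */
static int round_up(int n, int align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Copy both parts into one contiguous buffer and zero-pad up to the
 * alignment boundary. Returns the padded length, or -1 if the buffer
 * is too small. */
static int gather(const struct two_part_pkt *p, char *buf, int bufsize, int align)
{
    int virtsize = p->totsize - p->dirsize;
    int padded = round_up(p->totsize, align);

    if (padded > bufsize)
        return -1;
    memcpy(buf, p->direct, (size_t)p->dirsize);
    memcpy(buf + p->dirsize, p->virt, (size_t)virtsize);
    memset(buf + p->totsize, 0, (size_t)(padded - p->totsize));
    return padded;
}
```

The extra copy costs time, which is why it is reserved for packets that cannot be transferred in place.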

If the transmitter is still busy, you must queue the packet in the TX-Queue:

int tnext;

if ((tnext = txhead + 1) == TQUEUESIZE)
    tnext = 0;
if (tnext == txtail)
    error(NOFREETXSLOT);
else {
    txpkts[txhead] = pkt;
    txhead = tnext;
}

When the transmitter is ready to send a new packet, it raises an interrupt. Inside the ISR we enqueue a high level transmitting routine. This routine puts the next packet from the TX-Queue (FIFO order) into the card's tx-buffer:

pkt_p pkt;

while (txhead != txtail) {
    pkt = txpkts[txtail];
    txpkts[txtail] = (pkt_p) 0;
    if (++txtail == TQUEUESIZE)
        txtail = 0;
    putpkt(pkt);
    pkt_discard(pkt);
}
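The head/tail convention used by both queues can be tried out in isolation. The sketch below is a minimal ring queue following the same rules: the queue is empty when head equals tail, and full when advancing head would reach tail. The function names tx_put() and tx_get() and the queue size are assumptions for this example; packets are represented by opaque pointers.

```c
#define TQUEUESIZE 4    /* assumed value; one slot always stays unused */

static void *txpkts[TQUEUESIZE];
static int txhead, txtail;

/* Queue a packet. Returns 0 on success, -1 if the queue is full. */
static int tx_put(void *pkt)
{
    int tnext = txhead + 1;

    if (tnext == TQUEUESIZE)
        tnext = 0;
    if (tnext == txtail)
        return -1;
    txpkts[txhead] = pkt;
    txhead = tnext;
    return 0;
}

/* Take the oldest packet. Returns a null pointer if the queue is empty. */
static void *tx_get(void)
{
    void *pkt;

    if (txhead == txtail)
        return 0;
    pkt = txpkts[txtail];
    txpkts[txtail] = 0;
    if (++txtail == TQUEUESIZE)
        txtail = 0;
    return pkt;
}
```

Since tx_put() only writes the head index and tx_get() only writes the tail index, the two sides do not modify the same index, which suits the split between the send path and the interrupt-driven transmit routine.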

Packet queue macros

Because all drivers currently operate in the above manner, it makes sense to use macros for the receive and send packet pools. These macros are defined in the header pktqueue.h in the net driver directory.

These macros make the following assumption about the private device driver structure dev:

struct dev {
    ...
    char *name;
    pkt_p *txpkts;
    int txhead;
    int txtail;
    pkt_p *rxpkts;
    int rxhead;
    int rxtail;
    pool_t rxpool;
    ...
}

Further, the definitions:

RXQUEUESIZE      RCVPOOLSIZE/2 (??)
TXQUEUESIZE
RCVETHPKTSIZE    (PKTBEGHDR + 1514)

The device structure must contain these variables. The available macros are:

Check for packets in the receive or send queue:

CHECKTXQUEUE(dev) CHECKRXQUEUE(dev)

If a packet can't be sent immediately (transmitter busy), put it in the TX queue:

PUTTXQUEUE(dev,pkt)


A packet was received. Get the next pkt pointer from the RXQUEUE. Within the ISR the driver can't allocate a packet struct from the RCVPOOL, so it uses a preallocated packet from the RXQUEUE.
Later, in an enqueued high level routine, it delivers the packet(s) with eth_arrived() to the FLIP layer.

PUTRXQUEUE(dev,pkt)

Get a packet from the TX queue:

GETTXQUEUE(dev,pkt)

Get a packet from the RX queue and allocate a new one from the receive pool:

GETRXQUEUE(dev,pkt)

Examples

Take a look at the source code of the various network drivers in
machdep/dev/ibm_at/net.