Wednesday, November 19, 2008

Brief Overview on IPCs

IPC - Inter Process Communication

Different types of IPC mechanisms
• pipes

• FIFOs (named pipes)

• message queues

• semaphores

• shared memory


Pipes

* Pipes can be used to communicate between related processes, i.e., between a parent and a child, or between two children of the same parent. Pipes provide one-way communication of data. A pipe has two ends, one for reading and one for writing.

* A pipe can be created by using int pipe(int filedes[2])

* The pipe system call returns two file descriptors filedes[0] for reading and filedes[1] for writing.

* One process writes to the pipe using the write-end fd and the other process reads from the pipe using the read-end fd.

* To make sure that each process only writes or only reads on the pipe, the unused fd is closed in that process.

* For example, if the parent writes and the child reads, then in the parent process the programmer can close the read fd, and in the child process the write fd can be closed (a sketch follows this list).

* For creating a two-way communication, we need to create two pipes.

* Pipes use kernel memory for the actual pipe buffer. A pipe has a finite capacity, which is at least 4096 bytes.

* Disadvantages: Pipes can be used only between related processes. Pipes do not have an entry in the name space, because of which they cannot be used between two unrelated processes.
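Here is a minimal sketch of the parent-writes / child-reads case described above (error handling kept brief):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid == 0) {                      /* child: reader */
        close(fd[1]);                    /* close the unused write end */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; printf("child read: %s\n", buf); }
        close(fd[0]);
        _exit(0);
    } else {                             /* parent: writer */
        close(fd[0]);                    /* close the unused read end */
        const char *msg = "hello from parent";
        write(fd[1], msg, strlen(msg));
        close(fd[1]);                    /* the reader then sees EOF */
        wait(NULL);
    }
    return 0;
}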

Named Pipes (FIFOs)

* The main difference between a FIFO and a normal pipe is that a FIFO has an entry in the file-system name space, so it can be used between two unrelated processes.

* A FIFO can be created by using int mknod(const char *pathname, mode_t mode, dev_t dev) with the FIFO mode bit set, or with the convenience call int mkfifo(const char *pathname, mode_t mode)

* Once the FIFO has been created, it needs to be opened for either reading or writing using the open system call (a sketch follows at the end of this section).

* Pipes and FIFOs follow these rules for reading and writing:

* a read requesting less data than is in the pipe or FIFO returns the requested amount; the rest can be read by subsequent reads


* If more data is requested than is available, only the amount available is returned


* If the pipe is empty and there is no writer on it, the read returns zero (end of file)


* If two processes write simultaneously (each write smaller than the maximum limit), one process's data follows the other's; the writes do not intermix


* If a process writes to a pipe that no process has open for reading, SIGPIPE is generated


* The unlink system call can be used to remove a FIFO.

* Pipes and FIFOs are called stream-oriented IPC mechanisms, since the data flowing over them is just a stream of bytes with no demarcation of fixed messages. Hence, when one process writes 100 bytes, another process can read them as five reads of 20 bytes each.
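A small sketch of the writer side of a FIFO; mkfifo() is used here as a convenience wrapper for creating the FIFO, and the path /tmp/myfifo is just an example. A reader would open the same path with O_RDONLY and read from it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/tmp/myfifo";

    /* create the FIFO entry in the name space; ignore the error if it already exists */
    if (mkfifo(path, 0666) == -1 && errno != EEXIST) { perror("mkfifo"); exit(1); }

    int fd = open(path, O_WRONLY);       /* blocks until some reader opens the FIFO */
    if (fd == -1) { perror("open"); exit(1); }

    const char *msg = "hello over the FIFO";
    write(fd, msg, strlen(msg));
    close(fd);

    /* unlink(path) would remove the FIFO when it is no longer needed */
    return 0;
}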

Message Q, Semaphores, Shared Memory

* These three are called System V IPCs and they share a commonality: all three are identified by a key of type key_t (an integer).

* The system calls that operate on these IPCs are also similar.

* Replace ipc with msg/sem/shm in the generic names below to get the corresponding system call for each IPC mechanism:


get - system call to create or open
ctl - system call for control operations
msgsnd/msgrcv - send / receive on a message queue
semop - operations on semaphores
shmat/shmdt - attach / detach operations on shared memory

Message Queues

* In message queues, different processes communicate with each other by means of messages whose format is predefined and agreed upon by all of these processes. A message queue uses kernel memory and is basically a linked list of messages.

* Each message in the queue consists of these items:
1. message type
2. length of data portion // This is optional
3. data portion

* For receiving a message, ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtype, int msgflg) is used.

* The msgtype argument indicates the type of the message that needs to be read from the queue:

If the msgtype is 0, the first message on the queue is returned
If the msgtype is >0, the first message with that msgtype on the queue is returned
If the msgtype is <0, the first message with the lowest type that is less than or equal to the absolute value of msgtype is returned
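A minimal sketch of sending and receiving a single message on a queue; the key value and the message layout are arbitrary examples agreed on for this sketch.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* our own message layout; the kernel only requires a leading long mtype */
struct mymsg {
    long mtype;                 /* message type, must be > 0 */
    char mtext[64];             /* data portion */
};

int main(void)
{
    key_t key = 1234;           /* arbitrary example key */
    int qid = msgget(key, IPC_CREAT | 0666);
    if (qid == -1) { perror("msgget"); exit(1); }

    struct mymsg out = { .mtype = 1 };
    strcpy(out.mtext, "hello via the message queue");
    if (msgsnd(qid, &out, strlen(out.mtext) + 1, 0) == -1) { perror("msgsnd"); exit(1); }

    struct mymsg in;
    /* msgtype = 1: read the first message of type 1 on the queue */
    if (msgrcv(qid, &in, sizeof(in.mtext), 1, 0) == -1) { perror("msgrcv"); exit(1); }
    printf("received: %s\n", in.mtext);

    msgctl(qid, IPC_RMID, NULL);   /* remove the queue */
    return 0;
}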

Shared Memory


In message queues and the other IPC mechanisms, the buffers used for communication live mainly in kernel memory, so the process has to switch between user and kernel mode whenever it accesses them, which makes these mechanisms slower. With shared memory, the segment is mapped into the user address space of each process, so no mode switch is needed to access the data, which speeds things up.
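A minimal sketch of creating, attaching and writing to a shared memory segment; the key and the 4 KB size are arbitrary examples. Another process calling shmget/shmat with the same key would see the same bytes.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    key_t key = 5678;                                 /* arbitrary example key */
    int shmid = shmget(key, 4096, IPC_CREAT | 0666);  /* create/open a 4 KB segment */
    if (shmid == -1) { perror("shmget"); exit(1); }

    char *mem = shmat(shmid, NULL, 0);                /* map the segment into our address space */
    if (mem == (char *) -1) { perror("shmat"); exit(1); }

    strcpy(mem, "data visible to every process that attaches this segment");
    printf("wrote: %s\n", mem);

    shmdt(mem);                                       /* detach the segment */
    shmctl(shmid, IPC_RMID, NULL);                    /* mark the segment for removal */
    return 0;
}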

Semaphores

* These are a means of synchronization. Semaphores are essentially global counters: we can think of a semaphore as a kind of global integer that is common to different processes on the system.

* How can we have a global variable that is common to different processes? The semaphore is maintained in kernel space, so that it can be accessed by different processes.

* Semaphores work with the same kind of calls as any other IPC mechanism, like
- semget ( key, numOfSemaphores, permissionFlags )

* With semaphores, we can create several resource counters (sub-semaphores) associated with the same semaphore key; the second parameter of semget indicates how many. The maximum per set is implementation-defined (25 on some systems).
The permission flag values can include IPC_CREAT and IPC_EXCL along with the normal permission bits.

* If we pass just IPC_CREAT, then if no semaphore exists for the specified key, a new one is created. If a semaphore created by another process already exists, that semId is returned.

* If both IPC_CREAT and IPC_EXCL flags are given, a semaphore is created only if none exists; otherwise semget fails. Basically, it gives exclusivity.

* We can set the value of a semaphore to any value we want by using the semctl system call.

* semctl ( semId, semNum, cmd, args ) : the second argument is the sub-semaphore number
Ex : semctl ( semId, 0, GETVAL ) : to get the value of a semaphore
semctl ( semId, 0, SETVAL, 13 ) : to set the value of a semaphore to 13
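A small sketch of creating a semaphore set and using GETVAL/SETVAL. Note that on Linux the fourth argument of semctl is passed through a caller-defined union semun; the key value here is an arbitrary example.

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl's optional fourth argument; the caller must define this union */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    key_t key = 4321;                                /* arbitrary example key */
    int semid = semget(key, 1, IPC_CREAT | 0666);    /* a set with one sub-semaphore */
    if (semid == -1) { perror("semget"); exit(1); }

    union semun arg = { .val = 13 };
    if (semctl(semid, 0, SETVAL, arg) == -1)         /* set sub-semaphore 0 to 13 */
        { perror("semctl SETVAL"); exit(1); }

    int val = semctl(semid, 0, GETVAL);              /* read the value back */
    printf("semaphore value: %d\n", val);

    semctl(semid, 0, IPC_RMID);                      /* remove the semaphore set */
    return 0;
}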

* How do we use semaphores?
It depends on the applications themselves. Let's say one process creates a semaphore, sets the value to 2 and starts using a resource. The protocol between the processes is that whenever the semaphore value is 0, the resource is free and a process can access that resource.

* The second process gets the value of the semaphore by using GETVAL in semctl and checks whether it is 0; if not, it waits, otherwise it uses the resource.

* Even with this procedure, there is a synchronisation problem. Process 1 sets the value of the semaphore to 2 and starts using the resource. Once done, it resets the semaphore to 0.

* If there are two processes waiting for this resource, Process 2 gets the value and is about to check whether it is zero. If the CPU switches to Process 3 just before that check, Process 3 also reads the value as 0 and starts using the resource.

* When the CPU comes back to Process 2, its check passes and it also uses the resource, which recreates exactly the synchronisation problem that semaphores were designed to solve.

* To fix this, the operation of checking and changing the value of a semaphore must be atomic, which can be achieved by using the system call "semop" with the help of the sembuf structure.

struct sembuf {
    unsigned short sem_num;   /* sub-semaphore number */
    short          sem_op;    /* operation to perform */
    short          sem_flg;   /* operation flags */
};

semop ( semId, sembufPtr, numOfOpsInSecondArg )

* The main field in the sembuf structure is sem_op, which is applied to the current value of the semaphore.

Let’s take an example to explain this.

If sembuf is assigned { 1, 0, 0 }, that means we want to operate on sub-semaphore number 1 (sem_num is zero-based), and the third field (sem_flg) is not important to discuss at this time.

Let's see what the second field, sem_op, means.

If sem_op is 0, then semop will block the execution until the value of the semaphore becomes zero. This is the atomic operation that gets and checks the semaphore value.

If sem_op is a positive value, then this value will be added to the current value of the semaphore and the function will return immediately. Note that this adds to the value rather than setting it (unlike SETVAL); it is typically used to release resources.

If sem_op is a negative value, then its absolute value will be subtracted from the current value of the semaphore, and the function will return only if the value after decrementing is greater than or equal to zero. Otherwise, this call will block the execution.
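A small sketch of the usual atomic usage built on semop: decrement to acquire the resource (blocking while it is unavailable) and increment to release it. This is the conventional P/V style rather than the GETVAL-based protocol sketched above; the key value is an arbitrary example.

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl's optional fourth argument; the caller must define this union */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    key_t key = 8765;                                /* arbitrary example key */
    int semid = semget(key, 1, IPC_CREAT | 0666);
    if (semid == -1) { perror("semget"); exit(1); }

    union semun arg = { .val = 1 };                  /* 1 = one unit of the resource available */
    semctl(semid, 0, SETVAL, arg);

    /* acquire: atomically wait until the value can be decremented and still stay >= 0 */
    struct sembuf lock = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    if (semop(semid, &lock, 1) == -1) { perror("semop lock"); exit(1); }

    puts("inside the critical section");

    /* release: add 1 back to the semaphore value */
    struct sembuf unlock = { .sem_num = 0, .sem_op = +1, .sem_flg = 0 };
    if (semop(semid, &unlock, 1) == -1) { perror("semop unlock"); exit(1); }

    semctl(semid, 0, IPC_RMID);                      /* remove the semaphore set */
    return 0;
}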
