In essence, Voice over Internet Protocol (VoIP) is the transmission of speech across an IP-based network. VoIP works by encoding voice information into a digital format, which can be carried across IP networks in discrete packets. VoIP has two main advantages over traditional telephony.
With VoIP, the calling user (program or individual) supplies the phone number or a URI (Uniform Resource Identifier, of which a URL is one form), which then triggers a set of protocol interactions resulting in the placement of the call. The heart of the call placement process for VoIP is the Session Initiation Protocol (SIP). Defined in RFC 3261, SIP is an application-level control protocol for setting up, modifying, and terminating real-time sessions between participants over an IP data network.
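To make the call placement step more concrete, the following is a minimal sketch of how a SIP INVITE request could be assembled as text. The function name `build_invite` and all example addresses, tags, and identifiers are hypothetical, chosen only for illustration. The sketch covers just the header fields RFC 3261 requires in a request (plus Contact) and omits the session description body that a real INVITE normally carries.

```python
def build_invite(caller_uri: str, callee_uri: str, call_id: str,
                 branch: str, tag: str, via_host: str) -> str:
    """Assemble a bare-bones SIP INVITE (illustrative only, no SDP body)."""
    lines = [
        # Request line: method, target URI, protocol version
        f"INVITE {callee_uri} SIP/2.0",
        # Via records where responses should be sent; RFC 3261 branch
        # values start with the magic cookie "z9hG4bK"
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{callee_uri}>",
        f"From: <{caller_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{caller_uri}>",
        # No message body in this sketch
        "Content-Length: 0",
    ]
    # SIP is a text protocol: header lines end with CRLF and a blank
    # line separates the headers from the (here empty) body
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example values
invite = build_invite(
    caller_uri="sip:alice@example.org",
    callee_uri="sip:bob@example.com",
    call_id="a84b4c76e66710@pc33.example.org",
    branch="z9hG4bK776asdhds",
    tag="1928301774",
    via_host="pc33.example.org",
)
print(invite)
```

In an actual call, the callee's side would answer with responses such as 180 Ringing and 200 OK, and the caller would confirm with an ACK before any voice data flows.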
Once a called party responds, a logical connection is established between the two parties (or more for a conference call), and voice data may be exchanged in both directions. The figure below illustrates the basic flow of voice data in one direction in a VoIP system. On the sending side, the analog voice signal is first converted into a digital bit stream and then segmented into packets. The packetization is typically performed by RTP (Real-time Transport Protocol). This protocol includes mechanisms for labeling the packets (sequence numbers and timestamps) so that they can be reassembled in the proper order at the receiving end and buffered there to smooth out reception and deliver the voice data in a continuous flow. The RTP packets are then transmitted across the Internet or a private internet using the User Datagram Protocol (UDP) and IP.
fig: VoIP Processing
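The packetization and reordering steps described above can be sketched in a few lines. This is a simplified illustration, not a complete RTP implementation: it packs only the 12-byte fixed header from RFC 3550, assumes one byte per audio sample (as with G.711 at 8 kHz, where 160 samples cover 20 ms), and ignores sequence-number wraparound when sorting. The function names `packetize` and `reassemble` and the default values are assumptions made for this example.

```python
import struct

# RTP fixed header: flags, marker/payload type, sequence number,
# timestamp, SSRC (network byte order, 12 bytes total)
RTP_HEADER = struct.Struct("!BBHII")


def packetize(audio: bytes, ssrc: int, payload_type: int = 0,
              samples_per_packet: int = 160) -> list[bytes]:
    """Split a digitized voice stream into RTP-style packets."""
    packets = []
    for seq, offset in enumerate(range(0, len(audio), samples_per_packet)):
        payload = audio[offset:offset + samples_per_packet]
        header = RTP_HEADER.pack(
            0x80,                    # version 2, no padding/extension/CSRC
            payload_type & 0x7F,     # marker bit 0, 7-bit payload type
            seq & 0xFFFF,            # sequence number labels packet order
            offset & 0xFFFFFFFF,     # timestamp in samples (1 byte/sample)
            ssrc,                    # identifies the sending source
        )
        packets.append(header + payload)
    return packets


def reassemble(packets: list[bytes]) -> bytes:
    """Restore the original order using the sequence numbers."""
    parsed = []
    for pkt in packets:
        _, _, seq, _, _ = RTP_HEADER.unpack(pkt[:RTP_HEADER.size])
        parsed.append((seq, pkt[RTP_HEADER.size:]))
    # Simplification: a real receiver must handle 16-bit wraparound and loss
    parsed.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in parsed)


# Usage: packets arriving out of order are put back in sequence
audio = bytes(range(256)) * 2
packets = packetize(audio, ssrc=0x1234)
restored = reassemble(list(reversed(packets)))
print(len(packets), restored == audio)
```

A sender would hand each packet to a UDP socket in turn. Because UDP does not guarantee ordering or delivery, the receiver relies on the sequence numbers and timestamps in the header to reorder packets and to pace playback from its buffer.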