WO1997014222A1

WO1997014222A1 - Personal audio message processor and method

Info

Publication number: WO1997014222A1
Application number: PCT/US1996/017419
Authority: WO
Inventors: Geoffrey Stern; Gil Wexler
Original assignee: Starbro Communications Inc; Geoffrey Stern; Gil Wexler
Priority date: 1995-10-13
Filing date: 1996-10-11
Publication date: 1997-04-17
Also published as: AU7527696A

Abstract

A portable device is disclosed which permits the user to record, edit, play and review voice messages and other audio material which may be received from, and subsequently transmitted to, a remote voice processing or interactive voice response (IVR) host computer over a communication link. A preferred device contains its own power source, integrated circuitry and control buttons to permit the localized recording, editing, storage and playback of audio signals through a built-in speaker (28), microphone (20), and removable memory card (14). The device also contains a standard RJ-11 telephone jack (30), modem chip set (24), and DTMF tone decoder to permit the transmission and control of audio signals to and from a host computer. The device contains circuitry which permits it to transmit and receive audio signals at a rate substantially faster than originally recorded.

Description

PERSONAL AUDIO MESSAGE PROCESSOR AND METHOD

Field of the Invention

The present invention relates generally to dictation devices and, more particularly, concerns a method and portable apparatus for audio communication, including the recording and editing of voice mail and its transmission and reception over common electrical communication media or data links.

Background of the Invention

All electronic message systems, with the exception of voice-mail, have intermediate devices or storage media whereby data may be transferred, preferably at a high transmission rate, over a standard communication link and stored in a storage medium or onto an unattended device for later off-line access, review and editing by the intended user.

In the case of a facsimile transmission, an image is scanned by the transmitter and then transmitted and ultimately printed at a remote site for off-line utilization by the intended receiver. In the case of electronic mail, data is generated on a computer and then transmitted and stored either directly on the intended user's unattended computer or on a central host computer for subsequent retrieval by the intended user. When the intended user accesses his computer, either the E-mail is already resident or he finds a message indicating that he has mail and explaining how he can retrieve it. Once the E-mail is retrieved, it likewise may be read, reviewed and manipulated by the intended user off-line. Similarly, utilities exist for both facsimile and E-mail messages whereby messages may be selected from a host by an authorized user for subsequent transmission to the user's E-mail address or unattended facsimile machine.

For both E-mail and facsimile, the use of a telephone link is limited to the transmission of the data and the transmission of control codes for that data.

In sharp contrast, voice messages and voice-text are currently recorded by the sender and retrieved by the intended recipient only in real-time. A voice mail user is limited to using a telephone handset or speaker-phone to record and listen to voice messages directly. There exist no devices to store voice messages and likewise there exists no method or utility to scan and select personal voice messages or public announcements from a host for subsequent high speed transmission to a device for subsequent off-line review by the user.

Voice messaging, limited to on-line and real-time transmission and physically requiring access to a telephone set is unfortunate, particularly because voice communication inherently does not require any external hardware or instrumentation other than a mouth and an ear for a human being to create or access it. Speech is the most natural and sufficient form of communication. Speech is hands-free requiring neither writing instrument, keyboard, screen, dedicated vision or hand-to-eye coordination on the part of the user to input or retrieve it. That voice mail is nonetheless so widely used is more a function of speech's unique characteristics than a vote of approval on the adequacy of the current technology.

Since speech is a direct record of the user's voice, the urgency, meaning and emotional content is never lost. Similarly, since so much data is first generated in voice and is only later transcribed to text or data, info-text should be the preferred medium for timely data on meetings, speeches and radio broadcasts. Ideally, voice mail should be the preferred mode of communication when traveling, when communicating through time-zones and when accessing timely information which originated in the spoken word (e.g. minutes of a meeting or lecture). Voice text (i.e. data or text which is spoken by a computer) should be the preferred format for messaging information to be accessed where use of motor skills and vision are not convenient or are impaired such as when driving, operating equipment or engaged in a leisure activity.

The current use of a telephone to access voice messages directly has significantly limited the potential utilization of voice messaging. Real-time transmission of voice messages and info-text makes the recording and retrieval of voice mail, especially from long distances, very costly. The cost and inconvenience involved means that one cannot compose and review voice mail and info-text in a cost efficient manner and at one's own pace. One is limited to a location and situation in which a telephone is accessible and, in the case of a wireless communication link, to a place where wireless transmission is both possible and desirable. In its present state, voice mail is limited to short messages between individuals wishing to communicate in a more substantive fashion at another time (telephone tag). Voice "mail" becomes limited to voice "messaging" because of the cost and inconvenience to both the sender and receiver of listening to lengthy, content-rich "mail" over the phone. Furthermore, the cost of transmitting audio signals in real-time and only when the user has access to a telephone (as opposed to un-attended recording at off-peak hours) makes more commercial use of info text (recorded instructions, recorded travelogues, speech transcripts, articles or books on "tape" etc.) and other innovative advertiser/promotional supported uses of voice-text unfeasible.

Broadly, it is an object of the present invention to provide a dictation device and method which enable a user to compose and review voice mail off-line, from any location, while engaged in any activity, at a leisurely pace, without incurring telephone toll charges and whether a communication link is presently accessible or not.

It is also an object of the present invention to use a telephone link primarily as a communications link for high speed transmission of pre-recorded material and control codes to facilitate that transmission, thereby limiting the use of a telephone and telephone line for voice messaging as a recording or playback device.

It is also an object of the present invention to provide a protocol whereby pre-message handshaking occurs between the dictation device and a host computer to conform the digitized voice signal to one of the standard voice compression protocols to facilitate high speed transmission of voice messages.

The invention provides a low-cost, portable recording and playback dictation device which permits the user to record, edit, play and review voice messages including audio-text, text-to-speech and other audio material which may be received from, and subsequently transmitted to, a remote voice processing or interactive voice response (IVR) host computer over a communication link, such as the public switched telephone system. A preferred device contains its own power source, integrated circuitry and control buttons to permit the localized recording, editing, storage and playback of audio signals through a built-in speaker, microphone and removable memory card. The device also contains a standard RJ-11 telephone jack, modem chip set and DTMF tone decoder to permit the transmission and control of audio signals to and from a host computer. The device contains circuitry which permits it to transmit and receive audio signals at a rate substantially faster than originally recorded.

The invention also relates to a method and software utility which enables the user to control the delivery and/or transmission of audio data between the device and a host computer by a simple method such as DTMF encoded delivery commands, or spoken instructions, sent by the user and received by and responded to by the voice processing host computer. In response to control codes issued by the user, the utility permits selected voice messages to be retrieved directly into the device or forwarded to the device, which may be located at a remote site and left unattended while connected to a communication link. It is a feature of the present invention that a recording device may be left connected to a communication link and a voice mail user is able to have voice mail forwarded to him at off-peak hours, when telephone rates are lowest and when excess capacity on incoming lines is available. The recording device is programmed to respond to control codes issued by the host voice processor to enable automatic and unattended recording of selected incoming voice messages.
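As an illustration of DTMF encoded delivery commands, the following minimal sketch maps keypad characters to the standard DTMF tone pairs. The function names and the sample command `*42#` are hypothetical; the source does not define any particular command codes.

```python
# Standard DTMF keypad: each key is signalled by one low-group and one high-group tone (Hz).
LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair in Hz for a single keypad character."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def encode_command(command: str) -> list[tuple[int, int]]:
    """Translate a delivery command into its sequence of tone pairs."""
    return [dtmf_frequencies(key) for key in command]


# Hypothetical command: '*42#' is an invented example, not a code defined by the source.
print(encode_command("*42#"))
```

A host decoding the same tones would apply the table in reverse to recover the command characters.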

It is a further feature of the present invention that both senders and recipients of voice messages are able to manage voice messages efficiently for transmission to a recording device through a simple method such as DTMF encoded delivery commands, or spoken instructions, provided to the voice processing host computer. A sender of voice mail is able to annotate a voice message with a message file title and then leave a more substantive voice file either in the receiver's voice mail box, on his own system or at a service bureau for the intended recipient to retrieve using a code given in the message file title. The intended recipient can scan his messages by title, date, length and priority level while on the phone and tag those messages he wishes to have transmitted to his (and possibly additional) recording device(s). Similarly a user requesting pre-recorded audio text or audio signals may respond to prompts which enable him to designate any telephone number, and the best calling time, to which the unattended recorded device is or will be connected and to which audio text could be forwarded. In some cases, the user requesting prerecorded audio text or audio signals will have the recording device connected to the very line from which he is making his request, in which case he will have the option to have the requested material transmitted directly to his recording device without dropping carrier.
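The scanning and tagging of messages described above can be pictured with a small data-structure sketch. The class name, field names, and the convention that a lower priority number means more urgent are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MessageHeader:
    """Hypothetical summary a host could present for each stored voice message."""
    title: str
    received: date
    length_seconds: int
    priority: int          # assumption: lower number means more urgent
    tagged: bool = False   # True once the user selects it for transfer


def scan(headers, key):
    """Return the headers ordered by one attribute: title, received, length_seconds or priority."""
    return sorted(headers, key=lambda header: getattr(header, key))


def tag_for_transfer(headers, titles):
    """Mark the named messages for transmission and return every tagged header."""
    wanted = set(titles)
    for header in headers:
        if header.title in wanted:
            header.tagged = True
    return [header for header in headers if header.tagged]


mailbox = [
    MessageHeader("Budget review", date(1996, 10, 2), 300, 2),
    MessageHeader("Travel update", date(1996, 10, 1), 45, 1),
]
print([h.title for h in scan(mailbox, "priority")])
print([h.title for h in tag_for_transfer(mailbox, ["Budget review"])])
```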

It is also a feature of the present invention that an interface port such as a standard RJ-11 telephone jack is provided so that the recording device may be connected between a telephone set, computer, cellular phone or personal digital assistant and a communication link to enable the user to select and retrieve voice files while using any of those devices. It is also a feature of the present invention that circuitry is provided for the digital conversion and compression of the analog voice signals recorded in the memory of a dictation device to permit high density storage and high speed transmission of digitized voice. Similarly, circuitry is provided for the analog conversion and natural sounding playback of previously stored or received digitized voice.

It is also a feature of the present invention that there may be provided a public terminal, e.g. in a manner similar to an automated teller machine and located at places such as airports, where a user could connect his recording device and select voice messages to be retrieved and transmitted directly by the recording device.

Brief Description of the Drawing

The foregoing, as well as the other objects, features and advantages of the present invention will be understood more completely from the following detailed description of a preferred embodiment, with reference being had to the accompanying drawing, in which:

Figure 1 is a schematic block diagram of a preferred personal audio message processor embodying the present invention; and

Figures 2-6 are flowcharts illustrating how certain processing is performed in the apparatus of Fig. 1.

Detailed Description

Figure 1 is a schematic block diagram of a presently preferred Personal Voice Server (PVS) system 10 embodying the present invention. PVS system 10 broadly comprises five main parts: a highly integrated voice chip 12; a modem chip 24 coupled to the voice chip; a flash memory 14 coupled to the voice chip; peripherals such as a microphone 20, a speaker 28, a keyboard 18, and a display 16 coupled to the voice chip; and control software operating a control processor in the voice chip. Although the embodying device is referred to as a voice server, it should be clear that it is equally useful for other types of audio, including music, as well as other signals. The voice chip is preferably an EV1008 available from EUROM Flashware Solutions, Inc. of Santa Clara, California, but may be any chip providing similar functions and operation. Voice chip 12 provides integrated voice recording, playback and control, managed by the software in accordance with the invention. An internal controller in the voice chip controls the compression of audio, writing and reading of audio data to/from flash memory 14, control of the external modem chip 24 and a standard UART interface. The sampling rate of the chip is controlled by the software.

Flash memory 14 is configured according to the recording time to be available on PVS 10, preferably up to 12 MB of non-volatile flash memory. In a typical situation audio compression will result in a data bandwidth of about 1 KByte per second (about 8 kbps). In this mode a memory of 1 MByte will provide about 1000 seconds of audio for retransmission.
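The storage arithmetic can be checked with a short sketch. The function name and the decimal reading of KByte and MByte (1,000 and 1,000,000 bytes) are assumptions for illustration; under that reading, 1 KByte per second is 8,000 bits per second.

```python
KBYTE = 1_000        # assumption: decimal units
MBYTE = 1_000_000


def recording_seconds(memory_bytes: int, bytes_per_second: int) -> float:
    """Return how many seconds of compressed audio fit in the given memory."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return memory_bytes / bytes_per_second


rate = 1 * KBYTE                                   # about 1 KByte per second, per the description
print(recording_seconds(1 * MBYTE, rate))          # seconds available in 1 MByte
print(recording_seconds(12 * MBYTE, rate) / 60)    # minutes available in the 12 MB maximum
print(rate * 8)                                    # the same data rate in bits per second
```

With these assumptions the preferred 12 MB configuration holds roughly 200 minutes of compressed audio.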

A microphone 20 and speaker 28 are selected based on quality and size.

The software embedded in the PVS system permits multiple operations by it, such as keying in a command, talking to and recording in the PVS, as well as audio compression either by a scheme on the voice chip 12 (about 8 kbps) or directly by the modem chip 24 (by a standard ADPCM algorithm, to enable 2-bit quantization and about 14 kbps). The compressed data may either be transmitted directly by the modem 24 or stored by the voice chip in the flash memory 14. Alternatively, another software algorithm may be added for compressing the audio by a standard form of data compression used by a service bureau.

A controller in the voice chip 12 executes software that is resident in the flash memory 14. The program can be updated by automatic reloading of the program over a telephone line over which the audio is sent. The core part of the software that handles the modem and the handshake communication is embedded in the kernel program, and the system address base is configured to drive display 16, preferably an LCD display. The system database is configured to read the keyboard 18. Every time a key of the keyboard is pressed, an interrupt is generated to the controller, which then polls the system data to determine which key was pressed. The software decides what action to perform.

A voice input is provided from microphone 20, through an amplifier 22, to the analog data input of voice chip 12. The controller in the voice chip applies the voice input to an internal ADC (Analog to Digital Converter), reads the output of the ADC, compresses the output data and causes it to be stored in flash memory 14. The voice chip plays back the data when it receives an appropriate command from the keyboard 18 or directly from a telephone line connected to the receiving machine. The data is read back from the flash memory, decompressed, and converted to an analog signal via a DAC (Digital to Analog Converter) in the voice chip 12.

The line output of modem chip 24 is connected through a conventional telephone jack 30, such as an RJ-11, which permits connection to a telephone line. However, it will be appreciated that the modem could also be a cellular or other radio transmission modem. The modem is also connected to the UART interface of voice chip 12. Transmission of previously recorded audio is performed by the controller reading data from flash memory 14, which data is previously compressed audio. This data is sent by the modem to the external telephone line for transmission to the receiver. Prior to this, controlling software dials the required number and waits for the receiving telephone to prepare for the transmission. This is done by a known handshake procedure or communications protocol, such as ZMODEM, KERMIT, or X/Y MODEM. After communication is established, the modem transmits all the bits of the digital audio to the receiver by V.34 standard protocol (28.8 Kbps). These audio bits are extracted by the controller from a predefined location in flash memory, under control of the software. The software has a directory of messages in the flash memory which can be retrieved independently by the controller.

While receiving the messages, the modem sends the received stream of bits to the controller of the voice chip. The controller, by means of software, manages the pool of messages and free space in the flash memory and stores the data bits in the free space of the memory. This data can be raw data compressed either by an identical voice chip or by other voice chips of audio providers (e.g. a personal computer with an audio board and compression software). In parallel with this, audio coming from the microphone may be digitized, compressed and stored directly by the voice chip, or it may be compressed by the modem and then re-captured by the UART of the voice chip. This enables storage of audio which contains compatible ADPCM standard compression, as used by the modem chip and many types of hardware.

Speaker 28 and microphone 20 are connected to the voice chip through amplifiers 26 and 22, respectively.

Flow-diagrams are presented in Figs. 2-6 to describe the reception by and transmission of messages to and from the Personal Voice Server (PVS) and all the different operational options for receiving, storing, retrieving, transmitting and playing messages to and from the PVS. This includes receiving compressed messages in digital form and audio signals in analog form either directly from a microphone or through a telephone connection.

For a more complete understanding of the embodiment of Fig. 1, a complete data sheet for the EV1008 VoiceChip™ is attached as Appendix A and is incorporated in this description by reference.

Figure 2 is a flowchart illustrating how the PVS receives previously digitized messages from a central message server which is compatible with the communication protocol used by the PVS. This could be any supported de facto or proprietary standard, e.g. DSVD (Digital Simultaneous Voice and Data), which is sent via V.34 protocol at 28.8 kbps and which would permit the user to talk while data is being transmitted to his/her PVS.

Operation begins at block 200. At block 210, the modem, in response to a ring, answers the call, completes its handshake procedure, and begins receiving information. Data bits from the modem are received by voice chip 12 at block 212. The voice chip decodes the incoming data at block 214. At block 216, a test is performed to determine whether the operation desired to be performed is storing a previously digitized, incoming message. If not, control switches to the process of Fig. 3. Otherwise, operation continues at block 218, where the incoming data packets are stored in flash memory 14. Upon completion of the entire message (block 220), a test is performed at block 222 to determine whether the memory was filled before the complete message was stored. If not, an acknowledgement is sent to the sender (block 224) that the complete message was received, and control returns to block 200. If the memory was filled up, the PVS stops receiving information (block 226), the central server is notified (block 228), and control returns to block 200.
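The storage branch of Fig. 2 can be sketched as follows. This is a simplified illustration, not the device firmware: `FlashStore` and `receive_message` are hypothetical names, and the memory-full test is made per packet rather than once at block 222.

```python
from dataclasses import dataclass, field


@dataclass
class FlashStore:
    """Hypothetical stand-in for flash memory 14 with a fixed capacity in bytes."""
    capacity: int
    data: bytearray = field(default_factory=bytearray)

    def free(self) -> int:
        """Return the number of bytes still available."""
        return self.capacity - len(self.data)


def receive_message(packets, store: FlashStore) -> str:
    """Store incoming packets and report the outcome to send back to the server."""
    for packet in packets:
        if len(packet) > store.free():
            # Blocks 226 and 228: stop receiving and notify the central server.
            return "memory_full"
        store.data.extend(packet)  # Block 218: store the packet in flash.
    return "ack"                   # Block 224: the complete message was received.


store = FlashStore(capacity=10)
print(receive_message([b"abcd", b"efgh"], store))
print(receive_message([b"xyz"], store))
```

In the second call only 2 bytes remain free, so the 3-byte packet is refused and the stored data is left unchanged.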

In packetizing data, a large number of bytes of data are separated into many identically structured packets, which are sent together with synchronization and validity information to withstand errors in transmission. In case of an error, only the required packets are re-sent to the PVS. The type of transmission and the protocol between the sender and receiver can be any commonly used computer communications protocol, such as ZMODEM, V.34 or PPP.
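A minimal sketch of such packetizing is shown below. The header layout (a two-byte sync marker, sequence number, payload length and CRC-32) is an assumption chosen for illustration and is not the format of ZMODEM or any other named protocol.

```python
import struct
import zlib

SYNC = b"\xAA\x55"                  # hypothetical synchronization marker
HEADER = struct.Struct(">2sHHI")    # sync, sequence number, payload length, CRC-32


def packetize(data: bytes, payload_size: int = 64) -> list[bytes]:
    """Split data into packets that each carry sync and validity information."""
    packets = []
    for seq, start in enumerate(range(0, len(data), payload_size)):
        payload = data[start:start + payload_size]
        header = HEADER.pack(SYNC, seq, len(payload), zlib.crc32(payload))
        packets.append(header + payload)
    return packets


def check_packet(packet: bytes):
    """Return (sequence number, payload) for a valid packet, otherwise None."""
    if len(packet) < HEADER.size:
        return None
    sync, seq, length, crc = HEADER.unpack(packet[:HEADER.size])
    payload = packet[HEADER.size:]
    if sync != SYNC or len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return seq, payload


def packets_to_resend(received: list[bytes], total: int) -> list[int]:
    """List the sequence numbers that did not arrive intact."""
    good = set()
    for packet in received:
        result = check_packet(packet)
        if result is not None:
            good.add(result[0])
    return [seq for seq in range(total) if seq not in good]
```

A receiver would call `packets_to_resend` after a transfer and request only the listed sequence numbers, so an error costs one packet rather than the whole message.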

Figure 3 is a flowchart which illustrates the process for retrieving previously digitized and compressed messages from the PVS and transmitting them to a central server. At block 300, a test is performed to determine whether the requested service is retrieval and transmission of a previously digitized, compressed and stored message. If not, control is transferred to the procedure of Fig. 4. If so, the requested data is retrieved from memory at block 310. This data is managed by the controller in voice chip 12. The various messages are grouped as a collection of messages and are described in a compact directory which is managed by the controller. Each message is retrieved from the flash memory by going to its starting location, splitting its contents into packets, and sending the data to the modem. The data packets are sent without compression (block 320), since they have been compressed previously, upon storage. The controller loops through the entire pool of messages and sends out all messages that need to be transmitted (block 330), preferably at a substantially higher rate than would be used for normal digital sound, whereupon control is transferred to the procedure of Fig. 2.
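The directory-driven loop of Fig. 3 might look like the sketch below. `DirectoryEntry`, its fields and the `send` callback are hypothetical; the source states only that a compact directory describes the stored messages.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory record locating one stored message in flash."""
    title: str
    start: int      # starting location of the message in flash
    length: int     # length of the compressed message in bytes
    pending: bool   # True if the message still needs to be transmitted


def transmit_pending(flash: bytes, directory, send, packet_size: int = 64) -> int:
    """Send every pending message as packets and return how many messages were sent."""
    sent = 0
    for entry in directory:                      # block 330: loop over the message pool
        if not entry.pending:
            continue
        message = flash[entry.start:entry.start + entry.length]   # block 310
        for offset in range(0, len(message), packet_size):
            # Block 320: data was compressed on storage, so it is sent as-is.
            send(message[offset:offset + packet_size])
        entry.pending = False
        sent += 1
    return sent


flash = b"AAAABBBBBBCC"
directory = [
    DirectoryEntry("memo", 0, 4, True),
    DirectoryEntry("old note", 4, 6, False),
    DirectoryEntry("reply", 10, 2, True),
]
outbox = []
print(transmit_pending(flash, directory, outbox.append, packet_size=3))
print(outbox)
```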

It should be appreciated that this mode of operation makes the PVS a particularly useful and convenient device. The user may carry it with him at all times, recording messages as he wishes, when he wishes. Upon arrival at a convenient location, he can then connect to a telephone line and dispatch all of the accumulated messages at once. This process is particularly efficient, because all of the messages are precompressed, organized and packetized by the PVS. Moreover, transmission time and expense are minimized, because the messages are compressed and can be sent in a burst, at the maximum speed of the modem, rather than at the usual transmission rate for digitized voice.
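The saving from burst transmission can be estimated with a short calculation. It uses the figures given in the description (about 8 kbps for voice-chip compression and 28.8 kbps for V.34) and, as a simplifying assumption, ignores packet headers, handshaking and retransmissions.

```python
AUDIO_BPS = 8_000     # about 8 kbps voice-chip compression, per the description
MODEM_BPS = 28_800    # V.34 rate, per the description


def burst_seconds(audio_seconds: float, audio_bps: int, modem_bps: int) -> float:
    """Return the time needed to send the compressed audio at the full modem rate."""
    return audio_seconds * audio_bps / modem_bps


def speedup(audio_bps: int, modem_bps: int) -> float:
    """Return how many times faster than real time the burst transfer runs."""
    return modem_bps / audio_bps


print(speedup(AUDIO_BPS, MODEM_BPS))                        # 3.6
print(round(burst_seconds(1000, AUDIO_BPS, MODEM_BPS), 1))  # 277.8
```

Under these assumptions, 1000 seconds of recorded audio would occupy the line for under five minutes instead of nearly seventeen.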

Figure 4 is a block diagram illustrating the routine performed by the controller of voice chip 12 when a received analog audio message is to be recorded. At block 400, a test is performed to determine whether the incoming audio message is from the built-in microphone. If not, control is transferred to the routine of Fig. 5. If so, the audio message is digitized and compressed (block 410) and placed in the working pool of data (block 420). At block 430, a test is performed to determine whether memory was filled before an entire message was stored. If not, the routine is terminated, and control returns to the routine of Fig. 2. If so, recording is disabled (block 440), and the operator is notified, as by a warning light, that the memory is full (block 450). Control then reverts to the routine of Fig. 2.

Figure 5 is a block diagram illustrating the routine performed to record analog audio from the telephone line. At block 500, a test is performed to determine whether an audio message being received is from the communications link (telephone line). If not, control is transferred to the routine of Fig. 6. If so, the message is passed through the modem 24 as audio (block 510), and a test is performed at block 520 to determine whether compression is to be performed by the voice chip. If so, the message is stored in local memory (block 530), recording is stopped, and control is returned to the routine of Fig. 2. If compression is not to be performed by the voice chip, the message is sent to the modem, which compresses it by a standard (ADPCM) algorithm (block 540). The message is then sent back to the voice chip 12 through its UART (block 550), and the voice chip controller then causes the message to be stored in flash memory 14 (block 560). Control is then returned to the routine of Fig. 2.
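The compression choice of Fig. 5 amounts to a two-way routing decision, sketched below. The function name, the codec labels and the compressor callbacks are hypothetical stand-ins; real compression would be done by the voice chip or the modem hardware.

```python
def record_from_line(samples: bytes, use_voice_chip: bool,
                     voice_chip_compress, modem_compress, flash: list) -> str:
    """Compress line audio by the selected path and store it with a codec label."""
    if use_voice_chip:
        # Blocks 520 and 530: the voice chip compresses and stores the message.
        codec, data = "voice_chip", voice_chip_compress(samples)
    else:
        # Blocks 540 to 560: the modem compresses (ADPCM) and returns data via the UART.
        codec, data = "modem_adpcm", modem_compress(samples)
    flash.append((codec, data))   # the label lets playback pick the matching decompressor
    return codec


# Trivial stand-in compressors, for demonstration only.
flash = []
print(record_from_line(b"\x01\x02\x03\x04", True, lambda s: s[::2], lambda s: s[:1], flash))
print(record_from_line(b"\x01\x02\x03\x04", False, lambda s: s[::2], lambda s: s[:1], flash))
print(flash)
```

Recording which path compressed each message is a design choice of this sketch; some such record is needed because playback must later know which decompressor to use.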

A number of features described to this point are worthy of note. First of all, the PVS can determine whether an incoming signal on the telephone line is digital or analog. It is the analog messages that are subjected to the routine of Fig. 5. Such messages are then digitized and compressed and are stored in the same form as all other messages. Also, the user has the option of having compression performed by the PVS or the modem. This assures compatibility with the ultimate recipient of a message. For example, the recipient may not have a PVS or other device to reverse the compression performed by the PVS. However, he would be likely to be able to decompress a message compressed by the modem.

Figure 6 is a block diagram of the routine performed by the voice chip controller to play stored audio through the built-in speaker. At block 600, the operator selects a message from the pool of messages stored in the device. At block 610, a test is performed to determine whether the stored message to be read was originally compressed by the voice chip. If not, control is transferred to block 620. If so, the message is read and decompressed using the voice chip (block 630), and the decompressed message is applied to the digital-to-analog converter (DAC) in the voice chip (block 640). The message is then played via the built-in speaker 28 through the amplifier 26 (block 650), and control is returned to the routine of Fig. 2.

If the stored message was not originally compressed by the voice chip, a test is performed at block 620 to determine whether the stored message was originally compressed by the modem chip. If not, the user is notified (block 660), and control is returned to the routine of Fig. 2. If so, the message is read by the controller (block 670), and it is then sent to the modem to be decompressed and then returned from the modem to memory 14 through the UART port of voice chip 12 (block 680). Control is then transferred to block 640, and playback is handled in the same manner as a message originally compressed by the voice chip.
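The playback decision of Fig. 6 can be sketched as a dispatch on how each message was compressed. The codec labels, the callbacks and the `(codec, data)` message layout are assumptions made for illustration.

```python
def play_message(message, voice_chip_decompress, modem_decompress, dac, notify_user) -> bool:
    """Decompress a stored (codec, data) message by the matching path and play it."""
    codec, data = message
    if codec == "voice_chip":
        pcm = voice_chip_decompress(data)        # block 630
    elif codec == "modem_adpcm":
        pcm = modem_decompress(data)             # blocks 670 and 680
    else:
        notify_user(f"cannot play message compressed as {codec!r}")   # block 660
        return False
    dac(pcm)                                     # blocks 640 and 650
    return True


# Stand-in callbacks that record what happened, for demonstration only.
played, warnings = [], []
print(play_message(("voice_chip", b"ab"), bytes.upper, bytes.lower, played.append, warnings.append))
print(play_message(("unknown", b"zz"), bytes.upper, bytes.lower, played.append, warnings.append))
print(played, warnings)
```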

Although a preferred embodiment of the invention has been disclosed for illustrative purposes, those skilled in the art will appreciate that many additions, modifications, and substitutions are possible, without departing from the scope and spirit of the invention as defined in the accompanying claims.

Claims

WHAT IS CLAIMED:

1. An apparatus for communication of audio signals in analog and digital form and for storage of the same, comprising:

digital storage means;

a connection to a communication channel;

a modem having an input coupled to said connection and a digital output;

an analog-to-digital converter having an output coupled to said storage means; and

a controller coupled to said storage means and said modem, and comprising:

means for detecting whether a signal on said connection is an analog or digital audio signal;

routing means controlled by said means for detecting and coupled to said modem, said storage means and said analog-to-digital converter, upon said detecting means detecting a digital signal said routing means causing the output of said modem to be coupled to said storage means, upon said detecting means detecting an analog signal said routing means causing said modem to bypass the signal on said connection and coupling the same to said analog-to-digital converter for subsequent storage in said storage means.

2. The apparatus of Claim 1 further comprising the coupling to said storage means being effected through a device which compresses the signal prior to storage.

3. The apparatus of Claim 1 wherein said modem is of the type including means for compressing a digital signal, the coupling to said storage means being effected through a compression router which connects said means for compressing and directs the resulting signal to said storage means.

4. The apparatus of Claim 1 wherein said modem is of the type including means for compressing a digital signal, said apparatus further comprising a device which compresses a digital signal, the coupling to said storage means being effected through a compression routing means for selectively connecting one of said device and said means for compressing and for directing the resulting signal to said storage means.

5. The apparatus of Claim 1:

said modem having an analog input and output coupled to said connection and a digital input and output; and

said controller further comprising:

means for assembling digital messages stored in said storage means into a packetized data stream containing data and control bits; and

means for coupling said packetized data stream to the digital input of said modem for transmission over said communication channel.

6. An apparatus as in claim 5 wherein said controller causes said modem to transmit said packetized data stream at a rate that is substantially higher than the transmission rate of digitized voice.

7. An apparatus for communication of audio signals in analog and digital form and for storage of the same, comprising:

digital storage means;

a connection to a communication channel;

a modem having an analog input and output coupled to said connection and a digital input and output; and

a controller coupled to said storage means and said modem, and comprising:

means for assembling digital messages stored in said storage means into a packetized data stream containing data and control bits; and

means for coupling said packetized data stream to the digital input of said modem for transmission over said communication channel.

8. An apparatus as in claim 7 wherein said controller causes said modem to transmit said packetized data stream at a rate that is substantially higher than the transmission rate of digitized voice.

9. A method for communication of audio signals in analog and digital form over a communication channel and for storage of the same, comprising the steps of:

detecting whether a signal on said channel is an analog or digital audio signal;

upon detecting a digital signal on said channel, storing in a digital storage means the output of a modem, said modem being of the type having an input coupled to said channel and a digital output;

upon detecting an analog signal on said channel, converting the same from analog to digital form and storing the converted signal in a digital storage means.

10. The method of Claim 9 wherein prior to either of said storing steps said signal is compressed.

11. The method of Claim 10 wherein said modem is of the type including means for compressing a digital signal, said signal being compressed by being connected to said means for compressing and the resulting signal being stored in said storage means.

12. The method of Claim 9 performed with a modem of the type having an analog input and output coupled to said channel and a digital input and output and further comprising the steps of:

assembling digital messages stored in said storage means into a packetized data stream containing data and control bits; and

coupling said packetized data stream to the digital input of said modem for transmission over said communication channel at a rate that is substantially higher than the transmission rate of digitized voice.

13. A method for communication of audio signals in analog and digital form over a communication channel and for storage of the same, said method being performed with a modem of the type having an analog input and output coupled to said channel and a digital input and output and comprising the steps of:

assembling digital messages stored in a storage means into a packetized data stream containing data and control bits; and

coupling said packetized data stream to the digital input of said modem for transmission over said communication channel at a rate that is substantially higher than the transmission rate of digitized voice.

14. A portable device which permits the user to record, edit, play and review voice messages and other audio material which may be received from, and subsequently transmitted to, a remote apparatus over a communication link, comprising:

a receptacle for a power source;

integrated circuitry for localized recording, editing, storage and playback of audio signals powered from said receptacle;

non-volatile storage means, access to which is controlled by said integrated circuitry;

a built-in speaker and microphone coupled with said integrated circuitry for audible playback and local input, respectively, of audio;

a modem chip set coupled with said integrated circuitry;

a modular telephone jack coupled to said modem chip set;

the integrated circuitry operating the device so as to transmit and receive audio signals at a rate substantially faster than originally recorded.

15. A device in accordance with claim 14 wherein said integrated circuitry includes a module that is operative to permit distinguishing between analog and digital signals received on the communication link, the analog signals being presented to said integrated circuitry without being processed by said modem chip.