WO1997050051A2 - Ocr processing of encoded text and communication thereof - Google Patents

Info

Publication number
WO1997050051A2
Authority
WO
WIPO (PCT)
Prior art keywords
text
symbols
characters
readily distinguished
ocr
Application number
PCT/IL1997/000190
Other languages
French (fr)
Other versions
WO1997050051A3 (en)
Inventor
Itzhak Pomerantz
Ram Cohen
Original Assignee
Aliroo Ltd.
Application filed by Aliroo Ltd.
Priority to AU30458/97A
Publication of WO1997050051A2
Publication of WO1997050051A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries

Definitions

  • The single-byte error correction system described above will fail if a block contains two or more errors that compensate for each other so that the error code is still matched. Furthermore, even if multiple errors do not compensate for each other and the error is consequently detected, the correction algorithm of successively replacing characters will not supply the right correction.
  • This problem can, however, be overcome by using a multiple-byte error correction character.
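The compensating-error weakness noted in these last two points can be shown concretely. The text does not say which multiple-byte code it has in mind, so the second check byte below (a position-weighted sum modulo 256, in a hypothetical `two_byte_check` helper) is an assumption made for illustration. The sketch treats characters as bytes and shows pairs of errors that leave the single XOR byte unchanged but alter the two-byte check. It only improves detection; it does not by itself locate or correct the error.

```python
def xor_check(block: bytes) -> int:
    """Single check byte: XOR of all bytes in the block, bit by bit."""
    check = 0
    for b in block:
        check ^= b
    return check


def two_byte_check(block: bytes) -> tuple[int, int]:
    """Hypothetical two-byte check: the XOR byte plus a position-weighted
    sum modulo 256, which (unlike XOR) depends on the order of the bytes."""
    weighted = sum((i + 1) * b for i, b in enumerate(block)) % 256
    return xor_check(block), weighted


# Two errors that cancel under XOR: "AA" misread as "CC" (both XOR to 0),
# and a swap "AC" -> "CA" (XOR is order-insensitive).
for original, misread in [(b"AA", b"CC"), (b"AC", b"CA")]:
    print(original, misread,
          xor_check(original) == xor_check(misread),            # same XOR byte
          two_byte_check(original) == two_byte_check(misread))  # checks differ
```

Both lines print `True False`: the single XOR byte misses the errors, while the second byte exposes them. A weighted sum modulo 256 can still be fooled by other error patterns, so this is a stronger check, not a complete one.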

Abstract

Apparatus for encoding (12) written text (10) including an encoder for encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus (26) and apparatus for providing a hard copy output (16) of the encoded form. An encoding method and apparatus for decoding are also disclosed.

Description

TEXT COMMUNICATION VIA OPTICAL CHARACTER RECOGNITION
The present invention relates to apparatus and techniques for scrambling of written text.
Apparatus and techniques have been developed for scrambling written text to prevent it from being read by unauthorized persons.
Examples of such apparatus and techniques are described in Israel patent application 115227 of the present applicant/assignee.
The present inventor has appreciated a difficulty in scanning scrambled written text using conventional optical character recognition (OCR) techniques. These techniques often employ statistical models of letter combinations in written language as a secondary tool for resolving ambiguities in the scanned text, but scrambled text does not follow most such statistical models.
The present invention seeks to provide apparatus and techniques for scrambling written text such as to avoid ambiguities which could make optical character recognition difficult.
In this specification and claims the terms "scrambling" and "encrypting", and the terms "descrambling" and "decrypting" are respectively synonymous.
There is thus provided in accordance with a preferred embodiment of the present invention apparatus for encoding written text including an encoder for encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus and apparatus for providing a hard copy output of the encoded form.
Preferably the apparatus also includes a scanner for receiving a hard copy written text and supplying an electronic output thereof to the encoder.
Additionally in accordance with a preferred embodiment of the present invention the apparatus also includes a symbol conversion table employed by the encoder for converting symbols which are not readily distinguished from each other by OCR apparatus to symbols which are readily distinguished from each other by OCR apparatus.
Preferably, the symbols which are readily distinguished from each other by OCR apparatus consist of alias symbol sequences.
In accordance with a preferred embodiment of the present invention, the encoder also carries out scrambling of the text. For the purposes of the present specification and claims, scrambling is deemed to include encryption.
Preferably, the encoded form includes a set of symbols which includes a number of different symbols which is less than the number of different symbols contained in the non-encoded written text.
In accordance with a preferred embodiment of the present invention, the encoded form includes a set of different symbols which is a subset of a set of different symbols contained in the non-encoded written text.
Further in accordance with a preferred embodiment of the present invention the apparatus includes an error control code generator for generating error control codes for said encoded text, and an inserter for inserting said error control codes into said encoded text.
Still further in accordance with a preferred embodiment of the present invention the apparatus includes a divider for dividing said encoded text into blocks and for inserting end-of-block markers between said blocks.
There is also provided in accordance with a preferred embodiment of the present invention apparatus for decoding encoded text utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus, the apparatus including: a text receiver for receiving a text in an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and a decoder for decoding the text in the encoded form and converting it into a decoded form wherein the subsets are not excluded and wherein the symbols which are not readily distinguished from each other by OCR apparatus appear.
Preferably, the receiver comprises a scanner which provides an electronic output to the decoder.
The apparatus for decoding preferably also includes apparatus for providing a hard copy output of the decoded form.
The apparatus for decoding also preferably includes a symbol conversion table employed by the decoder for converting certain symbols which are readily distinguished from each other by OCR apparatus to symbols which are not readily distinguished from each other by OCR apparatus.
Preferably, the symbols which are readily distinguished from each other by OCR apparatus comprise alias symbol sequences.
In accordance with a preferred embodiment of the present invention, the decoder also carries out unscrambling of the text.
Preferably, the encoded form includes a set of symbols which includes a number of different symbols which is less than the number of different symbols contained in the non-encoded written text.
The encoded form preferably includes a set of different symbols which is a subset of a set of different symbols contained in the non-encoded written text.
There is also provided in accordance with a preferred embodiment of the present invention a method for encoding written text including: encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and providing a hard copy output of the encoded form. Further in accordance with a preferred embodiment of the present invention the method for encoding written text includes a step of embedding error correction characters into said text.
Still further in accordance with a preferred embodiment of the present invention the method for encoding written text includes a step of embedding end-of-block identifier characters at regular intervals in said text.
Preferably, the method also includes receiving a hard copy written text and supplying an electronic output thereof for encoding.
Further in accordance with a preferred embodiment of the present invention the text receiver further includes an end-of-block counter adapted to identify end-of-block markers embedded in said text and to compare the number of characters between successive markers thus identified with a predetermined expected number of characters.
Still further in accordance with a preferred embodiment of the present invention the text receiver further includes an error correction character identifier which is operable to identify error correction characters embedded in said text and associated with given characters of said text.
Additionally in accordance with a preferred embodiment of the present invention the text receiver further includes an error identifier operable to compare any error correction characters identified by the error correction character identifier with associated characters of the text to identify whether a character recognition error has occurred.
Moreover, in accordance with a preferred embodiment of the present invention, the text receiver further comprises an error correction character identifier, and means for replacing textual characters on the basis of statistically likely errors until an error correction character identified by said error correction character identifier indicates that no error is present. Further in accordance with a preferred embodiment of the present invention, the text receiver further includes means for replacing textual characters on the basis of statistically likely errors until said error identification means indicates that no error is present.
The method also preferably includes causing the encoder to employ a symbol conversion table for converting symbols which are not readily distinguished from each other by OCR apparatus to symbols which are readily distinguished from each other by OCR apparatus.
In accordance with a preferred embodiment of the present invention, the symbols which are readily distinguished from each other by OCR apparatus comprise alias symbol sequences.
The method also preferably includes scrambling of the text.
There is also provided in accordance with a preferred embodiment of the present invention a method for decoding encoded text utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus, the method including: receiving a text in an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and decoding the text in the encoded form and converting it into a decoded form wherein the subsets are not excluded and wherein the symbols which are not readily distinguished from each other by OCR apparatus appear.
The decoding method preferably also includes scanning the written text to provide an electronic output to the decoder and providing a hard copy output of the decoded form.
The decoding method also preferably includes employing a conversion table for converting certain symbols which are readily distinguished from each other by OCR apparatus to symbols which are not readily distinguished from each other by OCR apparatus.
The decoding method additionally preferably includes unscrambling of the text.
The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:
Figs. 1A, 1B, 1C, 1D, 1E and 1F are illustrations of six examples of subsets of symbols which are not readily distinguished from each other by OCR apparatus;
Fig. 2 is a functional block diagram of apparatus for encoding and decoding written text in accordance with a preferred embodiment of the present invention;
Fig. 3 is an illustration of the mapping of the individual symbols of the subset of Fig. 1A into sequences of individual symbols which are readily distinguished from each other by OCR apparatus;
Figs. 4A and 4B illustrate a portion of written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form respectively;
Figs. 5A and 5B illustrate a portion of encrypted written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form respectively;
Fig. 6 is a functional block diagram of a method of embedding error correction abilities into the text in accordance with an embodiment of the invention; and
Fig. 7 is a functional block diagram of a method of reading text produced in accordance with the method of Fig. 6.
Reference is now made to Figs. 1A, 1B, 1C, 1D, 1E and 1F, which are illustrations of six examples of subsets of symbols which are not readily distinguished from each other by optical character recognition (OCR) apparatus. The subset of symbols of Fig. 1A includes left and right facing bracket symbols, the number 1, an exclamation mark, the lowercase "l" and a capital "I", which can readily be confused with each other by optical character recognition apparatus and techniques.
Similarly, the symbols in each of the subsets of Figs. 1B-1F may be confused with each other. It is appreciated that Figs. 1A-1F are merely exemplary of such subsets and that additional subsets having similar characteristics may exist.
Reference is now made to Fig. 2, which is a functional block diagram of apparatus for encoding and decoding written text in accordance with a preferred embodiment of the present invention. A source document 10 typically comprises alphanumeric text which includes a multiplicity of symbols including at least one subset of symbols which may readily be confused with each other by optical character recognition apparatus and techniques. Examples of such subsets are illustrated in Figs. 1A-1F.
An encoding system 12 encodes the source document 10. The encoding system may be any suitable encoding system producing a scrambled or encrypted text and is preferably an encoding system as described in applicant/assignee's pending Israel Patent Application 115277, the disclosure of which is hereby incorporated by reference. That application describes a method of encrypting all or only selected portions of a document; the reader is referred in particular to pages 5 to 9 thereof. Alternatively, the encoding system need not provide scrambling or encryption and may provide only the symbol conversion described hereinbelow. An encoding system of the type described is available from the applicants as Aliroo Private Suite Version 2.13.
In accordance with a preferred embodiment of the present invention, the encoding system 12 is operative to convert alphanumeric characters or other symbols from the source document into a non-ambiguous text, which is normally, but need not necessarily be, scrambled, by employing a symbol conversion table 14, such as that illustrated, for example, in Fig. 3. Symbol conversion table 14 is employed for converting alphanumeric characters or other symbols which may be ambiguous, i.e. which may readily be confused with each other by optical character recognition apparatus and techniques, into unambiguous sequences of symbols.
In accordance with a preferred embodiment of the invention, the unambiguous symbols are sequences of an infrequently occurring symbol such as an "&", hereinafter termed an "alias character", and another symbol, hereinafter collectively termed an "alias character sequence". In such a case, if the alias character actually occurs in the source document, it too is converted to a sequence of the alias character and another symbol.
For example, in the conversion table of Fig. 3, the symbol "&" in the source document could be converted to "&R".
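As a minimal sketch of such a symbol conversion table, the Python below maps the Fig. 1A symbols to alias character sequences and back. Only the "&" to "&R" entry comes from the text; the other alias sequences ("&A" to "&F") are hypothetical choices for illustration and are not the contents of Fig. 3. Because every alias sequence is exactly two symbols long and starts with the alias character, decoding is a single left-to-right scan, and ordinary letters such as "A" in the body text cause no confusion.

```python
ALIAS_CHAR = "&"

# Hypothetical alias table in the spirit of Fig. 3. Only "&" -> "&R" is
# stated in the text; the other entries are illustrative assumptions.
ALIAS_TABLE = {
    "&": "&R",  # the alias character itself must also be escaped
    "(": "&A",
    ")": "&B",
    "1": "&C",  # digit one
    "!": "&D",
    "l": "&E",  # lowercase L
    "I": "&F",  # capital i
}
REVERSE_TABLE = {seq[1]: ch for ch, seq in ALIAS_TABLE.items()}


def encode(text: str) -> str:
    """Replace each ambiguous symbol (and the alias char) by its alias sequence."""
    return "".join(ALIAS_TABLE.get(ch, ch) for ch in text)


def decode(text: str) -> str:
    """Invert encode(): every alias char starts a two-symbol sequence."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == ALIAS_CHAR:
            if i + 1 >= len(text) or text[i + 1] not in REVERSE_TABLE:
                raise ValueError(f"invalid alias sequence at position {i}")
            out.append(REVERSE_TABLE[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `encode("Call 1-800 (now)!")` yields `"Ca&E&E &C-800 &Anow&B&D"`, which contains none of the Fig. 1A symbols, and `decode` restores the original string.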
The encoding system 12 preferably outputs to a printer 16, which produces an encoded printout 18. In the illustrated embodiment, encoded printout 18 is shown to be scrambled; alternatively, this need not be the case. In accordance with a preferred embodiment of the present invention, the encoded printout 18 is formed of a set of symbols which contains fewer different symbols than the non-encoded written text.
The encoded printout 20, or a copy thereof, may then be scanned by an OCR device, with or without having previously been transmitted by fax or any other unprotected communications medium. The OCR device typically includes a conventional scanner 22, which outputs to an OCR interpreter 26. The OCR interpreter 26 employs a symbol conversion table 24, which corresponds to symbol conversion table 14, and produces a destination document 28, which may be a hard copy or a virtual document.
OCR interpreter 26 and conversion table 24 are operative to verify the error correction code and to replace all of the alias character sequences in the encoded printout 18 by the original characters, so that the destination document reproduces the original text with a high degree of accuracy.
Operation of the apparatus of Fig. 2 in a non-encrypting and non-scrambling mode is illustrated in Figs. 4A and 4B, which respectively illustrate a portion of written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form. It is seen that only the ambiguous characters 30, 32, 34 and 36 are converted; the remaining characters are unchanged.
Operation of the apparatus of Fig. 2 in an encrypting or scrambling mode is illustrated in Figs. 5A and 5B, which respectively illustrate a portion of encrypted written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form. It is seen that all or most of the characters are changed, and that none of the ambiguous characters 38, 40 and 42 remains in the encoded form.
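The text does not specify a scrambling algorithm. The following sketch assumes a toy keyed scrambler (XOR with a SHA-256 counter-mode keystream, for illustration only and not a vetted cipher) and a hypothetical 16-symbol OCR-safe alphabet, `SAFE`. Each scrambled byte is written as two symbols from `SAFE`, so nearly every output character differs from the input, no ambiguous glyph can appear, and the printout uses fewer distinct symbols than ordinary text, as described for encoded printout 18. The names `SAFE`, `keystream`, `scramble` and `unscramble` are illustrative.

```python
import hashlib

# Hypothetical 16-symbol alphabet of mutually distinct glyphs; it omits
# look-alikes such as 0/O, 1/l/I, 5/S, 8/B and 2/Z, and also omits "X".
SAFE = "ACEFHKMNPRTW3479"


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 of key || counter, concatenated (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def scramble(text: str, key: bytes) -> str:
    """XOR the UTF-8 bytes with the keystream, then print each byte as two SAFE symbols."""
    data = text.encode("utf-8")
    mixed = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    # High nibble first, then low nibble.
    return "".join(SAFE[b >> 4] + SAFE[b & 0x0F] for b in mixed)


def unscramble(symbols: str, key: bytes) -> str:
    """Invert scramble(); raises ValueError on a symbol outside SAFE."""
    if len(symbols) % 2:
        raise ValueError("odd number of symbols")
    values = [SAFE.index(s) for s in symbols]
    mixed = bytes((hi << 4) | lo for hi, lo in zip(values[::2], values[1::2]))
    data = bytes(a ^ b for a, b in zip(mixed, keystream(key, len(mixed))))
    return data.decode("utf-8")


if __name__ == "__main__":
    printed = scramble("Invoice 1001: pay 50.00", b"shared key")
    print(printed)
    print(unscramble(printed, b"shared key"))
```

The cost of the two-symbol representation is a doubling of length; in exchange, the alphabet can be chosen purely for OCR distinctiveness.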
Fig. 6 is a functional block diagram of a method of embedding error correction capability into the text in accordance with an embodiment of the invention.
In general, as discussed above, optical character readers are known to be unable to read text with 100% accuracy. Even with replacement of the more ambiguous characters as above, total accuracy is not guaranteed. Statistical models of letter combinations, which OCR software can use to resolve doubtful readings of ordinary text, cannot be applied to scrambled text, yet descrambling is possible only if the OCR reading is 100% accurate. Therefore, as shown in Fig. 6, error correction codes may be inserted into the written text. Many forms of error correction code are well known to those skilled in the art and can readily be applied to the present circumstances with minimal experimentation. In the present embodiment, the encrypted text is divided into blocks of equal length, for example 19 characters, and a bitwise XOR is computed over all the bytes of each block. The resulting byte is then appended to the block as the twentieth character.
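The check character described above is a plain XOR parity over the block. A minimal sketch in Python, following the text literally at the byte level; the function names, and leaving a final short block unpadded (the text does not say how it is handled), are assumptions:

```python
from functools import reduce

BLOCK_LEN = 19  # payload characters per block, as in the example above


def xor_check(block: bytes) -> int:
    """Bitwise XOR of all bytes in the block."""
    return reduce(lambda acc, b: acc ^ b, block, 0)


def add_checks(data: bytes, block_len: int = BLOCK_LEN) -> list[bytes]:
    """Split data into blocks and append the XOR byte to each one.

    The final block may be shorter than block_len (assumption: no padding).
    """
    blocks = []
    for start in range(0, len(data), block_len):
        block = data[start:start + block_len]
        blocks.append(block + bytes([xor_check(block)]))
    return blocks


def verify(block_with_check: bytes) -> bool:
    """A block is consistent when the XOR over payload and check byte is zero."""
    return xor_check(block_with_check) == 0
```

Because any byte XORed with itself is zero, a block is consistent exactly when the XOR over all its bytes, check byte included, is zero. Note that the raw check byte may be unprintable or may coincide with an ambiguous glyph; the text does not say how it is rendered on paper.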
In the version of the embodiment shown, an end-of-block marker is added to the block as a twenty-first character. A suitable end-of-block character is any character that is distinctive in appearance (to an OCR) and is unlikely to appear frequently in the body of the text. A capital "X" is an appropriate choice.
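Framing the blocks can be sketched as follows. To keep the check character printable and distinct from the "X" marker, this sketch assumes a variant not stated in the text: the payload is drawn from a hypothetical 16-symbol alphabet `SAFE` that excludes "X", and the check symbol is the XOR of the symbol indices (0 to 15), which is itself a member of `SAFE`. The names `check_symbol`, `frame` and `split_blocks` are illustrative.

```python
SAFE = "ACEFHKMNPRTW3479"  # hypothetical OCR-safe alphabet; contains no "X"
END = "X"                  # end-of-block marker suggested in the text
BLOCK_LEN = 19             # payload symbols per block


def check_symbol(payload: str) -> str:
    """XOR of the payload's symbol indices, written as a SAFE symbol."""
    acc = 0
    for s in payload:
        acc ^= SAFE.index(s)
    return SAFE[acc]


def frame(encoded: str, block_len: int = BLOCK_LEN) -> str:
    """Append a check symbol and an end-of-block marker to every block."""
    if END in encoded:
        raise ValueError("payload must not contain the end-of-block marker")
    parts = []
    for start in range(0, len(encoded), block_len):
        payload = encoded[start:start + block_len]
        parts.append(payload + check_symbol(payload) + END)
    return "".join(parts)


def split_blocks(ocr_text: str) -> list[str]:
    """Divide OCR output at the end-of-block markers (markers removed)."""
    return [b for b in ocr_text.split(END) if b]
```

Since no payload or check symbol can be "X", a correctly read marker always delimits a block, and the length of each recovered block can then be compared with the expected 20 symbols.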
Fig. 7 shows the decryption process for the version shown in Fig. 6.
In this process, the characters output by the OCR are first divided into blocks using the end-of-block markers. The characters in each block are then counted. If the block is found to be too long (a split error: one character has been misread as two), pairs of characters are successively replaced by single characters until the error code is matched. If, on the other hand, the block is found to be too short (a merge error: two characters have been read as one), single characters are successively replaced by pairs of characters until the error code is matched. If the length is correct but the characters still do not match the error code (a recognition error: one character has been misread as another), single characters are successively replaced by other single characters until the error code is matched.
The successive replacement of characters is not carried out at random. Rather, a set of statistics on the most likely OCR errors is used to replace characters by their likely alternatives. This applies both to recognition errors and to split and merge errors.
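The correction search of Fig. 7 can be sketched as a generator of candidate readings, tried in order of likelihood until one satisfies the check. This sketch assumes the index-XOR check symbol over a hypothetical 16-symbol alphabet `SAFE`, a 20-symbol block (19 payload symbols plus the check symbol), and invented error statistics (`SUBSTITUTES`, `SPLITS`, `MERGES`); a real system would derive these from measurements of its own OCR engine. The check position is searched like any other position.

```python
SAFE = "ACEFHKMNPRTW3479"  # hypothetical OCR-safe alphabet (no "X")
BLOCK_WITH_CHECK = 20      # 19 payload symbols + 1 check symbol

# Hypothetical error statistics, most likely alternative first.
SUBSTITUTES = {"E": ["F"], "F": ["E"], "M": ["N"], "N": ["M"],
               "P": ["R"], "R": ["P"], "3": ["9"], "9": ["3"]}
SPLITS = {"VV": "W"}    # one true symbol read as two
MERGES = {"M": ["RN"]}  # two true symbols read as one


def check_symbol(payload: str) -> str:
    acc = 0
    for s in payload:
        acc ^= SAFE.index(s)
    return SAFE[acc]


def consistent(block: str) -> bool:
    """True if every symbol is valid and payload + check XOR to zero."""
    if any(s not in SAFE for s in block):
        return False
    acc = 0
    for s in block:
        acc ^= SAFE.index(s)
    return acc == 0


def candidates(block: str, expected: int):
    """Yield plausible readings of a block, guided by the error statistics."""
    if len(block) == expected:            # possible recognition error
        yield block
        for i, ch in enumerate(block):
            for alt in SUBSTITUTES.get(ch, []):
                yield block[:i] + alt + block[i + 1:]
    elif len(block) == expected + 1:      # split error: collapse a pair
        for i in range(len(block) - 1):
            single = SPLITS.get(block[i:i + 2])
            if single is not None:
                yield block[:i] + single + block[i + 2:]
    elif len(block) == expected - 1:      # merge error: expand one symbol
        for i, ch in enumerate(block):
            for pair in MERGES.get(ch, []):
                yield block[:i] + pair + block[i + 1:]


def correct_block(block: str, expected: int = BLOCK_WITH_CHECK):
    """Return the first candidate that matches the check, or None."""
    for cand in candidates(block, expected):
        if consistent(cand):
            return cand
    return None
```

A parity check only constrains the XOR of the block, so a wrong candidate that happens to restore the parity (with these indices, an "M"→"N" swap elsewhere in a block whose true error was "E"→"F") would be accepted just as readily; ordering candidates by error likelihood is what makes the first match the probable one.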
This is followed by a stage in which the error control characters and end-of-block markers are removed, after which decryption can proceed exactly as described above.
The single-byte error correction system described above will fail if a block contains two or more errors that compensate each other so that the block still matches the error code. Furthermore, even if multiple errors do not compensate each other and the error is consequently detected, the correction algorithm of successively replacing single characters will not supply the right correction. The problem can, however, be overcome by using a multiple-byte error correction character.
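The compensating-error failure is easy to reproduce. This sketch reuses the assumed index-XOR check over a hypothetical 16-symbol alphabet and adds a second, position-weighted sum check (mod 16) purely for illustration. It is not the patent's multiple-byte error correction character, whose construction the text does not give, and it does not catch every double error; an established multi-symbol code such as Reed-Solomon would be the more usual choice.

```python
SAFE = "ACEFHKMNPRTW3479"  # hypothetical OCR-safe alphabet


def xor_check(payload: str) -> str:
    """Single check symbol: XOR of symbol indices."""
    acc = 0
    for s in payload:
        acc ^= SAFE.index(s)
    return SAFE[acc]


def weighted_check(payload: str) -> str:
    """Hypothetical second check symbol: position-weighted index sum mod 16."""
    total = sum((i + 1) * SAFE.index(s) for i, s in enumerate(payload))
    return SAFE[total % len(SAFE)]


payload = "ACHKTWE7ACHKTWM7ACH"
# Two misreadings, E->F at position 6 and M->N at position 14. Each flips the
# lowest bit of the symbol index, so the two changes cancel under XOR.
garbled = payload[:6] + "F" + payload[7:14] + "N" + payload[15:]

print(xor_check(payload) == xor_check(garbled))            # True: XOR check fooled
print(weighted_check(payload) == weighted_check(garbled))  # False: error detected
```

Here the weighted sum shifts by 7 + 15 = 22, which is 6 modulo 16, so the second check exposes the pair of errors that the single XOR character misses.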
It is pointed out that the error correction character is itself subject to misreading, and hence to the correction algorithm, just like any other character.
It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the claims, which are:

Claims

1. Apparatus for encoding written text comprising: an encoder for encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and apparatus for providing a hard copy output of the encoded form.
2. Apparatus according to claim 1 and also comprising a scanner for receiving a hard copy written text and supplying an electronic output thereof to said encoder.
3. Apparatus according to claim 1 and also comprising a symbol conversion table employed by said encoder for converting symbols which are not readily distinguished from each other by OCR apparatus to symbols which are readily distinguished from each other by OCR apparatus.
4. Apparatus according to claim 1 and wherein said encoder also carries out scrambling of said text.
5. Apparatus according to claim 1 and wherein said encoded form includes a set of symbols which includes a number of different symbols which is less than the number of different symbols contained in the non-encoded written text.
6. Apparatus according to claim 3 and wherein said encoder also carries out scrambling of said text.
7. Apparatus for decoding encoded text utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus, the apparatus comprising: a text receiver for receiving a text in an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and a decoder for decoding the text in said encoded form and converting it into a decoded form wherein said subsets are not excluded and wherein said symbols which are not readily distinguished from each other by OCR apparatus are represented by symbols which are readily distinguished from each other.
8. Apparatus according to claim 7 and wherein said symbols which are readily distinguished from each other by OCR apparatus comprises alias symbol sequences.
9. Apparatus according to claim 7 and wherein said decoder also carries out unscrambling of said text.
10. A method for encoding written text comprising: encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and providing a hard copy output of the encoded form.
11. A method according to claim 10 and also comprising causing said encoder to employ a symbol conversion table for converting symbols which are not readily distinguished from each other by OCR apparatus to symbols which are readily distinguished from each other by OCR apparatus.
12. A method according to claim 10 and wherein encoding also includes scrambling of said text.
13. A method for decoding encoded text utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus, the method comprising: receiving a text in an encoded form utilizing a set of symbols which excludes subsets of symbols which are not readily distinguished from each other by OCR apparatus; and decoding the text in said encoded form and converting it into a decoded form wherein said subsets are not excluded and wherein said symbols which are not readily distinguished from each other by OCR apparatus are represented by symbols which are readily distinguished from each other.
14. A method according to claim 13 and also including scanning the written text to provide an electronic output to said decoder.
15. A method according to claim 13 and also comprising providing a hard copy output of the decoded form.
16. A method according to claim 13 and also comprising employing a conversion table for converting certain symbols which are readily distinguished from each other by OCR apparatus to symbols which are not readily distinguished from each other by OCR apparatus.
17. A method according to claim 13 and wherein said symbols which are readily distinguished from each other by OCR apparatus comprises alias symbol sequences.
18. A method according to claim 14 and wherein said symbols which are readily distinguished from each other by OCR apparatus comprises alias symbol sequences.
19. A method according to claim 12 and wherein said decoding also includes unscrambling of said text.
20. A method according to claim 14 and wherein said decoding also carries out unscrambling of said text.
21. Apparatus according to claim 1 further comprising an error control code generator for generating error control codes for said encoded text, and an inserter for inserting said error control codes into said encoded text.
22. Apparatus according to claim 1 further comprising a divider for dividing said encoded text into blocks and for inserting end-of-block markers between said blocks.
23. Apparatus according to claim 7, wherein said text receiver further comprises an end-of-block counter adapted to identify end-of-block markers embedded in said text and to compare a number of characters between each said marker thus determined with a predetermined expected number of characters.
24. Apparatus according to claim 7 wherein said text receiver further comprises an error correction character identifier which is operable to identify error correction characters embedded in said text and associated with given characters of said text.
25. Apparatus according to claim 24 wherein said text receiver further comprises an error identifier operable to compare any error correction characters identified by the error correction character identifier with associated characters of the text to identify whether a character recognition error has occurred.
26. Apparatus according to claim 23 wherein said text receiver further comprises an error correction character identifier, and means for replacing textual characters on the basis of statistically likely errors until an error correction character identified by said error correction character identifier indicates that no error is present.
27. Apparatus according to claim 25 wherein said text receiver further comprises a means for replacing textual characters on the basis of statistically likely errors until said error identification means indicates that no error is present.
28. A method according to claim 10 further comprising a step of embedding error correction characters into said text.
29. A method according to claim 10 further comprising a step of embedding end-of-block identifier characters at regular intervals in said text.
30. A method according to claim 12 further comprising the steps of identifying end-of-block and error control characters in said encoded and scrambled text, dividing the text into blocks in accordance with the positions of said end-of-block characters, comparing an actual number of characters in each block with an expected number of characters in each block, determining whether characters in each block conform to an error control character associated with that block, and in the case of non-conformity with either said expected number of characters or said error control character, replacing characters on the basis of statistically likely errors until conformity is achieved.

Priority Applications (1)

AU30458/97A (AU3045897A): priority date 1996-06-12; filing date 1997-06-12; title: Text communication via optical character recognition

Applications Claiming Priority (2)

IL118642: priority date 1996-06-12
IL11864296A (IL118642A): priority date 1996-06-12; filing date 1996-06-12; title: System for encoding written text

Publications (2)

WO1997050051A2: published 1997-12-31
WO1997050051A3: published 1998-02-19

Family

ID=11068962

Family Applications (1)

PCT/IL1997/000190 (WO1997050051A2): priority date 1996-06-12; filing date 1997-06-12; title: Ocr processing of encoded text and communication thereof

Country Status (3)

Country Link
AU (1) AU3045897A (en)
IL (1) IL118642A (en)
WO (1) WO1997050051A2 (en)

Citations (5)

* Cited by examiner, † Cited by third party
US4350844A *: priority date 1975-11-20; published 1982-09-21; Anstalt Europaische Handelsgesellschaft; "Encipering- and deciphering apparatus in the form of a typewriter"
US4837737A *: priority date 1985-08-20; published 1989-06-06; Toshiaki Watanabe; "System for detecting origin of proprietary documents generated by an apparatus for processing information such as words, figures and pictures"
US5031215A *: priority date 1988-09-19; published 1991-07-09; Jose Pastor; "Unambiguous alphabet for data compression"
US5243655A *: priority date 1990-01-05; published 1993-09-07; Symbol Technologies Inc.; "System for encoding and decoding data in machine readable graphic form"
US5479507A *: priority date 1994-01-19; published 1995-12-26; Thomas De La Rue Limited; "Copy indicating security device"

Also Published As

AU3045897A: published 1998-01-14
IL118642A: published 1999-12-31
WO1997050051A3: published 1998-02-19
IL118642A0: published 1998-02-22

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB

AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 98502625

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase