WO1997050051A2 - Ocr processing of encoded text and communication thereof - Google Patents
- Publication number
- WO1997050051A2 (international application PCT/IL1997/000190)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- symbols
- characters
- readily distinguished
- ocr
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/242—Division of the character sequences into groups prior to recognition; Selection of dictionaries
Definitions
- The present invention relates to apparatus and techniques for scrambling written text.
- OCR: optical character recognition.
- The present invention seeks to provide apparatus and techniques for scrambling written text so as to avoid ambiguities that could make optical character recognition difficult.
- Apparatus for encoding written text, including an encoder for encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols that are not readily distinguished from each other by OCR apparatus, and apparatus for providing a hard copy output of the encoded form.
- The apparatus also includes a scanner for receiving a hard copy written text and supplying an electronic output thereof to the encoder.
- The apparatus also includes a symbol conversion table employed by the encoder for converting symbols which are not readily distinguished from each other by OCR apparatus into symbols which are readily distinguished from each other by OCR apparatus.
- The symbols which are readily distinguished from each other by OCR apparatus consist of alias symbol sequences.
- The encoder also carries out scrambling of the text.
- Scrambling is deemed to include encryption.
- The encoded form uses a set of symbols containing fewer distinct symbols than the non-encoded written text.
- The set of distinct symbols in the encoded form is a subset of the set of distinct symbols contained in the non-encoded written text.
- The apparatus includes an error control code generator for generating error control codes for said encoded text, and an inserter for inserting said error control codes into said encoded text.
- The apparatus includes a divider for dividing said encoded text into blocks and for inserting end-of-block markers between said blocks.
- Apparatus for decoding encoded text utilizing a set of symbols which excludes subsets of symbols that are not readily distinguished from each other by OCR apparatus, the apparatus including: a text receiver for receiving a text in an encoded form utilizing such a set of symbols; and a decoder for decoding the text in the encoded form and converting it into a decoded form in which the subsets are not excluded and the symbols that are not readily distinguished from each other by OCR apparatus appear.
- The receiver comprises a scanner which provides an electronic output to the decoder.
- The apparatus for decoding preferably also includes apparatus for providing a hard copy output of the decoded form.
- The apparatus for decoding also preferably includes a symbol conversion table employed by the decoder for converting certain symbols which are readily distinguished from each other by OCR apparatus back into symbols which are not readily distinguished from each other by OCR apparatus.
- The symbols which are readily distinguished from each other by OCR apparatus comprise alias symbol sequences.
- The decoder also carries out unscrambling of the text.
- The encoded form uses a set of symbols containing fewer distinct symbols than the non-encoded written text.
- The set of distinct symbols in the encoded form is preferably a subset of the set of distinct symbols contained in the non-encoded written text.
- A method for encoding written text, including: encoding the text into an encoded form utilizing a set of symbols which excludes subsets of symbols that are not readily distinguished from each other by OCR apparatus; and providing a hard copy output of the encoded form.
- The method for encoding written text includes a step of embedding error correction characters into said text.
- The method for encoding written text includes a step of embedding end-of-block identifier characters at regular intervals in said text.
- The method also includes receiving a hard copy written text and supplying an electronic output thereof for encoding.
- The text receiver further includes an end-of-block counter adapted to identify end-of-block markers embedded in said text and to compare the number of characters between successive markers with a predetermined expected number of characters.
- The text receiver further includes an error correction character identifier which is operable to identify error correction characters embedded in said text and associated with given characters of said text.
- The text receiver further includes an error identifier operable to compare any error correction characters identified by the error correction character identifier with the associated characters of the text, to determine whether a character recognition error has occurred.
- The text receiver further comprises an error correction character identifier, and means for replacing textual characters on the basis of statistically likely errors until an error correction character identified by said error correction character identifier indicates that no error is present. Further, in accordance with a preferred embodiment of the present invention, the text receiver includes means for replacing textual characters on the basis of statistically likely errors until said error identifier indicates that no error is present.
- The method also preferably includes causing the encoder to employ a symbol conversion table for converting symbols which are not readily distinguished from each other by OCR apparatus into symbols which are readily distinguished from each other by OCR apparatus.
- The symbols which are readily distinguished from each other by OCR apparatus comprise alias symbol sequences.
- The method also preferably includes scrambling of the text.
- A method for decoding encoded text utilizing a set of symbols which excludes subsets of symbols that are not readily distinguished from each other by OCR apparatus, including: receiving a text in an encoded form utilizing such a set of symbols; and decoding the text in the encoded form and converting it into a decoded form in which the subsets are not excluded and the symbols that are not readily distinguished from each other by OCR apparatus appear.
- The decoding method preferably also includes scanning the written text to provide an electronic output to the decoder, and providing a hard copy output of the decoded form.
- The decoding method also preferably includes employing a conversion table for converting certain symbols which are readily distinguished from each other by OCR apparatus into symbols which are not readily distinguished from each other by OCR apparatus.
- The decoding method additionally preferably includes unscrambling of the text.
- Figs. 1A, 1B, 1C, 1D, 1E and 1F are illustrations of six examples of subsets of symbols which are not readily distinguished from each other by OCR apparatus;
- Fig. 2 is a functional block diagram of apparatus for encoding and decoding written text in accordance with a preferred embodiment of the present invention;
- Fig. 3 is an illustration of the mapping of the individual symbols of the subset of Fig. 1A into sequences of individual symbols which are readily distinguished from each other by OCR apparatus;
- Figs. 4A and 4B respectively illustrate a portion of written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form;
- Figs. 5A and 5B respectively illustrate a portion of encrypted written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form;
- Fig. 6 is a functional block diagram of a method of embedding error correction abilities into the text in accordance with an embodiment of the invention;
- Fig. 7 is a functional block diagram of a method of reading text produced in accordance with the method of Fig. 6.
- Figs. 1A, 1B, 1C, 1D, 1E and 1F are illustrations of six examples of subsets of symbols which are not readily distinguished from each other by optical character recognition (OCR) apparatus.
- The subset of symbols of Fig. 1A includes left- and right-facing bracket symbols, the number "1", an exclamation mark, the lower-case letter "l" and a capital "I", which can readily be confused with each other by optical character recognition apparatus and techniques.
- FIG. 2 is a functional block diagram of apparatus for encoding and decoding written text in accordance with a preferred embodiment of the present invention.
- A source document 10 typically comprises alphanumeric text made up of a multiplicity of symbols, including at least one subset of symbols which may readily be confused with each other by optical character recognition apparatus and techniques. Examples of such subsets are illustrated in Figs. 1A-1F.
- An encoding system 12 encodes the source document 10.
- The encoding system may be any suitable encoding system producing a scrambled or encrypted text, and is preferably an encoding system as described in applicant/assignee's pending Israel Patent Application 115277, the disclosure of which is hereby incorporated by reference. That document describes a method of encrypting all or only selected portions of a document; the reader is referred in particular to pages 5 to 9 thereof. Alternatively, the encoding system may provide no scrambling or encryption, but only the symbol conversion described hereinbelow. An encoding system of the type described is available from the applicants as Aliroo Private Suite Version 2.13.
- The encoding system 12 employs a symbol conversion table 14, such as that illustrated in Fig. 3, to convert alphanumeric characters or other symbols from the source document into a non-ambiguous text, which is normally, but not necessarily, scrambled.
- Symbol conversion table 14 is employed for converting alphanumeric characters or other symbols which may be ambiguous, i.e. which may readily be confused with each other by optical character recognition apparatus and techniques, into unambiguous sequences of symbols.
- The unambiguous symbols are sequences consisting of an infrequently occurring symbol such as "&", hereinafter termed an "alias character", and another symbol; each such sequence is hereinafter termed an "alias character sequence".
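A minimal Python sketch of a symbol conversion table of this kind follows. The actual contents of Fig. 3 are not reproduced in this text, so the table is hypothetical: it assumes the Fig. 1A subset is "(", ")", "1", "!", "l" and "I", maps each to the alias character "&" followed by a distinct capital letter, and (also an assumption, since the text does not say how a literal "&" is handled) escapes "&" as "&&" so that every alias character in the encoded text unambiguously starts a two-symbol sequence. The names `TABLE`, `encode` and `decode` are illustrative only.

```python
# Hypothetical conversion table in the spirit of Fig. 3 (its real contents are
# not given here): each assumed Fig. 1A symbol maps to "&" plus a distinct letter.
ALIAS = "&"
TABLE = {
    "(": "&A", ")": "&B", "1": "&C", "!": "&D", "l": "&E", "I": "&F",
    ALIAS: "&&",  # escape a literal alias character so decoding stays unambiguous
}
# Second symbol of each alias sequence -> original symbol.
REVERSE = {seq[1]: sym for sym, seq in TABLE.items()}


def encode(text: str) -> str:
    """Replace ambiguous symbols by alias sequences; leave all others unchanged."""
    return "".join(TABLE.get(ch, ch) for ch in text)


def decode(encoded: str) -> str:
    """Invert encode(): every alias character starts a two-symbol sequence."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == ALIAS:
            if i + 1 >= len(encoded) or encoded[i + 1] not in REVERSE:
                raise ValueError(f"invalid alias sequence at position {i}")
            out.append(REVERSE[encoded[i + 1]])
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)
```

As in Figs. 4A and 4B, only the ambiguous characters change: `encode("Bill paid $1!")` gives `"Bi&E&E paid $&C&D"`, and `decode` restores the original. Because the six ambiguous symbols never appear in the output, the encoded text draws on a smaller set of distinct symbols than the source alphabet.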
- The encoding system 12 preferably outputs to a printer 16, which produces an encoded printout 18.
- In the illustrated embodiment, encoded printout 18 is shown as scrambled. Alternatively, this need not be the case.
- The encoded printout 18 is formed of a set of symbols containing fewer distinct symbols than the non-encoded written text.
- The encoded printout 20, or a copy thereof, may then be scanned by an OCR device, with or without having first been transmitted by fax or any other unprotected communications medium.
- The OCR device typically includes a conventional scanner 22, which outputs to an OCR interpreter 26. The OCR interpreter 26 employs a symbol conversion table 24, which corresponds to symbol conversion table 14, and produces a destination document 28, which may be a hard copy or a virtual document.
- OCR interpreter 26 and conversion table 24 are operative to verify the error correction code in the encoded printout 18 and to replace all of the alias character sequences by the original characters, so that the destination document reproduces them with a high degree of accuracy.
- Figs. 4A and 4B respectively illustrate a portion of written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form. It is seen that only the ambiguous characters 30, 32, 34 and 36 are converted; the remaining characters are unchanged.
- Operation of the apparatus of Fig. 2 in an encrypting or scrambling mode is illustrated in Figs. 5A and 5B.
- Figs. 5A and 5B respectively illustrate a portion of encrypted written text, including some of the symbols of the subset of Fig. 1A, in unencoded and encoded form. It is seen that all or most of the characters are changed, and that none of the ambiguous characters 38, 40 and 42 remains in the encoded form.
- Fig. 6 is a functional block diagram of a method of embedding error correction abilities into the text in accordance with an embodiment of the invention.
- Error correction codes may be inserted into the written text.
- Many forms of error correction code are well known to those skilled in the art and can easily be applied to the present circumstances with minimal experimentation.
- In one example, the encrypted text is divided into blocks of equal length, for example 19 characters, and a bitwise XOR operation is carried out over all the bytes of the block. The resulting byte is then appended to the block as the twentieth character.
- An end-of-block marker is then added to the block as a twenty-first character.
- A suitable end-of-block character is any character that is distinctive in appearance (to an OCR device) and is not likely to appear frequently in the body of the text.
- For example, a capital "X" is an appropriate choice.
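The block framing described above (19 data characters, one XOR check character, one "X" marker) can be sketched as follows. The sketch makes one assumption the text does not: instead of XOR-ing raw bytes, it XORs symbol indices in a hypothetical 32-symbol alphabet, so the check character is always itself a printable, OCR-friendly symbol and can never collide with the "X" marker. The alphabet and the names `check_char` and `frame` are illustrative only.

```python
from functools import reduce

# Hypothetical 32-symbol alphabet assumed for the encrypted text: it omits the
# end-of-block marker "X" and look-alikes such as "0"/"O" and "1"/"I".
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWYZ23456789+"
assert len(ALPHABET) == 32 and "X" not in ALPHABET
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BLOCK_LEN = 19   # data characters per block, as in the example in the text
MARKER = "X"     # end-of-block marker


def check_char(block: str) -> str:
    """XOR the symbol indices; with 32 symbols the result is again a symbol."""
    return ALPHABET[reduce(lambda acc, ch: acc ^ INDEX[ch], block, 0)]


def frame(text: str) -> str:
    """Cut text into 19-symbol blocks; append check symbol and marker to each."""
    if any(ch not in INDEX for ch in text):
        raise ValueError("text contains symbols outside the assumed alphabet")
    if len(text) % BLOCK_LEN:
        raise ValueError("sketch expects whole 19-symbol blocks (no padding rule given)")
    blocks = (text[i:i + BLOCK_LEN] for i in range(0, len(text), BLOCK_LEN))
    return "".join(block + check_char(block) + MARKER for block in blocks)
```

For example, `frame("ACDEFGHJKLMNPQRSTUV")` returns the 19 data symbols followed by the check symbol `"B"` and the marker `"X"`. Indexing into a 32-symbol alphabet keeps the XOR closed (the XOR of 5-bit values is a 5-bit value), whereas a raw-byte XOR of printable characters can yield an unprintable or ambiguous character, a case the description does not address.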
- Fig. 7 shows the decryption process for the version shown in Fig. 6.
- The characters output by the OCR are first divided into blocks using the end-of-block markers, and the characters in each block are counted. If a block is too long (a split error: one character has been misread as two), pairs of characters are successively replaced by single characters until the error code is matched. If a block is too short (a merge error: two characters have been read as one), single characters are successively replaced by pairs of characters until the error code is matched. If the length is correct but the characters still do not match the error code (a recognition error: one character has been misread as another), single characters are successively replaced by other single characters until the error code is matched.
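The block-by-block decoding described above can be sketched as follows, assuming a hypothetical 32-symbol alphabet, 19-symbol blocks, a single XOR check symbol over symbol indices, and "X" markers. Trying every possible replacement would not work with one XOR check symbol, because any single position can be altered to restore the parity; the sketch therefore limits the successive replacements to "statistically likely errors", as the claims put it, drawn from three small confusion tables. Those tables and all function names are hypothetical.

```python
from functools import reduce

# Hypothetical 32-symbol alphabet and block layout (assumptions, not from the text).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWYZ23456789+"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BLOCK_LEN = 19   # data characters per block
MARKER = "X"     # end-of-block marker

# Hypothetical tables of likely OCR errors: what was read -> what was probably printed.
LIKELY_SUBSTITUTIONS = {"8": "B", "B": "8", "5": "S", "S": "5", "2": "Z", "Z": "2"}
LIKELY_SPLITS = {"VV": "W"}   # one printed symbol read as two
LIKELY_MERGES = {"W": "VV"}   # two printed symbols read as one


def is_valid(block: str) -> bool:
    """True if the block is 19 data symbols plus a matching XOR check symbol."""
    if len(block) != BLOCK_LEN + 1 or any(ch not in INDEX for ch in block):
        return False
    # The check symbol equals the XOR of the data, so XOR over all 20 symbols is 0.
    return reduce(lambda acc, ch: acc ^ INDEX[ch], block, 0) == 0


def candidates(block: str):
    """Yield single repairs chosen according to the block length."""
    if len(block) > BLOCK_LEN + 1:        # split error: replace a pair by one symbol
        for i in range(len(block) - 1):
            pair = block[i:i + 2]
            if pair in LIKELY_SPLITS:
                yield block[:i] + LIKELY_SPLITS[pair] + block[i + 2:]
    elif len(block) < BLOCK_LEN + 1:      # merge error: replace one symbol by a pair
        for i, ch in enumerate(block):
            if ch in LIKELY_MERGES:
                yield block[:i] + LIKELY_MERGES[ch] + block[i + 1:]
    else:                                 # recognition error: swap a single symbol
        for i, ch in enumerate(block):
            if ch in LIKELY_SUBSTITUTIONS:
                yield block[:i] + LIKELY_SUBSTITUTIONS[ch] + block[i + 1:]


def repair_block(block: str):
    """Return the 19 data symbols of a block, repaired if needed, or None."""
    if is_valid(block):
        return block[:BLOCK_LEN]
    for fixed in candidates(block):
        if is_valid(fixed):
            return fixed[:BLOCK_LEN]
    return None


def unframe(scanned: str) -> list:
    """Split OCR output at end-of-block markers and repair each block."""
    blocks = scanned.split(MARKER)
    if blocks and blocks[-1] == "":
        blocks.pop()   # output normally ends with a marker
    return [repair_block(b) for b in blocks]
```

A block that no single likely repair can fix comes back as `None`. Because one check symbol only constrains the XOR of the block, a block containing two identical candidate symbols (say, two 5s) cannot reveal which one was misread; the first replacement that restores the parity wins, which is one aspect of the single-byte limitation noted in the text.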
- The single-byte error correction system described above will fail if two or more errors in a block compensate each other so that the error code is still matched. Furthermore, even if multiple errors do not compensate each other and the error is consequently detected, the correction algorithm of successively replacing characters will not supply the right correction.
- This problem can be overcome, however, by using a multiple-byte error correction character.
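The compensating-error failure can be illustrated with a transposition: XOR does not depend on symbol order, so swapping two symbols leaves a single XOR check symbol unchanged. The text does not say which multiple-byte code is intended; the second check symbol below, a position-weighted sum modulo 32 over a hypothetical 32-symbol alphabet, is purely an assumption chosen because it is order-sensitive. It only improves detection; an established code such as Reed-Solomon would be needed to locate and correct symbol errors.

```python
from functools import reduce

# Hypothetical 32-symbol alphabet (an assumption, not from the text).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWYZ23456789+"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def xor_check(block: str) -> str:
    """Single check symbol: XOR of the symbol indices (order-insensitive)."""
    return ALPHABET[reduce(lambda acc, ch: acc ^ INDEX[ch], block, 0)]


def two_symbol_check(block: str) -> str:
    """Hypothetical two-symbol check: the XOR plus a position-weighted sum mod 32."""
    weighted = sum((pos + 1) * INDEX[ch] for pos, ch in enumerate(block)) % len(ALPHABET)
    return xor_check(block) + ALPHABET[weighted]


printed = "ACDEFGHJKLMNPQRSTUV"
scanned = "CADEFGHJKLMNPQRSTUV"   # first two symbols transposed by the OCR step

print(xor_check(printed) == xor_check(scanned))                 # True: error missed
print(two_symbol_check(printed) == two_symbol_check(scanned))   # False: error caught
```

Swapping distinct symbols a and b at adjacent positions changes the weighted sum by a - b, which is never 0 modulo 32 for two different indices, so every adjacent transposition is detected; transpositions farther apart can still slip through when (distance) × (a - b) is a multiple of 32.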
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU30458/97A AU3045897A (en) | 1996-06-12 | 1997-06-12 | Text communication via optical character recognition |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL118642 | 1996-06-12 | ||
IL11864296A IL118642A (en) | 1996-06-12 | 1996-06-12 | System for encoding written text |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1997050051A2 (en) | 1997-12-31 |
WO1997050051A3 (en) | 1998-02-19 |
Family ID: 11068962
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IL1997/000190 WO1997050051A2 (en) | 1996-06-12 | 1997-06-12 | Ocr processing of encoded text and communication thereof |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU3045897A (en) |
IL (1) | IL118642A (en) |
WO (1) | WO1997050051A2 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4350844A (en) * | 1975-11-20 | 1982-09-21 | Anstalt Europaische Handelsgesellschaft | Encipering- and deciphering apparatus in the form of a typewriter |
US4837737A (en) * | 1985-08-20 | 1989-06-06 | Toshiaki Watanabe | System for detecting origin of proprietary documents generated by an apparatus for processing information such as words, figures and pictures |
US5031215A (en) * | 1988-09-19 | 1991-07-09 | Jose Pastor | Unambiguous alphabet for data compression |
US5243655A (en) * | 1990-01-05 | 1993-09-07 | Symbol Technologies Inc. | System for encoding and decoding data in machine readable graphic form |
US5479507A (en) * | 1994-01-19 | 1995-12-26 | Thomas De La Rue Limited | Copy indicating security device |
1996
- 1996-06-12: IL IL11864296A (IL118642A), not active, IP Right Cessation
1997
- 1997-06-12: AU AU30458/97A (AU3045897A), not active, Abandoned
- 1997-06-12: WO PCT/IL1997/000190 (WO1997050051A2), active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
AU3045897A (en) | 1998-01-14 |
IL118642A (en) | 1999-12-31 |
WO1997050051A3 (en) | 1998-02-19 |
IL118642A0 (en) | 1998-02-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6769061B1 (en) | Invisible encoding of meta-information | |
US5388158A (en) | Secure document and method and apparatus for producing and authenticating same | |
US7107453B2 (en) | Authenticatable graphical bar codes | |
CN106650869B (en) | Information hiding method based on two-dimensional code | |
EP1589471B1 (en) | ID tag, tag reader, ID scrambling and descrambling methods, and tag manager | |
US20080222496A1 (en) | Secure Protection of Biometric Templates | |
US20030145206A1 (en) | Document authentication and verification | |
WO2015194298A1 (en) | Method for concealing secret information, secret information concealing device, program, method for extracting secret information, and secret information extraction device | |
CN105706118A (en) | 2D-code generation method, 2D-code generation device, 2D-code reading method, 2D-code reading device, 2D code, and program | |
US7017182B2 (en) | Method of securely transmitting information | |
US11902417B2 (en) | Computer-implemented method of performing format-preserving encryption of a data object of variable size | |
CN110210270A (en) | Two-dimensional barcode information safety encryption and system and image in 2 D code analytic method and system | |
US20020018561A1 (en) | Data encryption and decryption using error correction methodologies | |
US20020171862A1 (en) | Print sheet original authentication system, printer device and checking device | |
CN105718978B (en) | QR code generation method and device, and decoding method and device | |
GB2314229A (en) | Facsimile file processing | |
EP1668604A1 (en) | A method of preparing a document so that it can be authenticated | |
WO1997050051A2 (en) | Ocr processing of encoded text and communication thereof | |
JP3545782B2 (en) | How to keep confidential documents confidential | |
KR100628814B1 (en) | Avoiding forbidden patterns in audio or video data | |
CN113095042A (en) | Character string encryption method, system, device and storage medium | |
EA002213B1 (en) | Method for identifying an image or a document | |
EP0989528A2 (en) | Portable electronic apparatus and message processing method for decoding message formats | |
CN115935299A (en) | Authorization control method, device, computer equipment and storage medium | |
CN117407845A (en) | User identity multiple authentication method, system and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB |
AK | Designated states | Kind code of ref document: A3. Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM |
AL | Designated countries for regional patents | Kind code of ref document: A3. Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
NENP | Non-entry into the national phase | Ref country code: JP. Ref document number: 98502625. Format of ref document f/p: F |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
NENP | Non-entry into the national phase | Ref country code: CA |
122 | EP: PCT application non-entry in European phase | |