US20140333775A1 - System And Method For Object And Event Identification Using Multiple Cameras - Google Patents
- Publication number
- US20140333775A1 (application US 14/273,653)
- Authority
- US
- United States
- Prior art keywords
- camera
- event
- video data
- frame
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06K9/481
- G06F17/30271
- G06K9/00771
- G06K9/3241
- G06K9/78
- G06T7/2033
- G06T7/2093
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/181—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/44—Event detection
Definitions
- This disclosure relates generally to the field of video monitoring, and, more particularly, to systems and methods for monitoring objects and events using multiple cameras arranged at different angles around a scene.
- Video monitoring systems are widely deployed for various purposes, which include security and public safety.
- one or more cameras are deployed in different locations to monitor activities.
- video monitoring systems generate images of public places, transportation facilities, retail stores, industrial facilities, and residences and other private property.
- the monitoring systems often include data storage devices that archive some or all of the recorded video for later review, and one or more video output devices that enable playback of live and archived video data.
- the cameras generate video data that are monitored by one or more human operators who can view activity in the video and take appropriate action if they view an incident. For example, in a monitoring system at a retail store, the operator views live video of individuals in the store and alerts security personnel if an individual attempts to shoplift merchandise.
- multiple cameras record video of a single scene from different positions and angles. While producing video from multiple angles can be helpful in collecting additional detail about a scene, the multiple video recordings are difficult for a human operator to observe in an efficient manner.
- multiple video streams consume large amounts of bandwidth and network resources, particularly in wireless video monitoring systems. Consequently, improvements to video monitoring systems that identify events of interest in recorded video data in an automated manner and that utilize network bandwidth in an efficient manner would be beneficial.
- a video surveillance system includes distributed cameras that communicate with a central processing station.
- the central processing station communicates with multiple cameras that extract foreground objects using background subtraction methods.
- the cameras in our system transmit metadata to the central processing station.
- the metadata corresponding to humans are filtered from those corresponding to other objects.
- the foreground metadata corresponding to people is analyzed by the central processing station to recognize motions and events that are performed by people.
- the cameras communicate with the central processing station using wireless communication network or other suitable communication channels.
- the video surveillance system includes a plurality of cameras located in a plurality of positions to record a scene.
- Each camera includes a sensor configured to generate video data of the scene comprising a series of frames, a first network device configured to transmit the video data and feature vectors associated with the video data to a processing station, and a feature extraction processor operatively connected to the sensor and the network device.
- the feature extraction processor is configured to identify a plurality of feature vectors in video data generated by the sensor, transmit only the plurality of feature vectors to the processing station with the first network device in a first operating mode, and transmit the video data to the processing station with the first network device in a second operating mode only in response to a request for the video data from the processing station.
- the video surveillance system further includes a processing station having a second network device, a video output device, and a processor operatively connected to the second network device and the video output device.
- the processor is configured to receive the plurality of feature vectors generated by each camera in the plurality of cameras with the second network device, identify an object and motion of the object in the scene with reference to the plurality of feature vectors received from at least two of the plurality of cameras, identify an event corresponding to the motion of the object in the scene with reference to a predetermined database of events, generate a request for transmission of the video data from at least one camera in the plurality of cameras, and generate a graphical display of the video data from the at least one camera with the video output device to display the object associated with the event.
- a method for performing surveillance of a scene includes generating with a sensor in a first camera first video data of the scene, the first video data comprising a first series of frames, identifying with a feature extraction processor in the first camera a first plurality of feature vectors in the first video data, transmitting with a network device in the first camera only the first plurality of feature vectors to a processing station in a first operating mode, transmitting with the network device in the first camera the first video data to the processing station in a second operating mode only in response to a request for the first video data from the processing station, generating with another sensor in a second camera second video data of the scene, the second video data comprising a second series of frames and the second camera generating the second video data of the scene from a different position than the first camera, identifying with another feature extraction processor in the second camera a second plurality of feature vectors in the second video data, transmitting with another network device in the second camera only the second plurality of feature vectors to the processing station in the first operating mode.
- FIG. 1 is a schematic diagram of a video monitoring system.
- FIG. 2 is a diagram depicting a pipelined process for identification of events using metadata that are transmitted from multiple cameras viewing a scene.
- FIG. 3 is a diagram of a graph of feature vector nodes for an event that are generated from multiple cameras in a majority-voting configuration.
- FIG. 4 is a diagram of a graph of feature vector nodes for an event that are generated from multiple cameras in a multi-chain configuration.
- FIG. 5 is a diagram of a graph of feature vector nodes for an event that are generated from multiple cameras in a multi-view field configuration.
- FIG. 6 is a set of images of a scene generated by multiple cameras in a surveillance system.
- the term “scene” refers to a single area that is monitored by a surveillance system using multiple cameras that are located at multiple positions to view the scene from different directions. Examples of scenes include, but are not limited to, rooms, hallways, concourses, entry and exit ways, streets, street intersections, retail stores, parking facilities and the like.
- the term “sparse encoding” refers to a method for generating data corresponding to a large number of inputs that are encoded as vectors using a plurality of “basis vectors” and “sparse weight vectors.”
- the basis vectors are generated using a penalized optimization process applied to a plurality of predetermined input vectors that are provided during a training process.
- an ℓ1 optimization process that is known to the art is used to generate the basis vectors and sparse weight vectors that correspond to a plurality of input training vectors.
- the term “sparse” used to refer to a vector or matrix describes a vector or matrix having a plurality of elements where a majority of the elements are assigned a value of zero.
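The ℓ1-penalized sparse encoding described above can be sketched with a simple iterative shrinkage-thresholding (ISTA) solver. This is only an illustration of the technique, not the patent's implementation; the function name `sparse_code`, the parameter values, and the assumption of a fixed, previously learned dictionary are all hypothetical.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Solve min_w 0.5*||x - D @ w||^2 + lam*||w||_1 with ISTA.

    x: (m,) input vector; D: (m, k) dictionary whose columns are basis
    vectors.  Returns a sparse weight vector w of dimensionality k.
    """
    w = np.zeros(D.shape[1])
    # Step size from the Lipschitz constant of the gradient (||D||_2^2).
    L = np.linalg.norm(D, ord=2) ** 2
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)
        z = w - grad / L
        # Soft-thresholding drives most coefficients exactly to zero,
        # which is what makes the resulting weight vector "sparse".
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return w
```

The reconstruction `D @ w` then approximates `x` while most entries of `w` are exactly zero.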
- the term “dimensionality” refers to the number of elements in a vector. For example, a row or column vector with three elements is said to have a dimensionality of three, and another row or column vector with four elements is said to have a dimensionality of four.
- the term “metadata” refers to properties of objects that are identified in video or other sensor data. For example, if an object follows a path through a field of view of a video camera, the metadata corresponding to the object optionally include the two-dimensional position of the object in the frames of video data, a velocity of the object, a direction of movement of the object, a size of the object, and a duration of time that the object is present in the field of view of the camera. As described below, events are identified with reference to the observed metadata of an object.
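The object metadata listed above (position, velocity, direction of movement, duration in the field of view) can be derived from a per-frame track of an object. The following is a minimal sketch; `track_metadata`, its field names, and the constant-frame-rate assumption are all hypothetical.

```python
import math

def track_metadata(positions, fps=30.0):
    """Summarize a per-frame 2-D track into object metadata.

    positions: list of (x, y) pixel coordinates, one per frame.
    Returns last position, speed (pixels/s), direction of movement
    (degrees), and duration (s) in the field of view.
    """
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    duration = len(positions) / fps
    dx, dy = x1 - x0, y1 - y0
    return {
        "last_position": (x1, y1),
        "speed": math.hypot(dx, dy) / duration,
        "direction_deg": math.degrees(math.atan2(dy, dx)),
        "duration_s": duration,
    }
```

Note that nothing here identifies the object with particularity, consistent with the point made above about metadata not carrying personally identifiable information.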
- the metadata do not require that an object be identified with particularity. In one embodiment, the metadata do not identify that an object is a particular person, or even a human being.
- Metadata correspond to a human if the event is similar to an expected human action, such as metadata of an object moving at a direction and speed that correspond to a human walking past a camera.
- individual objects are only tracked for a short time and the metadata do not identify the same object over prolonged time periods.
- the stored metadata and identification of high-interest events due to metadata do not require the collection and storage of Personally Identifiable Information (PII) beyond storage of video data footage for later retrieval.
- the terms “feature vector” or, more simply, “feature” refer to vectors of metadata that correspond to a distinguishing structure in an object that is identified in video data of the object.
- Each element of the metadata is also referred to as a “feature descriptor” and a feature vector includes a plurality of feature descriptors.
- a feature vector includes data that describe aspects of the human body in the video data including, for example, the size, location, and orientation of the object in the scene. If the video data include multiple humans, then each human can be described using a single feature vector, or each human can be described using multiple feature vectors for different body parts such as the arms, legs, torso, etc.
- the term “dictionary” refers to a plurality of basis vectors that are generated using the sparse encoding process. After the dictionary is generated during the training process, the basis vectors in the dictionary are used to identify a degree of similarity between an arbitrary input vector and the input vectors that were used to generate the basis vectors in the dictionary during the training process. An optimization technique is used to select combinations of basis vectors using a sparse weight vector to generate a reconstructed vector that estimates the arbitrary input vector. An identified error between the reconstructed estimate vector and the actual input vector provides a measure of similarity between the input vector and the dictionary.
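The similarity measure described above — the error between a reconstructed estimate and the actual input vector — can be illustrated as follows. For brevity this sketch uses ordinary least squares over the whole dictionary in place of the sparse optimization; the name `dictionary_similarity` is hypothetical.

```python
import numpy as np

def dictionary_similarity(x, D):
    """Reconstruction error of x against dictionary D (columns = basis vectors).

    A small residual means x is similar to the training vectors from
    which the dictionary was learned.  Least squares stands in here for
    the sparse-weight optimization used in the full method.
    """
    w, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ w)
```

An input lying in the span of the basis vectors reconstructs with near-zero residual, while an input unlike the training data leaves a large residual.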
- the term “key-frame” refers to an image frame in a video sequence of a motion performed by a person or other object in a scene that is considered to be representative of the overall motion.
- a video sequence of a motion typically includes two or more key-frames, and a training process that is described in more detail below includes identification of a limited number of N key-frames in the video sequence.
- Each video sequence of a particular event includes the same number of N key-frames, but the time at which each key-frame occurs can vary depending upon the angle of the video sequence and between different video sequences that are used as training data.
- An event of interest that is recorded from one or more angles during a training process includes a series of frames of video data.
- a video sequence that depicts a person standing up from a sitting position is an event
- Annotators identify key-frames in the video sequence of the person standing in the video streams from multiple cameras that are positioned around the person.
- An event processor or another suitable processing device then extracts features from the identified key-frames to identify a sequence of feature vectors corresponding to the event.
- a training set of multiple video sequences that depict the same event performed by one or more people or objects from different viewing angles form the basis for selecting key-frames in each of the video sequences.
- the features that are extracted from the key-frames selected in video sequences in the training data form the basis for the dictionary that is incorporated into a database for the identification of similar motions performed by other people or objects in different scenes that are monitored by a video surveillance system.
- the term “synchronization frame” refers to a frame of video data that is generated in a camera and that contains features that are extracted by a feature extraction processor in the camera to form a full feature vector.
- a full feature vector includes all of the data corresponding to the identified features in the frame of video data.
- the video data in subsequent image frames captures the movement, and the feature extraction processor generates sparse feature vectors that include only changes in the identified feature relative to previous frames that include the feature, such as the synchronization frame.
- video cameras generate synchronization frames at regular intervals (e.g. once every 60 frames of video data).
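The synchronization-frame scheme above can be sketched as an encoder that emits a full feature vector on each synchronization frame and only sparse per-element deltas for the intermediate frames. The message format and the name `encode_stream` are illustrative assumptions.

```python
def encode_stream(feature_frames, sync_interval=60, tol=1e-6):
    """Encode per-frame feature vectors as sync frames plus sparse deltas.

    Every `sync_interval`-th frame carries the full feature vector; the
    intermediate frames carry only {index: new_value} for elements that
    changed relative to the previous frame.
    """
    messages, prev = [], None
    for i, feats in enumerate(feature_frames):
        if i % sync_interval == 0 or prev is None:
            messages.append(("sync", list(feats)))
        else:
            delta = {j: v for j, (v, p) in enumerate(zip(feats, prev))
                     if abs(v - p) > tol}
            messages.append(("delta", delta))
        prev = feats
    return messages
```

When features change slowly between frames, the delta messages are far smaller than the full vectors, which is the bandwidth saving the scheme is after.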
- Feature vector extraction techniques that are known to the art include, but are not limited to, dimensionality reduction techniques including principal component analysis, edge detection, and scale-invariant feature transformations.
- an identified object in a scene is encoded with a Histogram of Oriented Gradients (HOG) appearance feature descriptor.
- the key-frames of video data occur at particular times during an event of interest and are not necessarily aligned with the generation of synchronization and intermediate frames during operation of a camera. Consequently, a key-frame of video data that is generated during an event of interest can be captured with a synchronization frame or intermediate frame in a camera.
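A minimal HOG-style appearance descriptor, in the spirit of the Histogram of Oriented Gradients encoding mentioned above, might look like the following. Real HOG implementations add cell/block pooling and contrast normalization on top of this; the function name and bin count here are illustrative.

```python
import numpy as np

def hog_descriptor(patch, n_bins=9):
    """A minimal HOG-style descriptor for a grayscale image patch.

    Computes per-pixel gradient magnitude and orientation and
    accumulates magnitude-weighted votes into an orientation histogram.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # Orientations folded into [0, 180) degrees, as in unsigned HOG.
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

A patch containing a single vertical edge, for example, concentrates nearly all of its gradient energy in the 0-degree bin.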
- FIG. 1 depicts a video monitoring system 100 that is configured to record video data about objects in a scene and to display selected video for additional analysis by human operators.
- the video monitoring system 100 includes a processing station 160 and a plurality of cameras 108 A- 108 N that are each positioned to record a scene 112 from different locations and angles.
- the processing station 160 further includes a video, object feature, and event processor 104 , object and feature database 106 , network device 164 , and a video output device 168 .
- the network device 164 is a wired or wireless data networking adapter
- the video output device 168 includes one or more display screens, such as LCD panels or other suitable video display devices.
- the feature processor 104 in the processing station 160 includes one or more digital processors such as central processing units (CPUs), graphical processing units (GPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), and the like that are configured to execute stored program instructions to process both feature and event data that are received from the cameras as well as video data that are received from the cameras.
- the processor 104 further includes one or more memory devices that store programmed instruction data for execution of one or more software programs with the processor 104 .
- the processor 104 is operatively connected to the database 106 , network device 164 , and video output device 168 .
- the processing station 160 receives feature vector data and optionally video data from the cameras 108 A- 108 N with the network device 164 .
- the processor 104 in the processing station 160 identifies objects of interest and events of interest through synthesis of the feature vector data from one or more of the cameras 108 A- 108 N in conjunction with predetermined feature vectors and event data that are stored in the trained object features and event database 106 .
- the trained object features and event database 106 stores the dictionary of the training data.
- the training data are generated during a training phase for the system 100 , and the feature basis vectors in the dictionary for key-frames that correspond to different portions of an event are typically not generated from the same objects that move through the scene 112 and are often recorded by a different set of cameras in a location other than the scene 112 .
- the system 100 removes the background of the scene and rescales identified objects to identify feature vectors for new objects in the scene 112 that are independent of the particular features of the scene 112 and are not overly dependent upon the characteristics of an individual person or object that was not part of the training process.
- the event processor 104 uses the stored dictionary of feature vectors in the database 106 to identify events based on the motion of objects that were not used during the training process in scenes that correspond to locations other than the location used during the training process.
- the trained object features and event database 106 stores data corresponding to a plurality of predetermined features that are associated with previously identified objects and sequences of feature movements that are associated with previously identified events.
- the database 106 stores feature vector data corresponding to the identified shapes of humans and other objects that are present in the scene 112 and are recorded by the video cameras 108 A- 108 N.
- the feature data can include the same feature as viewed from the different angles and positions around the scene that correspond to the viewing angles and positions of the video cameras 108 A- 108 N.
- the event data include predetermined sequences of movements for one or more identified features of one or more objects in the scene.
- the event data in the database 106 can include a sequence of features that correspond to a person who is walking.
- the database 106 is implemented using one or more non-volatile and volatile digital data storage devices including, but not limited to, magnetic hard drives, optical drives, solid state storage devices, static and dynamic random access memory (RAM) devices, and any other suitable digital data storage device.
- the cameras 108 A- 108 N record video image data of the scene 112 , identify feature data corresponding to objects in the recorded video, and transmit a portion of the feature data and video data to the event processor 104 .
- each of the cameras includes a sensor 140 , a feature extraction processor 144 , memory 148 , and a network device 152 .
- the sensor 140 includes one or more sensing elements such as charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) image sensors that record video of the scene 112 , and the sensor 140 is configured to generate digital image data from the scene 112 in, for example, monochrome, color, or near-infrared.
- the camera includes an infrared sensor for detecting images in the far infrared frequency band.
- the sensor 140 is further integrated with lenses, mirrors, and other camera optical devices that are known to the art.
- the feature extraction processor 144 includes one or more digital processors such as central processing units (CPUs), graphical processing units (GPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), and the like that are configured to execute stored program instructions to process image data from the sensor 140 and to identify feature vectors for one or more objects in the scene 112 using one or more feature extraction techniques.
- the memory 148 stores program instructions for the feature extraction processor 144 and optionally stores a buffer of video data that the sensor 140 generates during operation of the camera.
- the processing station 160 optionally generates a request for buffered video data in response to identifying that one of the cameras 108 A- 108 N has recorded an event.
- the network devices 152 in the cameras 108 A- 108 N transmit data to the corresponding network device 164 in the processing station 160 through a wireless data network such as, for example, a wireless local area network (WLAN) or wireless wide area network (WWAN).
- the cameras 108 A- 108 N optionally include visible, near-infrared or far-infrared illumination sources and the cameras include image intensifiers for low-light operation in some embodiments.
- Each one of the cameras 108 A- 108 N includes the feature extraction processor 144 to perform image processing and feature extraction processing.
- the cameras 108 A- 108 N transmit full feature vector data for objects in the video in synchronization frames that are transmitted at regular intervals.
- the feature data include data vectors that describe one or more features for objects in video data that are generated in each frame.
- the synchronization frame is a frame of video data where a processor in the camera generates full feature data for each feature identified in the frame of video data. Synchronization frames are generated at regular intervals during operation of the camera, and frames of video data that are generated between synchronization frames are referred to as intermediate frames.
- the camera only transmits updates to features using a sparse feature encoding scheme to greatly reduce the amount of data and bandwidth requirements for transmitting updates to the feature vectors to the event processor 104 .
- the event processor 104 in the processing station 160 optionally requests full video data from one or more of the cameras 108 A- 108 N during operation. For example, in response to identification of an event, the processor 104 requests video data from one or more of the cameras 108 A- 108 N and the video output device 168 displays the video for an operator to review. The operator optionally generates additional requests for video from one or more of the other cameras 108 A- 108 N. Thus, in one mode of operation a subset of the cameras 108 A- 108 N transmit full video data to the processor 104 , while other cameras only transmit the feature data and feature update data.
- the memory 148 in each of the cameras 108 A- 108 N includes an internal data storage device that is configured to buffer video data for a predetermined time period to enable the processor 104 to request additional video data that are stored in the camera.
- the memory 148 in the camera 108 B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112 .
- the camera 108 B generates and transmits feature vector data for objects that are present in the scene 112 , including moving objects, and transmits the feature vector data to the processor 104 . If an event of interest occurs in the scene 112 , the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108 B retrieves the requested video from the data storage device. Thus, even though the camera 108 B does not transmit full video data to the processor 104 , the processor 104 optionally retrieves video data for selected events of interest in the system 100 .
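The camera-side buffering described above can be sketched as a ring buffer of timestamped frames: only feature vectors leave the camera in the first operating mode, and buffered frames are returned only when the processing station requests a time window around an identified event. The class name `CameraBuffer` and its interface are hypothetical.

```python
from collections import deque

class CameraBuffer:
    """Ring buffer holding the most recent frames for on-demand retrieval.

    Old frames are evicted automatically once the buffer covers more
    than `seconds` of video at the given frame rate.
    """
    def __init__(self, fps=30, seconds=600):
        self.frames = deque(maxlen=fps * seconds)  # (timestamp, frame)

    def record(self, timestamp, frame):
        self.frames.append((timestamp, frame))

    def request(self, t_start, t_end):
        """Return buffered frames with timestamps in [t_start, t_end]."""
        return [f for (t, f) in self.frames if t_start <= t <= t_end]
```

A request for a window that has already scrolled out of the buffer simply returns nothing, which is why the buffer length is sized to the expected event-identification latency.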
- the database 106 includes the trained models that are used to identify occurrences of events of interest from the feature vector metadata that the cameras 108 A- 108 N transmit to the central processing station 160 . Training is performed before the system 100 is used to perform surveillance on a scene, and the training process is often performed under controlled conditions at a different location than the location of the scene 112 .
- the central processing station 160 and event processor 104 are configured to perform the training process, while in another embodiment a separate computing system performs the training process and data from the training process are stored in the trained object features and event database 106 for use during operation of the system 100 .
- the training process includes a series of trials in which humans or other objects perform motions that correspond to events of interest, and the motions are recorded as video from multiple viewing angles.
- a manual annotation process includes one or more annotators who select a limited number of key-frames from each of the video sequences to assist in generating a trained model for the human or object movements that occur in each event of interest.
- the process of manual selection of key-frames during training uses an easy-to-use interface that is simplified so that the annotation can be performed by crowd workers, such as Mechanical Turk workers, who follow written instructions for annotating the data to obtain key-frames. While the key-frame selection in the training process is performed manually in one embodiment, the feature extraction process and the subsequent generation of the training dictionary data are performed in an automated manner without human intervention.
- a digital processing device receives key-frames of video data from multiple video sequences of a particular event of interest in the training data.
- the multiple video sequences include videos taken from different positions and angles of a single person or object performing a single motion in an event of interest.
- the multiple video sequences also include recordings of multiple people or objects that perform the motion in an event of interest during multiple trials to improve the breadth and accuracy of the training data.
- Each trial is performed by the subject while he or she faces a different direction and at different locations in the field of view of the cameras.
- the trials are performed using eight different orientations as 0,
- the training process generates a model including appearance feature descriptor parameter templates and deformation parameters for one or more events c using a set of M video sequences that are each generated to depict an occurrence of the event c.
- an event c includes a motion of a human kicking his or her leg
- the training data include M video sequences of the leg kick from one or more human training subjects that are recorded from multiple viewing angles performing the kick.
- the training process uses a scoring function S(p_q | D_q, w_c) = ⟨w_c, Φ(D_q, p_q)⟩, where w_c is a vector that includes all the appearance and deformation parameters that the training process refines as part of training the model, and Φ(D_q, p_q) is the corresponding appearance and deformation energy for a particular key-frame labeling p_q.
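A linear scoring function of this form reduces to a dot product between the learned parameter vector w_c and the appearance/deformation energy Φ of a candidate labeling, with the best key-frame labeling chosen by the highest score. The names below are illustrative, not from the patent.

```python
import numpy as np

def score(w_c, phi):
    """Linear scoring function S = <w_c, Phi(D_q, p_q)>."""
    return float(np.dot(w_c, phi))

def best_labeling(w_c, candidate_phis):
    """Pick the candidate key-frame labeling with the highest score."""
    scores = [score(w_c, phi) for phi in candidate_phis]
    return int(np.argmax(scores)), scores
```

In practice the candidate labelings would come from a search over possible key-frame assignments rather than a small explicit list.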
- the video monitoring process needs to not only identify a single event of interest, but also to identify multiple events of interest and distinguish between the different events of interest.
- the training process uses a one-vs-all learning policy for each event of interest, and generates the model parameters that jointly detect and recognize any particular event of interest given hard negative examples of other events of interest that are generated during the training process.
- the training process uses a support vector machine (SVM) framework that employs the following objective learning function:
- λ1 and λ2 are user-defined scaling parameters that minimize slack values during optimization of the model.
- the constraint directed to key-frame labeling ⁇ circumflex over (p) ⁇ refers to a cost penalization function, or “loss” function ⁇ (p q , ⁇ circumflex over (p) ⁇ ) where a key-frame label ⁇ circumflex over (p) ⁇ is penalized based on the observed (“ground truth”) key-frame p q that is generated during the training process.
- the non-negative slack term ξ_q provides additional robustness against violations of the constraint.
- the loss function Δ(p_q, p̂) is used during the training process to reflect how well a particular hypothesized label p̂ matches the predetermined ground-truth label p_q.
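Within a max-margin (SVM) framework, the loss enters as a margin requirement: each wrong labeling must score below the ground truth by at least its loss, with any violation absorbed by the non-negative slack. A sketch with a simple Hamming-style loss (the patent does not specify the exact loss form, so the loss and all values below are illustrative):

```python
import numpy as np

def keyframe_loss(p_true, p_hat):
    """Loss between labelings: count of hypothesized key-frame labels
    that disagree with ground truth (a Hamming loss, for illustration)."""
    return float(sum(1 for a, b in zip(p_true, p_hat) if a != b))

def slack(w, psi_true, psi_hat, p_true, p_hat):
    """Non-negative slack for one training sequence: how much the
    loss-augmented margin constraint is violated."""
    margin = float(np.dot(w, psi_true) - np.dot(w, psi_hat))
    return max(0.0, keyframe_loss(p_true, p_hat) - margin)

w = np.array([1.0, 1.0])
# A wrong labeling that scores far below the truth needs no slack.
assert slack(w, np.array([2.0, 2.0]), np.array([0.0, 0.0]),
             (1, 2, 3), (1, 2, 3)) == 0.0
# A wrong labeling that scores nearly as high as the truth incurs slack.
assert slack(w, np.array([1.0, 1.0]), np.array([0.9, 0.9]),
             (1, 2, 3), (4, 5, 6)) > 0.0
```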
- the training process described above generates a model with appearance parameters and deformation parameters that can be used to classify multiple events of interest that are observed at a later time during operation of the surveillance system 100 .
- the training process is susceptible to assigning higher weights for some of the motions in the events of interest, which may result in misclassification for some events of interest.
- the training process estimates a bias V that is associated with different events of interest c.
- the bias data are stored in the database 106 and are used to normalize scores during an event identification process to reduce the likelihood of misclassifying an event of interest.
- FIG. 2 depicts a process 200 for operation of the surveillance system 100 for generation of feature vector data in recorded video and transmission of the feature vector data to a central processing system for identification of objects and events of interest.
- the process 200 takes place after the training process has generated the model parameters for the database 106 corresponding to a predetermined number of events of interest.
- a reference to the process 200 performing an action or function refers to the operation of a processor, including processors in either or both of a camera and a central processing system, to execute programmed instructions to perform the action or function in conjunction with other components in the video monitoring system.
- the process 200 is described in conjunction with the video monitoring system 100 of FIG. 1 for illustrative purposes.
- one or more of the video cameras 108 A- 108 N generate recorded video of the scene 112 and the feature extraction processors 144 in each camera perform background subtraction from the video image data (block 204 ).
- the cameras 108 A and 108 B are depicted for illustrative purposes. Each camera records video image data of the scene 112 from a different position.
- the cameras 108 A and 108 B record video, such as image frames 202 A and 202 B, respectively.
- the feature extraction processor 144 in each camera subtracts portions of the image data that correspond to the static portions of the scene 112 that do not move or change during generation of the video, such as the wall and the floor in the scene 112 .
- the images 206 A and 206 B depict a human in the scene 112 with a black background that represents subtraction of the background image data in the video.
- the background subtraction maintains a dynamically evolving background image of the static scene. This background image evolves slowly to account for small variations in the lighting of the scene during the course of a day and for objects that are placed in or removed from the scene.
- the dynamic background image serves as a reference image that is compared against every new image captured by the camera sensor.
- the feature extraction processor 144 in each of the cameras 108 A and 108 B identifies the difference between the captured image and the reference image to extract silhouettes of one or more foreground objects. Disjoint foreground silhouettes correspond to different objects or people in the scene, and each silhouette is assigned a different identification number.
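A minimal sketch of the evolving-background scheme using a running average and a thresholded difference (the patent does not specify the exact update rule; the `alpha` and `threshold` values here are illustrative):

```python
import numpy as np

def update_background(background, frame, alpha=0.01):
    """Slowly evolve the reference image so that it absorbs gradual
    lighting changes and objects that are placed in or removed from
    the static scene."""
    return (1.0 - alpha) * background + alpha * frame.astype(float)

def foreground_mask(background, frame, threshold=25.0):
    """Pixels that differ from the reference image by more than the
    threshold are marked as foreground (the object silhouettes)."""
    return np.abs(frame.astype(float) - background) > threshold

# Toy example: a uniform gray scene with one bright "object" pixel.
background = np.full((4, 4), 100.0)
frame = background.copy()
frame[0, 0] = 200.0                      # foreground object appears
mask = foreground_mask(background, frame)
assert bool(mask[0, 0]) and int(mask.sum()) == 1
```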
- Process 200 continues as feature extraction processors 144 in each of the cameras 108 A and 108 B extract features from the foreground objects in the image data (block 208 ).
- the intensities of the camera sensor at foreground pixel locations are extracted for each silhouette of the object after subtraction of the background, to form a foreground image for each object.
- the processor in each camera generates a rectangular bounding box of minimum area over the foreground image, and the processor resizes the image region to a predetermined fixed resolution.
- the feature extraction processor 144 generates a grid over the fixed-resolution image so that each block in the grid contains the same number of pixels.
- the feature extraction processor 144 identifies image gradients within each grid-block, and feature vectors are identified from a histogram of the image gradients in each grid-block. Once the individual feature vectors are identified for each block in the grid, the feature vectors are appended to each other to form one large feature vector using, for example, a fixed-size 5×5 grid of blocks with histogram of oriented gradients (HOG) descriptors. Thus, one fixed-size feature vector is identified for each foreground object in the image.
- because the bounding box containing the foreground image is resized to generate a fixed-resolution image, two people of different height and size, or at two different distances from the camera, can be compared using the feature vectors generated from video of the scene 112 .
- the process of extracting feature vectors on the fixed resolution foreground image provides illumination invariance, scale invariance and some rotational invariance.
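The per-object descriptor can be sketched as follows. A 5×5 grid with 32 orientation bins per block matches the 5×5×32 descriptor array mentioned later in this document; the exact HOG variant (cell normalization, signed vs. unsigned gradients) is not specified, so this is an illustrative reduction:

```python
import numpy as np

def grid_descriptor(foreground, grid=5, bins=32):
    """Split the fixed-resolution foreground image into grid x grid
    blocks, histogram the gradient orientations in each block weighted
    by gradient magnitude, and append the block histograms into one
    fixed-size feature vector of grid * grid * bins values."""
    h, w = foreground.shape
    gy, gx = np.gradient(foreground.astype(float))
    angles = np.arctan2(gy, gx) % np.pi       # unsigned orientations
    magnitudes = np.hypot(gx, gy)
    bh, bw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            a = angles[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].ravel()
            m = magnitudes[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

# Any foreground image at the fixed resolution yields a descriptor of
# identical length, regardless of the subject's original size.
v = grid_descriptor(np.arange(2500.0).reshape(50, 50))
assert v.shape == (5 * 5 * 32,)
```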
- Process 200 continues as each camera compresses and transmits the feature data descriptor vectors to the event processor 104 (block 212 ). Since the poses of people in the scene vary gradually over time, there is a high degree of correlation between their corresponding feature vectors over successive frames.
- the images 210 A and 210 B depict features in the image that the feature extraction processor in each of the cameras 108 A and 108 B encodes for transmission to the processing station 160 .
- the feature extraction processors 144 in the cameras 108 A and 108 B exploit this correlation with a compression scheme in which only the small updates in the feature vectors over successive frames are compressed and transmitted.
- the feature extraction processors 144 use a sparse-coding framework to compress the feature vector updates.
- the feature extraction processors 144 periodically regenerate full feature vectors during synchronization frames of the video data to account for new objects in the scene 112 and to prevent the buildup of excessive noise errors from the sparse feature vector generation process.
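The update-plus-synchronization scheme can be illustrated with a simple thresholded-delta encoder. The patent's actual sparse-coding framework is more sophisticated, but the interface is the same: full vectors on synchronization frames, sparse updates in between.

```python
import numpy as np

def encode_update(previous, current, threshold=0.05):
    """Encode only the feature-vector components that changed
    noticeably since the previous frame; poses vary gradually, so
    most components are unchanged and the update is sparse."""
    delta = current - previous
    idx = np.flatnonzero(np.abs(delta) > threshold)
    return idx, delta[idx]                   # sparse (index, value) pairs

def decode_update(previous, idx, values):
    """Reconstruct the current feature vector from the previous one
    plus the sparse update."""
    reconstructed = previous.copy()
    reconstructed[idx] += values
    return reconstructed

prev = np.zeros(800)
curr = prev.copy()
curr[[3, 700]] = [0.5, -0.4]                 # only two components moved
idx, vals = encode_update(prev, curr)
assert len(idx) == 2                          # far fewer than 800 values sent
assert np.allclose(decode_update(prev, idx, vals), curr)
```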
- Advantages of performing the sparse encoding and compression include reductions to the amount of data transmitted to the event processor 104 , and the correlation method tracks each individual person or moving object in the foreground, thereby enabling prediction of the path of movement for the object.
- Each of the cameras 108 A and 108 B transmits the full feature vector data for synchronization frames and sparse feature vector data in the compressed format to the network device 164 in the processing station 160 using the network devices 152 in each camera.
- each of the cameras 108 A and 108 B transmits 800 bytes of data in a 5×5×32 array of feature descriptor data for each object that is identified in a scene during a synchronization frame that transmits full feature descriptor data.
- the sparsity of the feature descriptors enables additional compression of the feature descriptor data.
- the cameras 108 A- 108 B transmit only metadata to the central processing station 160 unless the central processing station 160 generates a request for full video data in response to identifying an event of interest that is viewed by one or both of the cameras 108 A and 108 B.
- using the prior-art H.264 video compression algorithm provides an average bit rate of 64K bytes per image for 640×480 pixel resolution frames of color image data, which is roughly two orders of magnitude larger than the feature descriptor data.
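The bandwidth comparison follows from simple arithmetic:

```python
# Per-object metadata: a 5 x 5 x 32 descriptor array at one byte per
# entry is 800 bytes for a full synchronization frame.
metadata_bytes = 5 * 5 * 32
assert metadata_bytes == 800

# H.264 color video at 640 x 480 averages roughly 64K bytes per image,
# which is on the order of 100x (two orders of magnitude) more data.
video_bytes = 64 * 1024
ratio = video_bytes / metadata_bytes
assert 10 < ratio < 1000                     # roughly two orders of magnitude
```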
- Process 200 continues as the event processor 104 in the processing station 160 receives the compressed feature vector data from the cameras 108 A and 108 B, and decompresses the feature vector data (block 216 ).
- the decompression algorithm is complementary to the compression algorithm presented above if a single wireless camera is communicating with the central processing station. If more than one wireless camera is transmitting data to the central processing station, then a joint decompression scheme is implemented that uses information from one camera to predict the updates for other cameras. During joint decompression, the processing station 160 reconstructs the full feature vector from multiple sparse feature vectors that are generated by two or more of the cameras for an object in the scene 112 . The joint decompression scheme minimizes the error in decompression, when compared to independent decoding of the separate data from each of the cameras.
- Process 200 continues with identification of a person or object in the decompressed feature vector data from the cameras (block 220 ).
- the monitoring system 100 is configured to identify feature vectors that correspond to humans and monitor motions of the humans. Other embodiments are configured to identify the motion of other objects, including motor vehicles or animals. Some foreground feature vectors might correspond to people while others could correspond to other objects (such as cars, animals, bicycles, etc.).
- the feature and event database 106 stores sets of feature vectors that correspond to humans and are generated during a training process for the video monitoring system 100 .
- the event processor 104 filters the feature vectors corresponding to humans in the scene 112 from the other objects using the predetermined training data in the database 106 . In one embodiment, the process of filtering objects to identify humans is performed using an object classifier.
- the event processor 104 is configured to identify particular events that occur when an identified object, such as a human, performs a motion from the metadata received from the cameras 108 A- 108 N.
- the video data include a plurality of frames in which a person 240 in the scene 112 performs a kick.
- the processor 104 performs event recognition of the kick event using the feature vectors that are received from both cameras 108 A and 108 B over a predetermined time period.
- the event recognition process is temporal, since the event occurs over time in multiple feature vectors corresponding to multiple frames of video data, and is multi-view because the feature vectors from multiple cameras record the event from different positions.
- the event recognition process recognizes events from the activities of one or more people even if the events are recorded in different parts of the image, oriented in different directions, or performed at different rates. Further, the processor identifies the events in real time with reference to all the predetermined events that are stored in the database 106 . If an event of interest is identified, the event processor 104 generates an alert to prompt a human operator to review the event. The processor 104 receives video of the event from one or more of the cameras 108 to provide video playback of the event to the human operator. The processor 104 optionally generates a classification of the event for the human operator to review in addition to viewing the video data. In the example of FIG. 2 , the event processor 104 generates an alert indicating that a person has performed a kick (block 224 ), and optionally generates a request for video of the event for display through the video output device 168 for review by human operators.
- the system 100 includes the trained object features and event database 106 that stores feature vector data that are identified for a plurality of events of interest during a training process.
- the database 106 stores feature vectors that are generated from video recordings of one or more people performing kicks during a training process.
- the recordings of the kicks are referred to as “exemplars,” and the exemplars are typically termed “training data” in the computer vision and machine learning communities.
- Sufficient training data are recorded for each event category of interest. For example, in one embodiment a training process collects data of multiple subjects performing each of the events of interest over multiple trials.
- some image frames of the event are selected as key-frames.
- a predetermined number of key-frames, such as six key-frames, are selected manually from the video data of each trial.
- the key-frames represent the pose/gesture frames that provide maximum information regarding the motion being performed.
- a feature vector is extracted for the person in each key-frame using the same methods that are described above for feature vector extraction in the process 200 .
- the key-frame feature vectors form the training database.
- the dictionary of events that is generated from the training data is stored with the trained object features and event database 106 .
- each feature vector corresponding to a motion of a single person is compared to the feature vectors that are stored in the event database 106 .
- Two feature vectors might be very similar for single frames of two different motions. For instance, a single frame of a person walking might be indistinguishable from a single frame of a person running.
- the feature vectors of query image sequences are compared with the key-frame sequences for each motion in order to remove ambiguity regarding the motion that is recorded by the monitoring system.
- the information from multiple cameras needs to be fused to remove ambiguity from the feature vectors that are generated by multiple cameras in different locations, because some motions can be invisible to some camera views due to occlusions. For instance, one arm of a person who is oriented perpendicular to a camera is invisible to that camera, while another camera facing the person captures the occluded arm.
- the event processor 104 in the processing station 160 uses a graphical model for each event of interest to identify the events over both time and from multiple viewing angles.
- the graphical model formulation is a probabilistic model that captures the interaction between multiple key-frames, across multiple camera views.
- the model includes M key-frames and N camera views, for a total of N×M nodes in the graph.
- Different configurations of the graph include multiple arrangements of connections between nodes. Each choice of connections has different properties for the identification of events.
- the edges of the graphs encode the time difference between the key-frames for that motion in the temporal edges, and the spatial distance of the foreground object bounding box along the spatial edges.
- FIG. 3 depicts one arrangement of nodes in a graph 300 for identifying events using a majority voting model.
- in the majority voting model, only the nodes corresponding to each of the cameras are connected together linearly over time.
- the graph 300 of FIG. 3 includes one chain for each camera that observes the event. The inference of the probability of occurrence of any event is performed independently for each chain using a standard method like dynamic programming. Each camera is assigned the same weight in determining if a particular event has been observed in the configuration of FIG. 3 .
- the event processor 104 receives the feature vector data from the cameras 108 A- 108 N and identifies whether the feature vector graphs from the cameras form a majority indicating that an event has been observed. If the feature vector graphs from the majority of cameras do not agree that an event has occurred, the event processor 104 identifies that no event has occurred (non-occurrence of the event) and does not request full video from the cameras.
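A sketch of the majority-voting decision, assuming each camera's chain inference has already produced a per-camera score for the event (the threshold value is illustrative):

```python
def majority_vote(chain_scores, threshold=0.5):
    """Each camera's chain is weighted equally; the event is declared
    observed, and full video requested, only if a majority of the
    per-camera chain scores exceed the detection threshold."""
    votes = sum(1 for s in chain_scores if s > threshold)
    return votes > len(chain_scores) / 2

assert majority_vote([0.9, 0.8, 0.2])        # 2 of 3 cameras agree: event
assert not majority_vote([0.9, 0.3, 0.2])    # no majority: no video request
```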
- the nodes for each camera correspond to key-frames that are identified by that camera, and the nodes for each camera are formed into a separate sub-graph.
- a video sequence of frames is represented as D and any particular event of interest is represented as directed graph G, where each node is a key-frame for the event of interest.
- the graph G is represented by a collection of nodes V connected by edges E. The number of nodes in V is expressed as M.
- any given node i ∈ {1, . . . , M} corresponds to one key-frame for the event of interest.
- the edges E in the graph specify which pairs of key-frame nodes are constrained to have relations. For example, in FIG. 3 the nodes for each camera are connected to each other by edges in a chain corresponding to a time-based sequence of key-frames that correspond to a particular event of interest, with the key-frames occurring in predetermined sequence.
- the framework is more general and edges in the graph need not be successive. For example, some events can include a variable number of repetitions for a particular motion.
- the graph optionally includes jump edges that form cycles between key-frames for the motion that is repeated one or more times.
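Inference over one chain (choosing which video frame realizes each key-frame) can be done with a Viterbi-style dynamic program. The sketch below maximizes total appearance score minus deformation cost along successive chain edges; the appearance matrix and cost function are toy stand-ins for the learned model:

```python
import numpy as np

def chain_inference(appearance, deform):
    """appearance[m, t]: score of anchoring key-frame m at video frame t.
    deform(tp, t): deformation cost along a chain edge from frame tp to
    frame t.  Returns the best total score and the chosen frames."""
    M, T = appearance.shape
    score = appearance[0].copy()
    back = np.zeros((M, T), dtype=int)
    for m in range(1, M):
        new = np.empty(T)
        for t in range(T):
            cand = score - np.array([deform(tp, t) for tp in range(T)])
            back[m, t] = int(np.argmax(cand))
            new[t] = cand[back[m, t]] + appearance[m, t]
        score = new
    path = [int(np.argmax(score))]
    for m in range(M - 1, 0, -1):
        path.append(int(back[m, path[-1]]))
    return float(score.max()), path[::-1]

# Key-frames must appear in temporal order: a huge cost forbids placing
# a later key-frame at an earlier (or identical) video frame.
deform = lambda tp, t: 0.1 * (t - tp) if t > tp else 1e9
appearance = np.array([[1.0, 0.2, 0.1],
                       [0.1, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
total, frames = chain_inference(appearance, deform)
assert frames == [0, 1, 2]                   # key-frames placed in order
```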
- the event processor 104 identifies key-frames and changes of the feature descriptors for an object between key-frames using a deformable key-frame model. In FIG. 3 , the event processor 104 generates a score for each graph that corresponds to the likelihood that the graph matches an event of interest.
- the event processor 104 identifies a maximum inner product response with the feature vectors at the location p_i in the video D.
- a deformation weight w_u between two frames models the Mahalanobis distance between the pairs of key-frames over time in the model. The parameters for the Mahalanobis distance are generated during the training of the model and are stored in the database 106 .
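The Mahalanobis deformation weight can be sketched as a quadratic form over the offset between a pair of key-frames, with the matrix parameters standing in for values learned during training:

```python
import numpy as np

def mahalanobis_deformation(offset, precision):
    """Squared Mahalanobis distance of an observed (time, space) offset
    between two key-frames, under a precision matrix learned in training."""
    offset = np.asarray(offset, dtype=float)
    return float(offset @ precision @ offset)

# Illustrative learned parameters: tolerant of timing jitter (small
# weight), strict about spatial displacement (large weight).
precision = np.diag([0.1, 10.0])
late_but_in_place = mahalanobis_deformation([2.0, 0.1], precision)
wrong_place = mahalanobis_deformation([0.5, 2.0], precision)
assert late_but_in_place < wrong_place
```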
- FIG. 4 depicts another graph configuration 400 where one camera acts as a root node that is connected across time to the key-frames that are generated by other cameras in the system 100 . All other camera views are connected to the nodes from the selected camera as shown in FIG. 4 . The inference in this case is also done using dynamic programming.
- the configuration of FIG. 4 is referred to as a multi-view chain graphical model.
- the camera 2 is selected as the root node.
- the camera 404 is selected as the reference camera.
- the system 100 uses a single camera as the reference camera, such as a system in which a single camera has higher resolution than the other cameras or for a camera that is positioned in a certain location to identify events in the scene with fewer obstructions than other cameras in the system 100 .
- the system 100 assigns the reference camera dynamically based on the camera that detects a key-frame in an event prior to the other cameras that view the scene.
- the reference camera for one event of interest can be different than the reference camera for another event of interest in the system 100 .
- the camera 404 views an event and generates key-frames for the event, such as key-frames 406 A- 406 M.
- the cameras 408 and 412 generate additional image data for key-frames of the event, such as key-frames 410 A- 410 M and 414 A- 414 M, respectively.
- each of the cameras generates key-frames from an event, but the cameras do not necessarily generate the key-frames at the same time.
- the reference camera 404 generates the key-frame 406 B earlier than the corresponding key-frames 410 B and 414 B from cameras 408 and 412 , respectively.
- the event processor 104 receives the feature data corresponding to each of these key-frames from the cameras and identifies that the key-frames 410 B and 414 B correspond to the key-frame 406 B from the reference camera 404 , even though the key-frames are not generated at exactly the same time.
- the event processor 104 applies a temporal constraint to the frames, which is to say that the event processor 104 identifies that key-frames from different cameras correspond to different views of the same event when the key-frames occur within a comparatively short time period of one another.
- the event processor 104 applies a temporal-weighted scale to key-frames that are generated by the other cameras to identify the likelihood that the key-frames correspond to the same portion of the same event of interest as a key-frame from the reference camera. For example, if the key-frame 410 B occurs within 100 milliseconds of the key-frame 406 B, then the weighted scale assigns a high probability (e.g. 90%) that the two key-frames correspond to each other, while a longer delay of 1 second has a correspondingly lower probability (e.g. 25%) that the two key-frames correspond to one another.
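The temporal-weighted scale can be modeled as a probability that decays with the gap between key-frame timestamps. The exponential form and time constant below are assumptions chosen to roughly match the example points in the text (about 90% at 100 ms, about 25% at 1 second):

```python
import math

def correspondence_probability(dt_seconds, tau=0.6):
    """Likelihood that a non-reference key-frame corresponds to a
    reference key-frame, decaying with the time gap between them."""
    return math.exp(-abs(dt_seconds) / tau)

assert correspondence_probability(0.1) > 0.8   # ~100 ms apart: likely match
assert correspondence_probability(1.0) < 0.3   # 1 s apart: unlikely match
```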
- the event processor 104 extends the score identification process that is described for single cameras in FIG. 3 to multiple cameras including the reference camera and one or more additional cameras that view an event.
- the graph of FIG. 4 depicts the nodes as key-frames with directed edges that connect the nodes 404 A- 404 N for series of key-frames from the reference camera 404 .
- the graph 400 also includes edges that connect the key-frame nodes from the non-reference cameras 408 and 412 to the corresponding key-frame nodes of the reference camera 404 .
- the event processor 104 identifies the edge connections between key-frames from the different cameras based on the proximity in time between the detection of each of the key-frames from the different cameras that record the same event.
- in the multi-camera configuration of FIG. 4 , the event processor 104 generates scores S(p, D, w_c) using the key-frame feature data from the reference camera and from each additional camera that views the event.
- the deformation function ⁇ def changes when key-frame data from multiple cameras are used to generate the score.
- the event processor 104 uses a fixed or “homographic” feature in the scene that is visible to each of the cameras. For example, in many surveillance system configurations all of the cameras have a view of a single ground plane in the scene.
- the ground plane does not necessarily need to be the physical ground on which people walk, but is instead a common geometric plane that serves as a reference point for identifying an object and event when the object is viewed from different angles and locations.
- the event processor 104 identifies the homography H_l^r for a common ground plane between the reference camera r and any of the other cameras l that view the ground plane.
- the homography is a linear transformation that maps pixels, and correspondingly features, in one view of a plane to another, and the event processor 104 uses the homography to identify the distance between objects in the views of different cameras.
- when the event processor 104 identifies the motions of humans, who typically are in contact with the ground plane, the center of the line connecting the bottom corners of the bounding box that is formed around each object in the scene acts as a proxy for the 3D location of the object in the scene.
- the event processor 104 can identify the locations of the same object viewed from different cameras in situations where the ground plane or another homographic element is within the view of the different cameras and the object remains proximate to the homographic element.
- the multiple cameras maintain a view of the common ground plane and objects on the ground plane for the event processor 104 to identify the positions of objects in the views of the different cameras using the homographic transformation.
- a point f_l = (x_l, y_l, 1)^T on the ground plane in the view of camera l is mapped to the corresponding point in the view of the reference camera r by the homography H_l^r.
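Applying the homography to a ground-plane point amounts to a matrix-vector product in homogeneous coordinates followed by dehomogenization. A sketch with a toy homography (a real H_l^r would be estimated from the camera geometry):

```python
import numpy as np

def map_to_reference(H, point_l):
    """Map a ground-plane point from camera l's view into the reference
    camera's view: f_r ~ H @ f_l in homogeneous coordinates."""
    f_l = np.array([point_l[0], point_l[1], 1.0])
    f_r = H @ f_l
    return f_r[:2] / f_r[2]                  # back to pixel coordinates

# Toy homography: the reference view sees the plane scaled and shifted.
H = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(map_to_reference(H, (3.0, 4.0)), [7.0, 7.0])
```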
- the deformation function is modeled as a spring function where the cost to perform a deformation corresponds to an amount of force required to stretch a spring.
- FIG. 6 depicts a graphical representation of the deformation constraints between different views 602 A, 602 B, and 602 C of the same person (person 604 ) in a scene with a common ground plane 608 that is visible to three different cameras.
- the event processor 104 identifies corresponding locations 612 A- 612 C at the bottom center of bounding boxes 606 A- 606 C in the feature descriptor data corresponding to each of the views 602 A- 602 C, respectively.
- the bounding box locations 612 A- 612 C are each in contact with the ground plane 608 for the homographic transformation.
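The spring-model deformation cost between corresponding bounding-box ground points, after they are mapped into a common view, can be sketched as a quadratic (spring-energy) penalty; the stiffness value is illustrative:

```python
def spring_cost(p_ref, p_other, stiffness=1.0):
    """Deformation cost between two ground-plane points: grows with the
    squared distance, like the energy stored in a stretched spring."""
    dx = p_ref[0] - p_other[0]
    dy = p_ref[1] - p_other[1]
    return 0.5 * stiffness * (dx * dx + dy * dy)

# Views of the same person map to nearby ground points (low cost);
# mismatched detections map far apart (high cost).
assert spring_cost((5.0, 5.0), (5.2, 4.9)) < spring_cost((5.0, 5.0), (9.0, 1.0))
```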
- FIG. 5 depicts another graph 500 where the key-frames that are generated from each of the cameras are connected together.
- every node is connected to its neighbor across space and time as shown in FIG. 5 .
- Inference for this model can be done using iterated dynamic programming, with multiple spatial iterations interspersed between temporal iterations.
- the event processor 104 processes the graph of FIG. 5 as a set of graphs similar to FIG. 4 , where each camera is treated as the reference camera during one iteration of the processing of the graph 500 .
- the event processor 104 subsequently combines the scores for the different iterations through the graph 500 to identify the most likely event that corresponds to the key-frames from the different cameras.
- the number of iterations is fixed a priori in one embodiment.
- the configuration of FIG. 5 is referred to as a multi-view field where the nodes corresponding to all cameras are connected together.
- the central processing station 160 uses the graphical models described above to process detected key-frames in the feature vector metadata from the cameras 108 A and 108 B using the event processor 104 and the predetermined models in the database 106 to identify particular events of interest, such as the kicking event depicted in FIG. 2 , and to take an action in response to identifying the event of interest.
- the event processor 104 performs an inference operation to identify a “best” event c* and corresponding label p* from the predetermined set of events of interest in the database 106 .
- the “best” event c* refers to the event that has the highest likelihood of occurrence and corresponding non-occurrence of other events given the observed sequences of key-frame metadata from the cameras.
- the label p* refers to a human or machine readable identifier that is associated with the event c* that is determined during the training process and is stored in the database 106 .
Abstract
A system for identifying objects and events of interest uses one or more cameras with image processing capabilities. The system includes multiple cameras configured to perform image processing of a scene from multiple angles to extract and transmit metadata corresponding to objects or people in the scene. The cameras transmit the metadata to a processing station that is configured to process the stream of data over time to detect objects and events of interest and to alert monitoring personnel of objects or events in the scene.
Description
- This application claims priority to U.S. Provisional Application No. 61/822,051, which is entitled “SYSTEM AND METHOD FOR OBJECT AND EVENT IDENTIFICATION USING MULTIPLE CAMERAS,” and was filed on May 10, 2013, the entire contents of which are hereby incorporated by reference herein.
- This disclosure relates generally to the field of video monitoring, and, more particularly, to systems and methods for monitoring objects and events using multiple cameras arranged at different angles around a scene.
- Video monitoring systems are widely deployed for various purposes, which include security and public safety. In a typical video monitoring system, one or more cameras are deployed in different locations to monitor activities. For example, video monitoring systems generate images of public places, transportation facilities, retail stores, industrial facilities, and residences and other private property. The monitoring systems often include data storage devices that archive some or all of the recorded video for later review, and one or more video output devices that enable playback of live and archived video data.
- In some monitoring systems, the cameras generate video data that are monitored by one or more human operators who can view activity in the video and take appropriate action if they view an incident. For example, in a monitoring system at a retail store, the operator views live video of individuals in the store and alerts security personnel if an individual attempts to shoplift merchandise. In some video monitoring systems, multiple cameras record video of a single scene from different positions and angles. While producing video from multiple angles can be helpful in collecting additional detail about a scene, the multiple video recordings are difficult for a human operator to observe in an efficient manner. Additionally, in networked video monitoring systems, multiple video streams consume large amounts of bandwidth and network resources, particularly in wireless video monitoring systems. Consequently, improvements to video monitoring systems that identify events of interest in recorded video data in an automated manner and that utilize network bandwidth in an efficient manner would be beneficial.
- A video surveillance system includes distributed cameras that communicate with a central processing station. The central processing station communicates with multiple cameras that extract foreground objects using background subtraction methods. The cameras in the system transmit metadata to the central processing station. The metadata corresponding to humans are filtered from the metadata corresponding to other objects. The foreground metadata corresponding to people are analyzed by the central processing station to recognize motions and events that are performed by people. The cameras communicate with the central processing station using a wireless communication network or other suitable communication channels.
- In one embodiment, the video surveillance system includes a plurality of cameras located in a plurality of positions to record a scene. Each camera includes a sensor configured to generate video data of the scene comprising a series of frames, a first network device configured to transmit the video data and feature vectors associated with the video data to a processing station, and a feature extraction processor operatively connected to the sensor and the network device. The feature extraction processor is configured to identify a plurality of feature vectors in video data generated by the sensor, transmit only the plurality of feature vectors to the processing station with the first network device in a first operating mode, and transmit the video data to the processing station with the first network device in a second operating mode only in response to a request for the video data from the processing station. The video surveillance system further includes a processing station having a second network device, a video output device, and a processor operatively connected to the second network device and the video output device. The processor is configured to receive the plurality of feature vectors generated by each camera in the plurality of cameras with the second network device, identify an object and motion of the object in the scene with reference to the plurality of feature vectors received from at least two of the plurality of cameras, identify an event corresponding to the motion of the object in the scene with reference to a predetermined database of events, generate a request for transmission of the video data from at least one camera in the plurality of cameras, and generate a graphical display of the video data from the at least one camera with the video output device to display the object associated with the event.
- In another embodiment, a method for performing surveillance of a scene has been developed. The method includes generating with a sensor in a first camera first video data of the scene, the first video data comprising a first series of frames, identifying with a feature extraction processor in the first camera a first plurality of feature vectors in the first video data, transmitting with a network device in the first camera only the first plurality of feature vectors to a processing station in a first operating mode, transmitting with the network device in the first camera the first video data to the processing station in a second operating mode only in response to a request for the first video data from the processing station, generating with another sensor in a second camera second video data of the scene, the second video data comprising a second series of frames and the second camera generating the second video data of the scene from a different position than the first camera, identifying with another feature extraction processor in the second camera a second plurality of feature vectors in the second video data, transmitting with another network device in the second camera only the second plurality of feature vectors to the processing station in the first operating mode, transmitting with the other network device in the second camera the second video data to the processing station in the second operating mode only in response to a request for the second video data from the processing station, receiving with another network device in the processing station the first plurality of feature vectors from the first camera and the second plurality of feature vectors from the second camera, identifying with an event processor in the processing station an object and motion of the object in the scene with reference to the first and second plurality of feature vectors, identifying with the event processor in the processing station an event corresponding to the motion of 
the object in the scene with reference to a predetermined database of events, generating with the event processor in the processing station a request for transmission of the video data from at least one of the first camera and the second camera, and generating with a video output device a graphical display of video data received from at least one of the first camera and the second camera to display the object associated with the event.
FIG. 1 is a schematic diagram of a video monitoring system. -
FIG. 2 is a diagram depicting a pipelined process for identification of events using metadata that are transmitted from multiple cameras viewing a scene. -
FIG. 3 is a diagram of a graph of feature vector nodes for an event that are generated from multiple cameras in a majority-voting configuration. -
FIG. 4 is a diagram of a graph of feature vector nodes for an event that are generated from multiple cameras in a multi-chain configuration. -
FIG. 5 is a diagram of a graph of feature vector nodes for an event that are generated from multiple cameras in a multi-view field configuration. -
FIG. 6 is a set of images of a scene generated by multiple cameras in a surveillance system. - For the purposes of promoting an understanding of the principles of the embodiments described herein, reference is made to the drawings and descriptions in the following written specification. No limitation to the scope of the subject matter is intended by the references. The description also includes any alterations and modifications to the illustrated embodiments and includes further applications of the principles of the described embodiments as would normally occur to one skilled in the art to which this document pertains.
- As used herein, the term “scene” refers to a single area that is monitored by a surveillance system using multiple cameras that are located at multiple positions to view the scene from different directions. Examples of scenes include, but are not limited to, rooms, hallways, concourses, entry and exit ways, streets, street intersections, retail stores, parking facilities and the like.
- As used herein, the term “sparse encoding” refers to a method for generating data corresponding to a large number of inputs that are encoded as vectors using a plurality of “basis vectors” and “sparse weight vectors.” The basis vectors are generated using a penalized optimization process applied to a plurality of predetermined input vectors that are provided during a training process. In one embodiment, an l1 optimization process that is known to the art is used to generate the basis vectors and sparse weight vectors that correspond to a plurality of input training vectors. The term “sparse” used to refer to a vector or matrix describes a vector or matrix having a plurality of elements where a majority of the elements are assigned a value of zero. As used herein, the term “dimensionality” as applied to a vector refers to the number of elements in the vector. For example, a row or column vector with three elements is said to have a dimensionality of three, and another row or column vector with four elements is said to have a dimensionality of four.
- As used herein, the term “metadata” refers to properties of objects that are identified in video or other sensor data. For example, if an object follows a path through a field of view of a video camera, the metadata corresponding to the object optionally include the two-dimensional position of the object in the frames of video data, a velocity of the object, a direction of movement of the object, a size of the object, and a duration of time that the object is present in the field of view of the camera. As described below, events are identified with reference to the observed metadata of an object. The metadata do not require that an object be identified with particularity. In one embodiment, the metadata do not identify that an object is a particular person, or even a human being. Alternative embodiments, however, infer that metadata correspond to a human if the event is similar to an expected human action, such as metadata of an object moving at a direction and speed that correspond to a human walking past a camera. Additionally, individual objects are only tracked for a short time and the metadata do not identify the same object over prolonged time periods. Thus, the stored metadata and identification of high-interest events due to metadata do not require the collection and storage of Personally Identifiable Information (PII) beyond storage of video data footage for later retrieval.
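The motion metadata enumerated above can be derived from a short track of object positions. The following Python sketch is illustrative only; the function name, units, and dictionary keys are assumptions, not part of the disclosure.

```python
import math

def object_metadata(track, fps=30.0):
    """Derive simple motion metadata from a list of (x, y) centroid
    positions observed in successive frames (hypothetical helper; the
    names and units here are illustrative assumptions)."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    duration_s = (len(track) - 1) / fps
    dx, dy = x1 - x0, y1 - y0
    return {
        "speed_px_per_s": math.hypot(dx, dy) / duration_s,
        "direction_deg": math.degrees(math.atan2(dy, dx)),
        "duration_s": duration_s,
    }

meta = object_metadata([(0, 0), (30, 40)], fps=1.0)
# a 50-pixel straight-line displacement over one second
```

Note that nothing in this record identifies a particular person, consistent with the PII discussion above.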
- As used herein, the terms “feature vector” or more simply “feature” refer to vectors of metadata that correspond to a distinguishing structure in an object that is identified in video data of the object. Each element of the metadata is also referred to as a “feature descriptor” and a feature vector includes a plurality of feature descriptors. For example, the approximate shape of a human body or portions of the human body such as arms and legs is identified in video data. The human body is distinct from the surrounding environment, and a feature vector includes data that describe aspects of the human body in the video data including, for example, the size, location, and orientation of the object in the scene. If the video data include multiple humans, then each human can be described using a single feature vector, or each human can be described using multiple feature vectors for different body parts such as the arms, legs, torso, etc.
- As used herein, the term “dictionary” refers to a plurality of basis vectors that are generated using the sparse encoding process. After the dictionary is generated during the training process, the basis vectors in the dictionary are used to identify a degree of similarity between an arbitrary input vector and the input vectors that were used to generate the basis vectors in the dictionary during the training process. An optimization technique is used to select combinations of basis vectors using a sparse weight vector to generate a reconstructed vector that estimates the arbitrary input vector. An identified error between the reconstructed estimate vector and the actual input vector provides a measure of similarity between the input vector and the dictionary.
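The dictionary-matching idea above can be sketched concretely: solve for a sparse weight vector against the basis vectors, then score similarity by the reconstruction error. ISTA is used here as one standard l1 solver; the solver choice and all parameter values are assumptions, not mandated by the text.

```python
import numpy as np

def sparse_code(B, x, lam=0.1, iters=200):
    """ISTA solver for min_w 0.5*||x - B w||^2 + lam*||w||_1.
    (l1 solvers vary; ISTA is one standard choice and is an
    assumption here, not specified by the text above.)"""
    w = np.zeros(B.shape[1])
    step = 1.0 / (np.linalg.norm(B, 2) ** 2)      # 1/L for the smooth term
    for _ in range(iters):
        w -= step * (B.T @ (B @ w - x))           # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # shrink
    return w

def dictionary_error(B, x, lam=0.1):
    """Reconstruction error of x against dictionary B; a small error
    indicates x resembles the training vectors that produced B."""
    return float(np.linalg.norm(x - B @ sparse_code(B, x, lam)))
```

A vector close to the span of the dictionary columns yields a small error, while an unrelated vector yields a large one, which is the similarity measure described above.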
- As used herein, the term “key-frame” refers to an image frame in a video sequence of a motion performed by a person or other object in a scene that is considered to be representative of the overall motion. A video sequence of a motion typically includes two or more key-frames, and a training process that is described in more detail below includes identification of a limited number of N key-frames in the video sequence. Each video sequence of a particular event includes the same number of N key-frames, but the time at which each key-frame occurs can vary depending upon the angle of the video sequence and between different video sequences that are used as training data. An event of interest that is recorded from one or more angles during a training process includes a series of frames of video data. For example, a video sequence that depicts a person standing up from a sitting position is an event. Annotators identify key-frames in the video sequence of the person standing in the video streams from multiple cameras that are positioned around the person. An event processor or another suitable processing device then extracts features from the identified key-frames to identify a sequence of feature vectors corresponding to the event. A training set of multiple video sequences that depict the same event performed by one or more people or objects from different viewing angles forms the basis for selecting key-frames in each of the video sequences. The features that are extracted from the key-frames selected in video sequences in the training data form the basis for the dictionary that is incorporated into a database for the identification of similar motions performed by other people or objects in different scenes that are monitored by a video surveillance system.
- As used herein, the term “synchronization frame” refers to a frame of video data that is generated in a camera and that contains features that are extracted by a feature extraction processor in the camera to form a full feature vector. A full feature vector includes all of the data corresponding to the identified features in the frame of video data. As an object, such as a human, moves through a scene, the video data in subsequent image frames captures the movement, and the feature extraction processor generates sparse feature vectors that include only changes in the identified feature relative to previous frames that include the feature, such as the synchronization frame. In some embodiments, video cameras generate synchronization frames at regular intervals (e.g. once every 60 frames of video data). Feature vector extraction techniques that are known to the art include, but are not limited to, dimensionality reduction techniques including principal component analysis, edge detection, and scale-invariant feature transformations. In some embodiments, an identified object in a scene is encoded with a Histogram of Oriented Gradients (HOG) appearance feature descriptor. As described above, the key-frames of video data occur at particular times during an event of interest and are not necessarily aligned with the generation of synchronization and intermediate frames during operation of a camera. Consequently, a key-frame of video data that is generated during an event of interest can be captured with a synchronization frame or intermediate frame in a camera.
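The synchronization/intermediate frame split described above amounts to delta encoding of feature vectors. The sketch below is illustrative; the threshold, tuple protocol, and function names are assumptions rather than the patent's exact encoding.

```python
import numpy as np

def encode_frame(feature, prev_feature, threshold=0.05):
    """Camera-side sketch of the scheme described above: send the full
    vector on a synchronization frame, otherwise send a sparse delta
    with near-unchanged elements zeroed out (the threshold is an
    assumed parameter, not specified in the text)."""
    if prev_feature is None:
        return "sync", feature.copy()
    delta = feature - prev_feature
    delta[np.abs(delta) < threshold] = 0.0   # keep only meaningful changes
    return "delta", delta

def decode_frame(kind, payload, state):
    """Receiver side: rebuild the current feature vector."""
    return payload.copy() if kind == "sync" else state + payload
```

Because most delta elements are zero between consecutive frames, the intermediate-frame payloads compress far better than full vectors, while periodic synchronization frames bound accumulated error.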
FIG. 1 depicts a video monitoring system 100 that is configured to record video data about objects in a scene and to display selected video for additional analysis by human operators. The video monitoring system 100 includes a processing station 160 and a plurality of cameras 108A-108N that are each positioned to record a scene 112 from different locations and angles. The processing station 160 further includes a video, object feature, and event processor 104, object and feature database 106, network device 164, and a video output device 168. In the embodiment of FIG. 1, the network device 164 is a wired or wireless data networking adapter, and the video output device 168 includes one or more display screens, such as LCD panels or other suitable video display devices. - In the
video monitoring system 100, the event processor 104 in the processing station 160 includes one or more digital processors such as central processing units (CPUs), graphical processing units (GPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), and the like that are configured to execute stored program instructions to process both feature and event data that are received from the cameras as well as video data that are received from the cameras. The processor 104 further includes one or more memory devices that store programmed instruction data for execution of one or more software programs with the processor 104. The processor 104 is operatively connected to the database 106, network device 164, and video output device 168. During operation, the processing station 160 receives feature vector data and optionally video data from the cameras 108A-108N with the network device 164. The processor 104 in the processing station 160 identifies objects of interest and events of interest through synthesis of the feature vector data from one or more of the cameras 108A-108N in conjunction with predetermined feature vectors and event data that are stored in the trained object features and event database 106. - The trained object features and
event database 106 stores the dictionary of the training data. The training data are generated during a training phase for the system 100, and the feature basis vectors in the dictionary for key-frames that correspond to different portions of an event are typically not generated from the same objects that move through the scene 112 and are often recorded by a different set of cameras in a location other than the scene 112. As described below, the system 100 removes the background of the scene and rescales identified objects to identify feature vectors for new objects in the scene 112 that are independent of the particular features of the scene 112 and are not overly dependent upon the characteristics of an individual person or object that was not part of the training process. Thus, in the system 100 the event processor 104 uses the stored dictionary of feature vectors in the database 106 to identify events based on the motion of objects that were not used during the training process in scenes that correspond to locations other than the location used during the training process. - The trained object features and
event database 106 stores data corresponding to a plurality of predetermined features that are associated with previously identified objects and sequences of feature movements that are associated with previously identified events. For example, the database 106 stores feature vector data corresponding to the identified shapes of humans and other objects that are present in the scene 112 and are recorded by the video cameras 108A-108N. The feature data can include the same feature as viewed from different angles and positions around the scene in the angles corresponding to the viewing angles and positions of the video cameras 108A-108N. The event data include predetermined sequences of movements for one or more identified features of one or more objects in the scene. For example, the event data in the database 106 can include a sequence of features that correspond to a person who is walking. Another person who walks through the scene 112 exhibits similar features. The features change as the legs and other body parts of the person move while walking. The database 106 is implemented using one or more non-volatile and volatile digital data storage devices including, but not limited to, magnetic hard drives, optical drives, solid state storage devices, static and dynamic random access memory (RAM) devices, and any other suitable digital data storage device. - In the
video monitoring system 100, the cameras 108A-108N record video image data of the scene 112, identify feature data corresponding to objects in the recorded video, and transmit a portion of the feature data and video data to the event processor 104. Using camera 108A as an example, each of the cameras includes a sensor 140, a feature extraction processor 144, memory 148, and a network device 152. The sensor 140 includes one or more sensing elements such as charge-coupled device (CCD) or complementary metal oxide semiconductor (CMOS) image sensors that record video of the scene 112, and the sensor 140 is configured to generate digital image data from the scene 112 in, for example, monochrome, color, or near-infrared. In another embodiment the camera includes an infrared sensor for detecting images in the far infrared frequency band. In some embodiments the sensor 140 is further integrated with lenses, mirrors, and other camera optical devices that are known to the art. The feature extraction processor 144 includes one or more digital processors such as central processing units (CPUs), graphical processing units (GPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), and the like that are configured to execute stored program instructions to process image data from the sensor 140 and to identify feature vectors for one or more objects in the scene 112 using one or more feature extraction techniques. The memory 148 stores program instructions for the feature extraction processor 144 and optionally stores a buffer of video data that the sensor 140 generates during operation of the camera. As described below, the processing station 160 optionally generates a request for buffered video data in response to identifying that one of the cameras 108A-108N has recorded an event.
In one embodiment, the network devices 152 in the cameras 108A-108N transmit data to the corresponding network device 164 in the processing station 160 through a wireless data network such as, for example, a wireless local area network (WLAN) or wireless wide area network (WWAN). - In many data networks, including wireless networks, transmitting all recorded video data and other data, including extracted feature data, from each camera to the
event processor 104 in the processing station 160 requires large amounts of network bandwidth. The cameras 108A-108N optionally include visible, near-infrared or far-infrared illumination sources, and the cameras include image intensifiers for low-light operation in some embodiments. - Each one of the
cameras 108A-108N includes the feature extraction processor 144 to perform image processing and feature extraction processing. As described in more detail below, the cameras 108A-108N transmit full feature vector data for objects in the video in synchronization frames that are transmitted at regular intervals. The feature data include data vectors that describe one or more features for objects in video data that are generated in each frame. As described above, the synchronization frame is a frame of video data where a processor in the camera generates full feature data for each feature identified in the frame of video data. Synchronization frames are generated at regular intervals during operation of the camera, and frames of video data that are generated between synchronization frames are referred to as intermediate frames. During each intermediate frame of video data, the camera only transmits updates to features using a sparse feature encoding scheme to greatly reduce the amount of data and bandwidth requirements for transmitting updates to the feature vectors to the event processor 104. - The
event processor 104 in the processing station 160 optionally requests full video data from one or more of the cameras 108A-108N during operation. For example, in response to identification of an event, the processor 104 requests video data from one or more of the cameras 108A-108N and the video output device 168 displays the video for an operator to review. The operator optionally generates additional requests for video from one or more of the other cameras 108A-108N. Thus, in one mode of operation a subset of the cameras 108A-108N transmit full video data to the processor 104, while other cameras only transmit the feature data and feature update data. As described above, the memory 148 in each of the cameras 108A-108N includes an internal data storage device that is configured to buffer video data for a predetermined time period to enable the processor 104 to request additional video data that are stored in the camera. For example, the memory 148 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100. - In the
system 100, the database 106 includes the trained models that are used to identify occurrences of events of interest from the feature vector metadata that the cameras 108A-108N transmit to the central processing station 160. Training is performed before the system 100 is used to perform surveillance on a scene, and the training process is often performed under controlled conditions at a different location than the location of the scene 112. In one embodiment, the central processing station 160 and event processor 104 are configured to perform the training process, while in another embodiment a separate computing system performs the training process and data from the training process are stored in the trained object features and event database 106 for use during operation of the system 100. - The training process includes a series of trials in which humans or other objects perform motions that correspond to events of interest, and the motions are recorded as video from multiple viewing angles. A manual annotation process includes one or more annotators who select a limited number of key-frames from each of the video sequences to assist in generating a trained model for the human or object movements that occur in each event of interest. In one embodiment, the process of manual selection of key-frames during training uses an easy to use interface with written instructions, so that the annotation can be performed by untrained workers such as mechanical turk workers. While the training process for selecting key-frames is performed manually in one embodiment, the feature extraction process and additional generation of the training dictionary data are performed in an automated manner without human intervention.
- For example, in one embodiment a digital processing device receives key-frames of video data from multiple video sequences of a particular event of interest in the training data. In one configuration, the multiple video sequences include videos taken from different positions and angles of a single person or object performing a single motion in an event of interest. The multiple video sequences also include recordings of multiple people or objects that perform the motion in an event of interest during multiple trials to improve the breadth and accuracy of the training data. Each trial is performed by the subject while he or she faces a different direction and at different locations in the field of view of the cameras. In one training process for the
system 100, the trials are performed using eight different orientations of 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, and 7π/4 radians with respect to the camera.
- The training process generates a model including appearance feature descriptor parameter templates and deformation parameters for one or more events c using a set of M video sequences that are each generated to depict an occurrence of the event c. For example, an event c includes a motion of a human kicking his or her leg, and the training data include M video sequences of the leg kick from one or more human training subjects that are recorded from multiple viewing angles performing the kick. The training set for a given event c is referred to as {Dq} (q=1, 2, . . . , M). The training process uses a scoring function S(pq|Dq, wc)=⟨wc, Φ(Dq, pq)⟩, where wc is a vector that includes all the appearance and deformation parameters that the training process refines as part of training the model, and Φ(Dq, pq) is the corresponding appearance and deformation energy that corresponds to a particular label pq.
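The scoring function above is a plain inner product, and inference over candidate key-frame labelings reduces to an argmax over such scores. The sketch below assumes flat-array representations for wc and Φ; the concatenation order and function names are illustrative.

```python
import numpy as np

def score(w_c, phi):
    """S(p|D, w_c) = <w_c, Phi(D, p)>: the linear scoring function
    described above, with w_c and Phi represented as flat arrays
    (the concatenation order is an assumption)."""
    return float(np.dot(w_c, phi))

def best_labeling(w_c, candidate_phis):
    """Inference sketch: pick the candidate key-frame labeling whose
    appearance/deformation energy scores highest under the model."""
    scores = [score(w_c, phi) for phi in candidate_phis]
    return int(np.argmax(scores))
```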
- In some surveillance system embodiments, the video monitoring process must not only identify a single event of interest, but also identify multiple events of interest and distinguish between the different events of interest. In one embodiment, the training process uses a one-vs-all learning policy for each event of interest, and generates the model parameters that jointly detect and recognize any particular event of interest given hard negative examples of other events of interest that are generated during the training process. In one embodiment, the training process uses a support vector machine (SVM) framework that employs the following objective learning function:
- min over wc, ξ, r of ½∥wc∥²+λ1 Σq ξq+λ2 Σq,q′ rq,q′, subject to S(pq|Dq, wc)−S(p̂|Dq, wc)≥Δ(pq, p̂)−ξq for every candidate labeling p̂ with ξq≥0, and S(pq|Dq, wc)−S(pq′|Dq′, wc)≥1−rq,q′ with rq,q′≥0 for samples q′ of other events of interest.
- In the SVM framework equations above, λ1 and λ2 are user defined scaling parameters that minimize slack values during optimization of the model. The constraint directed to the key-frame labeling p̂ refers to a cost penalization function, or “loss” function Δ(pq, p̂), where a key-frame label p̂ is penalized based on the observed (“ground truth”) key-frame pq that is generated during the training process. The non-negative slack term ξq provides additional robustness against violations of the constraint. The constraint directed to the ground-truth label pq implies that given any ground truth labeling pq for the qth sample of a particular motion, any ground truth labeling pq′ of the q′th sample of any other event of interest in the training data produces a lower score after filtering through another violation accommodating hinge-loss term rq,q′.
- The loss function Δ(pq, p̂) is used during the training process to reflect how well a particular hypothesized label p̂ matches the predetermined ground-truth label pq. In one embodiment, the loss function is a binary loss function where Δ(pq, p̂)=0 if p̂ matches pq and Δ(pq, p̂)=1 otherwise.
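The binary loss just described is a one-line function; the representation of a labeling as a plain tuple of key-frame indices is an assumption for illustration.

```python
def binary_loss(p_truth, p_hat):
    """Delta(p_q, p_hat) in the binary form described above: zero cost
    when the hypothesized labeling matches the ground truth, one
    otherwise (labelings compared here as plain tuples of indices)."""
    return 0 if p_hat == p_truth else 1
```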
- The training process described above generates a model with appearance parameters and deformation parameters that can be used to classify multiple events of interest that are observed at a later time during operation of the
surveillance system 100. However, the training process is susceptible to assigning higher weights for some of the motions in the events of interest, which may result in misclassification for some events of interest. In one embodiment, the training process estimates a bias bc that is associated with each different event of interest c. The bias is estimated using the median of score data generated from the trained model using the predetermined training data as an input, as set forth in the following equation: bc=median{S(p1|D1, wc), . . . , S(pM|DM, wc)}. In the system 100, the bias data are stored in the database 106 and are used to normalize scores during an event identification process to reduce the likelihood of misclassifying an event of interest. -
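The median-based bias above is straightforward to compute; subtracting it from raw scores is one plausible normalization, since the text specifies only that the bias is used to normalize scores.

```python
import numpy as np

def event_bias(training_scores):
    """b_c = median{S(p_1|D_1, w_c), ..., S(p_M|D_M, w_c)}, as in the
    equation above, over the trained model's own training scores."""
    return float(np.median(training_scores))

def normalized_score(raw_score, bias):
    # Subtracting the per-event bias is an assumed normalization that
    # makes scores comparable across event classes.
    return raw_score - bias
```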
FIG. 2 depicts a process 200 for operation of the surveillance system 100 for generation of feature vector data in recorded video and transmission of the feature vector data to a central processing system for identification of objects and events of interest. The process 200 takes place after the training process has generated the model parameters for the database 106 corresponding to a predetermined number of events of interest. In the description below, a reference to the process 200 performing an action or function refers to the operation of a processor, including processors in either or both of a camera and a central processing system, to execute programmed instructions to perform the action or function in conjunction with other components in the video monitoring system. The process 200 is described in conjunction with the video monitoring system 100 of FIG. 1 for illustrative purposes. - During
process 200, one or more of the video cameras 108A-108N generate recorded video of the scene 112 and the feature extraction processors 144 in each camera perform background subtraction from the video image data (block 204). In FIG. 2, two of the cameras each record the scene 112 from a different position. The feature extraction processor 144 in each camera subtracts portions of the image data that correspond to the static portions of the scene 112 that do not move or change during generation of the video, such as the wall and the floor in the scene 112. The resulting images depict the person in the scene 112 with a black background that represents subtraction of the background image data in the video. The background subtraction maintains a dynamic evolving background image of the static scene. This background image evolves slowly to account for small variations in the lighting of the scene during the course of a day, or as objects are placed in or removed from the scene. The dynamic background image serves as a reference image that is compared against every new image captured by the camera sensor. The feature extraction processor 144 in each of the cameras performs this comparison to isolate the foreground objects in the video data. -
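The slowly evolving reference image described above can be maintained with an exponential running average, which is one common realization; the update rule, alpha, and threshold here are assumptions, not the patent's specified method.

```python
import numpy as np

def update_background(background, frame, alpha=0.01):
    """Slowly evolve the reference background image so gradual lighting
    changes are absorbed (exponential running average; alpha is an
    assumed parameter)."""
    return (1.0 - alpha) * background + alpha * frame.astype(float)

def foreground_mask(background, frame, threshold=25.0):
    """Pixels that differ enough from the reference image are treated
    as foreground; everything else is subtracted as static scene."""
    return np.abs(frame.astype(float) - background) > threshold
```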
Process 200 continues as the feature extraction processor 144 in each of the cameras generates a grid over a fixed resolution image of each foreground object so that each block in the grid contains the same number of pixels. The feature extraction processor 144 identifies image gradients within each grid-block, and feature vectors are identified from a histogram of the image gradients in each grid-block. Once the individual feature vectors are identified for each block in the grid, the feature vectors are appended to each other to form one large feature vector using, for example, a fixed-size array of 5×5 grids with HOG descriptors. Thus, one fixed size feature vector is identified for each foreground object in the image. - As mentioned above, the bounding box containing the foreground image is resized to generate a fixed resolution image. For example, two people of different height and size or at two different distances from the camera can be compared using the feature vectors generated from video of the
scene 112. Thus, the process of extracting feature vectors on the fixed resolution foreground image provides illumination invariance, scale invariance and some rotational invariance. -
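The grid-of-histograms step above can be sketched as follows. This is a simplified HOG-style descriptor under stated assumptions (9 orientation bins, no cell or block normalization), not the system's exact implementation.

```python
import numpy as np

def grid_descriptor(patch, grid=5, bins=9):
    """HOG-style sketch: split a fixed-resolution grayscale patch into
    grid x grid blocks, histogram the gradient orientations (weighted
    by gradient magnitude) in each block, and concatenate the block
    histograms into one fixed-size vector. Real HOG adds cell and
    block normalization, omitted here for brevity."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientations
    h, w = patch.shape
    blocks = []
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * h // grid, (i + 1) * h // grid)
            cols = slice(j * w // grid, (j + 1) * w // grid)
            hist, _ = np.histogram(ang[rows, cols], bins=bins,
                                   range=(0.0, np.pi),
                                   weights=mag[rows, cols])
            blocks.append(hist)
    return np.concatenate(blocks)             # length grid * grid * bins
```

Because the patch is first resized to a fixed resolution, the descriptor has the same length for every foreground object, which is what makes objects of different sizes comparable.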
Process 200 continues as each camera compresses and transmits the feature data descriptor vectors to the event processor 104 (block 212). Since the poses of people in the scene vary gradually over time, there is a high degree of correlation between their corresponding feature vectors over successive frames. The cameras transmit the feature vector data to the processing station 160, and the feature extraction processors 144 in the cameras use a sparse-coding framework to compress the feature vector updates. The feature extraction processors 144 periodically regenerate full feature vectors during synchronization frames of the video data to account for new objects in the scene 112 and to prevent the buildup of excessive noise errors from the sparse feature vector generation process. Advantages of performing the sparse encoding and compression include reductions to the amount of data transmitted to the event processor 104, and the correlation method tracks each individual person or moving object in the foreground, thereby enabling prediction of the path of movement for the object. Each of the cameras transmits the compressed feature vector data to the network device 164 in the processing station 160 using the network devices 152 in each camera. - In one embodiment of the cameras that are used with the
system 100, each of the cameras 108A-108B transmit only metadata to the central processing station 160 unless the central processing station 160 generates a request for full video data in response to identifying an event of interest that is viewed by one or both of the cameras. -
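The two camera operating modes above can be sketched as a rolling buffer with a metadata-only capture path. The class name, buffer length, and API are illustrative assumptions.

```python
from collections import deque

class CameraBuffer:
    """Sketch of the camera-side behavior described above: retain a
    rolling window of frames locally, let only feature metadata leave
    the camera in the first operating mode, and serve buffered video
    on request in the second mode (names and sizes are assumptions)."""
    def __init__(self, max_frames=600):
        self.frames = deque(maxlen=max_frames)  # rolling local buffer
    def capture(self, frame, features):
        self.frames.append(frame)
        return features            # first mode: only metadata is sent
    def request_video(self):
        return list(self.frames)   # second mode: full video on demand
```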
Process 200 continues as the event processor 104 in the processing station 160 receives the compressed feature vector data from the cameras and decompresses the data. In one embodiment, the processing station 160 reconstructs the full feature vector from multiple sparse feature vectors that are generated by two or more of the cameras for an object in the scene 112. The joint decompression scheme minimizes the error in decompression when compared to independent decoding of the separate data from each of the cameras. -
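One simple illustration of why joint decoding beats independent decoding: combining per-camera reconstructions of the same object's feature vector lets independent errors partially cancel. Averaging is an assumed combination rule here; the patent does not specify the exact joint scheme.

```python
import numpy as np

def joint_decompress(per_camera_estimates):
    """Illustrative joint scheme (an assumption, not the patent's exact
    method): average the per-camera reconstructions of the same
    object's feature vector so that independent decoding errors
    partially cancel, reducing the overall reconstruction error."""
    return np.mean(np.stack(per_camera_estimates), axis=0)
```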
Process 200 continues with identification of a person or object in the decompressed feature vector data from the cameras (block 220). In one operating mode, the monitoring system 100 is configured to identify feature vectors that correspond to humans and monitor motions of the humans. Other embodiments are configured to identify the motion of other objects, including motor vehicles or animals other than humans in different configurations. Some foreground feature vectors might correspond to people while others could correspond to other objects (such as cars, animals, bicycles, etc.). The feature and event database 106 stores sets of feature vectors that correspond to humans and are generated during a training process for the video monitoring system 100. The event processor 104 filters the feature vectors corresponding to humans in the scene 112 from the other objects using the predetermined training data in the database 106. In one embodiment, the process of filtering objects to identify humans is performed using an object classifier. - In some embodiments of the
process 200, the event processor 104 is configured to identify particular events that occur when an identified object, such as a human, performs a motion, using the metadata received from the cameras 108A-108N. In the illustrative embodiment of FIG. 2, the video data include a plurality of frames in which a person 240 in the scene 112 performs a kick. The processor 104 performs event recognition of the kick event using the feature vectors that are received from both cameras and the trained event data in the database 106. If an event of interest is identified, the event processor 104 generates an alert to prompt a human operator to review the event. The processor 104 receives video of the event from one or more of the cameras 108 to provide video playback of the event to the human operator. The processor 104 optionally generates a classification of the event for the human operator to review in addition to viewing the video data. For example, in the example of FIG. 2, the event processor 104 generates an alert indicating that a person has performed a kick (block 224), and optionally generates a request for video of the event for display through the video output device 168 for review by human operators.
- As described above, the
system 100 includes the trained object features and event database 106 that stores feature vector data that are identified for a plurality of events of interest during a training process. Using the kick event of FIG. 2 as an example, the database 106 stores feature vectors that are generated from video recordings of one or more people performing kicks during a training process. The recordings of the kicks are referred to as "exemplars," and the exemplars are typically termed "training data" in the computer vision and machine learning communities. Sufficient training data are recorded for each event category of interest. For example, in one embodiment a training process collects data of multiple subjects performing each of the events of interest over multiple trials.
- During the training process, some image frames of the event are selected as key-frames. For each motion, a predetermined number of key-frames, such as six key-frames, are selected manually from the video data of each trial. The key-frames represent the pose/gesture frames that provide maximum information regarding the motion being performed. A feature vector is extracted for the person in each key-frame using the same methods that are described above for feature vector extraction in the process 200. The key-frame feature vectors form the training database. In the system 100, the dictionary of events that is generated from the training data is stored in the trained object features and event database 106.
- During the video monitoring process, each feature vector corresponding to a motion of a single person is compared to the feature vectors that are stored in the
event database 106. Two feature vectors, however, might be very similar for single frames of two different motions. For instance, a single frame of a person walking might be indistinguishable from a single frame of a person running. Thus, the feature vectors of query image sequences are compared with the key-frame sequences for each motion in order to remove ambiguity regarding the motion that is recorded by the monitoring system. Further, the information from multiple cameras needs to be fused to remove ambiguity from the feature vectors that are generated by multiple cameras in different locations, because some motions can be invisible to some camera views due to occlusions. For instance, one arm of a person who is oriented perpendicular to a camera is invisible to that camera, but another camera facing the same person captures the occluded arm.
- In the
system 100, the event processor 104 in the processing station 160 uses a graphical model for each event of interest to identify the events both over time and from multiple viewing angles. The graphical model formulation is a probabilistic model that captures the interaction between multiple key-frames across multiple camera views. In one embodiment, the model includes M key-frames and N camera views, for a total of N×M nodes in the graph. Different configurations of the graph include multiple arrangements of connections between nodes. Each choice of connections has different properties for the identification of events. The edges of the graph encode the time difference between the key-frames for the motion along the temporal edges, and the spatial distance of the foreground object bounding box along the spatial edges.
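The N-view, M-key-frame graph structure described above can be sketched as follows. The node encoding and the particular edge layout (temporal chains within a view, spatial links across views) are illustrative assumptions; the disclosure covers several different connection arrangements.

```python
from itertools import product

def build_event_graph(n_views, m_keyframes):
    """Graphical model skeleton: one node per (camera view, key-frame index).
    Temporal edges chain successive key-frames within a view; spatial edges
    link the same key-frame index across adjacent views."""
    nodes = list(product(range(n_views), range(m_keyframes)))
    temporal = [((v, k), (v, k + 1))
                for v in range(n_views) for k in range(m_keyframes - 1)]
    spatial = [((v, k), (v + 1, k))
               for v in range(n_views - 1) for k in range(m_keyframes)]
    return nodes, temporal, spatial
```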
FIG. 3 depicts one arrangement of nodes in a graph 300 for identifying events using a majority voting model. In the majority voting model, only the nodes corresponding to each of the cameras are connected together linearly over time. Thus, the graph 300 of FIG. 3 includes one chain for each camera that observes the event. The inference of the probability of occurrence of any event is performed independently for each chain using a standard method such as dynamic programming. Each camera is assigned the same weight in determining whether a particular event has been observed in the configuration of FIG. 3. In the system 100, the event processor 104 receives the feature vector data from the cameras 108A-108N and identifies whether the feature vector graphs from the cameras form a majority indicating that an event has been observed. If the feature vector graphs from the majority of cameras do not agree that an event has occurred, the event processor 104 identifies that no event has occurred (non-occurrence of the event) and does not request full video from the cameras.
- In
FIG. 3, the nodes for each camera correspond to key-frames that are identified by that camera and are formed as a separate sub-graph. A video sequence of frames is represented as D and any particular event of interest is represented as a directed graph G, where each node is a key-frame for the event of interest. The graph G is represented by a collection of nodes V connected by edges E. The number of nodes in V is expressed as M. Within the graph, any given node i ∈ {1 … M} has an anchor position p_i = (x_i, y_i, t_i), where (x_i, y_i) represents the pixel location of the center of the bounding box that is generated around the object in the image and t_i represents the frame number in the video sequence, which acts as a time reference for when the frame was generated. The edges E in the graph specify which pairs of key-frame nodes are constrained to have relations. For example, in FIG. 3 the nodes for each camera are connected to each other by edges in a chain corresponding to a time-based sequence of key-frames for a particular event of interest, with the key-frames occurring in a predetermined sequence. In other embodiments, the framework is more general and the edges in the graph need not connect successive key-frames. For example, some events can include a variable number of repetitions of a particular motion. The graph optionally includes jump edges that form cycles between key-frames for the motion that is repeated one or more times.
- In some embodiments, the
event processor 104 identifies key-frames and changes of the feature descriptors for an object between key-frames using a deformable key-frame model. In FIG. 3, the event processor 104 generates a score for each graph that corresponds to the likelihood that the associated camera has observed the event:
- S(p|D, w) = Σ_{i∈V} ⟨w_i, φ_app(D, p_i)⟩ + Σ_{(i,j)∈E} ⟨w_ij, φ_def(p_i, p_j)⟩, where φ_app(D, p_i) is an HOG or other feature descriptor for an object that is detected at a frame time t_i, and φ_def(p_i, p_j) models deformation of the object between pairs of frames (frames i and j) based on the changes in the feature descriptor metadata that are received from one or more of the cameras. For a series of image frames that are generated by a single camera, the deformation is expressed as: φ_def(p_i, p_j) = [dx; dx²; dy; dy²; dt; dt²], where dx = x_i − x_j (change in x position), dy = y_i − y_j (change in y position), and dt = t_i − t_j (change in time). To match the feature vectors for a frame of video to a template w in the dictionary of the database 106, the event processor 104 identifies a maximum inner product response with the feature vectors at the location p_i in the video D. A deformation weight w_ij between two frames models the Mahalanobis distance between the pairs of key-frames over time in the model. The parameters for the Mahalanobis distance are generated during the training of the model and are stored in the database 106.
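The per-chain score and the majority-voting rule of FIG. 3 can be sketched together as follows, using the φ_def and S(p|D, w) definitions above. Plain weight vectors stand in for the learned templates, and the vote threshold is an assumed parameter; the disclosed system performs the per-chain inference with dynamic programming rather than over a fixed set of anchors.

```python
import numpy as np

def phi_def(p_i, p_j):
    """Deformation feature between key-frame anchors p = (x, y, t):
    [dx, dx^2, dy, dy^2, dt, dt^2]."""
    dx, dy, dt = p_i[0] - p_j[0], p_i[1] - p_j[1], p_i[2] - p_j[2]
    return np.array([dx, dx**2, dy, dy**2, dt, dt**2], dtype=float)

def chain_score(anchors, app_feats, w_app, w_def):
    """S(p|D,w): appearance inner products per node plus deformation
    inner products along the temporal chain edges of one camera."""
    s = sum(float(np.dot(w_app[i], app_feats[i])) for i in range(len(anchors)))
    s += sum(float(np.dot(w_def[i], phi_def(anchors[i + 1], anchors[i])))
             for i in range(len(anchors) - 1))
    return s

def majority_vote(chain_scores, threshold):
    """FIG. 3 rule: the event is declared only if a majority of camera
    chains score at or above the threshold."""
    return sum(s >= threshold for s in chain_scores) > len(chain_scores) / 2
```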
FIG. 4 depicts another graph configuration 400 in which one camera acts as a root node that is connected across time to the key-frames that are generated by the other cameras in the system 100. All other camera views are connected to the nodes from the selected camera as shown in FIG. 4. The inference in this case is also performed using dynamic programming. The configuration of FIG. 4 is referred to as a multi-view chain graphical model. In the example of FIG. 4, the camera 404 is selected as the root node and acts as the reference camera. In one embodiment, the system 100 uses a single camera as the reference camera, such as a system in which a single camera has higher resolution than the other cameras, or a camera that is positioned in a location that views events in the scene with fewer obstructions than the other cameras in the system 100. In another embodiment, the system 100 assigns the reference camera dynamically based on the camera that detects a key-frame in an event prior to the other cameras that view the scene. Thus, the reference camera for one event of interest can be different than the reference camera for another event of interest in the system 100. The camera 404 views an event and generates key-frames for the event, such as key-frames 406A-406M. The other cameras view the same event and generate key-frames 410A-410M and 414A-414M, respectively.
- As depicted in
FIG. 4, each of the cameras generates key-frames from an event, but the cameras do not necessarily generate the key-frames at the same time. For example, in FIG. 4 the reference camera 404 generates the key-frame 406B earlier than the other cameras generate the corresponding key-frames 410B and 414B, respectively. The event processor 104 receives the feature data corresponding to each of these key-frames from the cameras and identifies that the key-frames 410B and 414B correspond to the key-frame 406B from the reference camera 404, even though the key-frames are not generated at exactly the same time. In one embodiment, the event processor 104 applies a temporal constraint to the frames, which is to say that the event processor 104 identifies that key-frames from different cameras correspond to different views of the same event when the key-frames occur within a comparatively short time period of one another. For example, in one embodiment the event processor 104 applies a temporal-weighted scale to key-frames that are generated by the other cameras to identify the likelihood that the key-frames correspond to the same portion of the same event of interest as a key-frame from the reference camera. For example, if the key-frame 410B occurs within 100 milliseconds of the key-frame 406B, then the weighted scale assigns a high probability (e.g. 90%) that the two key-frames correspond to each other, while a longer delay of 1 second has a correspondingly lower probability (e.g. 25%) that the two key-frames correspond to one another.
- In one embodiment, the
event processor 104 extends the score identification process that is described for single cameras in FIG. 3 to multiple cameras, including the reference camera and one or more additional cameras that view an event. The graph of FIG. 4 depicts the nodes as key-frames, with directed edges that connect the series of key-frames from the reference camera 404. The graph 400 also includes edges that connect the key-frame nodes from the non-reference cameras to the corresponding key-frame nodes from the reference camera 404. The event processor 104 identifies the edge connections between key-frames from the different cameras based on the proximity in time between the detection of each of the key-frames from the different cameras that record the same event.
- In the multi-camera configuration of
FIG. 4, the event processor 104 generates scores S(p|D, w) in a similar manner to the single-camera configuration that is depicted in FIG. 3. The deformation function φ_def, however, changes when key-frame data from multiple cameras are used to generate the score. To model deformation between key-frames from cameras that record an object in a scene from multiple angles and locations, the event processor 104 uses a fixed or "homographic" feature in the scene that is visible to each of the cameras. For example, in many surveillance system configurations all of the cameras have a view of a single ground plane in the scene. The ground plane does not necessarily need to be the physical ground on which people walk, but is instead a common geometric plane that serves as a reference for identifying an object and event when the object is viewed from different angles and locations. The event processor 104 identifies the homography H_l^r for a common ground plane between the reference camera r and any of the other cameras l that view the ground plane. The homography is a linear transformation that maps pixels, and correspondingly features, in one view of a plane to another, and the event processor 104 uses the homography to identify the distance between objects in the views of different cameras. Additionally, when the event processor 104 identifies the motions of humans who typically are in contact with the ground plane, the center of the line connecting the bottom corners of the bounding box that is formed around the object in each scene acts as a proxy for the 3D location of the object in the scene. Thus, the event processor 104 can identify the locations of the same object viewed from different cameras in situations where the ground plane or another homographic element is within the view of the different cameras and the object remains proximate to the homographic element.
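The bounding-box proxy described above (the center of the line connecting the bottom corners of the box) reduces to a one-line computation. The (x_min, y_min, x_max, y_max) box layout and the downward-growing image y axis are assumptions for illustration:

```python
def ground_point(bbox):
    """Midpoint of the bounding box's bottom edge, used as the object's
    contact-point proxy on the ground plane (image y grows downward,
    so the bottom edge is at y_max)."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2.0, float(y_max))
```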
In the surveillance system 100, the multiple cameras maintain a view of the common ground plane and objects on the ground plane so that the event processor 104 can identify the positions of objects in the views of the different cameras using the homographic transformation.
- Given the homogeneous coordinates of a pixel f_l = (x_l, y_l, 1)^T on the ground plane in the view of camera l, the position of the pixel in the view of the reference camera r is estimated as f̂_r = H_l^r f_l. The deformation function for the two views is defined as: φ_def(f_i^l, f_i^r) = [dx; dx²; dy; dy²], where [dx, dy] = (f_r − H_l^r f_l)^T. In one embodiment, the deformation function is modeled as a spring function in which the cost to perform a deformation corresponds to the amount of force required to stretch a spring.
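The two-view deformation function defined above can be sketched directly. The homography matrix here is an arbitrary assumed input rather than one estimated by the system, and the renormalization step is an assumption for general (non-affine) homographies:

```python
import numpy as np

def phi_def_views(f_l, f_r, H):
    """Cross-view deformation: map camera l's ground-plane point through the
    homography H (l -> r), difference it against camera r's observation, and
    return [dx, dx^2, dy, dy^2]."""
    f_hat = H @ f_l
    f_hat = f_hat / f_hat[2]       # renormalize homogeneous coordinates
    dx, dy = (f_r - f_hat)[:2]
    return np.array([dx, dx**2, dy, dy**2])
```

With the identity homography and identical observations, the deformation is zero, matching the spring-cost intuition of an unstretched spring.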
FIG. 6 depicts a graphical representation of the deformation constraints between different views 602A-602C of a common ground plane 608 that is visible to three different cameras. The event processor 104 identifies corresponding locations 612A-612C at the bottom center of bounding boxes 606A-606C in the feature descriptor data corresponding to each of the views 602A-602C, respectively. The bounding box locations 612A-612C are each in contact with the ground plane 608 for the homographic transformation.
-
FIG. 5 depicts another graph 500 in which the key-frames that are generated from each of the cameras are connected together. In the framework of FIG. 5, every node is connected to its neighbors across space and time as shown in FIG. 5. Inference for this model can be performed using iterated dynamic programming, with multiple spatial iterations interspersed between temporal iterations. In one configuration, the event processor 104 processes the graph of FIG. 5 as a set of graphs similar to FIG. 4, where each camera is treated as the reference camera during one iteration of the processing of the graph 500. The event processor 104 subsequently combines the scores for the different iterations through the graph 500 to identify the most likely event that corresponds to the key-frames from the different cameras. The number of iterations is fixed a priori in one embodiment. The configuration of FIG. 5 is referred to as a multi-view field in which the nodes corresponding to all cameras are connected together.
- During the
process 200, the central processing station 160 uses the graphical models described above to process detected key-frames in the feature vector metadata from the cameras with the event processor 104 and the predetermined models in the database 106 to identify particular events of interest, such as the kicking event depicted in FIG. 2, and to take an action in response to identifying the event of interest. The event processor 104 performs an inference operation to identify a "best" event c* and corresponding label p* from the predetermined set of events of interest in the database 106. The "best" event c* refers to the event that has the highest likelihood of occurrence, and corresponding non-occurrence of the other events, given the observed sequences of key-frame metadata from the cameras. The label p* refers to a human or machine readable identifier that is associated with the event c*, is determined during the training process, and is stored in the database 106. The event processor 104 identifies c* and p* by maximizing the score function as set forth in the following equation: {c*, p*} = argmax_{p, c∈{1 … C}} (S(p|D, w_c) − b_c), where w_c is the template stored in the database 106 for one of the events c, and b_c is the bias factor generated during the training process that is subtracted from the raw score to account for bias during the model training process.
- It will be appreciated that variants of the above-described and other features and functions, or alternatives thereof, may be desirably combined into many other different systems, applications or methods. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements may be subsequently made by those skilled in the art that are also intended to be encompassed by the following claims.
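The maximization over the event dictionary described above can be sketched as follows; the dictionary keys, scores, and bias values are illustrative, and the per-event scores S(p|D, w_c) are assumed to have been computed already by the graphical-model inference:

```python
def best_event(scores, biases):
    """Select c* maximizing the bias-corrected score S_c - b_c over the
    predetermined dictionary of events."""
    return max(scores, key=lambda c: scores[c] - biases[c])
```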
Claims (18)
1. A surveillance system comprising:
a plurality of cameras located in a plurality of positions to record a scene, each camera further comprising:
a sensor configured to generate video data of the scene comprising a series of frames;
a first network device configured to transmit the video data and feature vectors associated with the video data to a processing station; and
a feature extraction processor operatively connected to the sensor and the first network device, the feature extraction processor being configured to:
identify a plurality of feature vectors in video data generated by the sensor;
transmit only the plurality of feature vectors to the processing station with the first network device in a first operating mode; and
transmit the video data to the processing station with the first network device in a second operating mode only in response to a request for the video data from the processing station; and
the processing station further comprising:
a second network device;
a video output device; and
a processor operatively connected to the second network device and the video output device, the processor being configured to:
receive the plurality of feature vectors generated by each camera in the plurality of cameras with the second network device;
identify an object and motion of the object in the scene with reference to the plurality of feature vectors received from at least two of the plurality of cameras;
identify an event corresponding to the motion of the object in the scene with reference to a predetermined database of events;
generate a request for transmission of the video data from at least one camera in the plurality of cameras; and
generate a graphical display of the video data from the at least one camera with the video output device to display the object associated with the event.
2. The surveillance system of claim 1 , the feature extraction processor in each of the plurality of cameras being further configured to:
identify a portion of one frame in the video data that corresponds to a background of the scene; and
identify the plurality of feature vectors in the video data only in portions of the one frame that do not correspond to the background of the scene.
3. The surveillance system of claim 2 , the feature extraction processor in each camera being further configured to:
generate a bounding box corresponding to a portion of the one frame that does not correspond to the background of the scene;
resize the portion of the one frame in the bounding box to generate a fixed resolution image of the portion of the one frame included in the bounding box; and
identify at least one feature vector in the plurality of feature vectors with reference to the fixed resolution image.
4. The surveillance system of claim 3 , the feature extraction processor being further configured to:
identify a plurality of image gradients in the fixed resolution image; and
generate the feature vector including a histogram of gradients corresponding to the plurality of image gradients.
5. The surveillance system of claim 1 , the feature extraction processor in a first camera in the plurality of cameras being further configured to:
identify a first feature vector for a first frame in the video data;
transmit the first feature vector to the processing station with the first network device;
identify a second feature vector for a second frame in the video data, the second frame occurring after the first frame; and
transmit a sparse feature vector corresponding to only a portion of the second feature vector that is different than the first feature vector to the processing station with the first network device.
6. The surveillance system of claim 5 , the processor in the processing station being further configured to:
receive the first feature vector from the first camera;
receive a sparse feature vector from a second camera in the plurality of cameras, the sparse feature vector including only portions of a feature vector generated in the second camera that have changed since generation of an earlier feature vector, the earlier feature vector being generated approximately concurrently to the first feature vector from the first camera; and
generate a full feature vector corresponding to the sparse feature vector from the second camera with reference to the first feature vector from the first camera.
7. The surveillance system of claim 1 , the processor in the processing station being further configured to:
identify occurrence or non-occurrence of an event in each plurality of feature vectors from each of the plurality of cameras individually with reference to the plurality of feature vectors corresponding to a plurality of key-frames from each of the plurality of cameras corresponding to the motion of the object during a single time period; and
identify the event in response to an identification of an occurrence of the event from a majority of the plurality of cameras.
8. The surveillance system of claim 1 , the processor in the processing station being further configured to:
receive a first plurality of feature vectors from a first camera corresponding to a first frame of video data;
identify that the first plurality of feature vectors from the first camera correspond to a first key-frame for an event in the predetermined database of events;
receive a second plurality of feature vectors from a second camera corresponding to a second frame of video data; and
identify that the second plurality of feature vectors from the second camera correspond to a second key-frame for the event; and
generate a score corresponding to a deformation between the first plurality of feature vectors in the first frame and the second plurality of feature vectors in the second frame; and
identify occurrence or non-occurrence of the event with reference to the score.
9. The surveillance system of claim 8 , the processor in the processing station being further configured to:
identify a first location of an object corresponding to the first plurality of feature vectors that are extracted from the first frame of video data;
identify a second location of the object corresponding to the second plurality of feature vectors that are extracted from the second frame of video data;
perform a homographic transformation to identify a distance between the first location of the object and the second location of the object with reference to a ground plane that is present in both the first frame of video data and the second frame of video data; and
generate the score corresponding to the deformation between the first plurality of feature vectors in the first frame and the second plurality of feature vectors in the second frame with reference to the distance between the first location and the second location.
10. A method for surveillance of a scene comprising:
generating with a sensor in a first camera first video data of the scene, the first video data comprising a first series of frames;
identifying with a feature extraction processor in the first camera a first plurality of feature vectors in the first video data;
transmitting with a network device in the first camera only the first plurality of feature vectors to a processing station in a first operating mode;
transmitting with the network device in the first camera the first video data to the processing station in a second operating mode only in response to a request for the first video data from the processing station;
generating with another sensor in a second camera second video data of the scene, the second video data comprising a second series of frames and the second camera generating the second video data of the scene from a different position than the first camera;
identifying with another feature extraction processor in the second camera a second plurality of feature vectors in the second video data;
transmitting with another network device in the second camera only the second plurality of feature vectors to the processing station in the first operating mode;
transmitting with the other network device in the second camera the second video data to the processing station in the second operating mode only in response to a request for the second video data from the processing station;
receiving with another network device in the processing station the first plurality of feature vectors from the first camera and the second plurality of feature vectors from the second camera;
identifying with an event processor in the processing station an object and motion of the object in the scene with reference to the first and second plurality of feature vectors;
identifying with the event processor in the processing station an event corresponding to the motion of the object in the scene with reference to a predetermined database of events;
generating with the event processor in the processing station a request for transmission of the video data from at least one of the first camera and the second camera; and
generating with a video display device a graphical display of video data received from at least one of the first camera and the second camera with the video output device to display the object associated with the event.
11. The method of claim 10 further comprising:
identifying with the feature extraction processor in the first camera a portion of one frame in the first video data that corresponds to a background of the scene; and
identifying with the feature extraction processor in the first camera the first plurality of features in the first video data only in portions of the one frame that do not correspond to the background of the scene.
12. The method of claim 11 further comprising:
generating with the feature extraction processor in the first camera a bounding box corresponding to a portion of the one frame that does not correspond to the background of the scene;
resizing with the feature extraction processor in the first camera the portion of the one frame in the bounding box to generate a fixed resolution image of the portion of the frame included in the bounding box; and
identifying with the feature extraction processor in the first camera at least one feature vector in the first plurality of feature vectors with reference to the fixed resolution image.
13. The method of claim 12 , the identification of the at least one feature vector further comprising:
identifying with the feature extraction processor in the first camera a plurality of image gradients in the fixed resolution image; and
generating with the feature extraction processor in the first camera the at least one feature vector including a histogram of gradients corresponding to the plurality of image gradients.
14. The method of claim 10 further comprising:
identifying with the feature extraction processor in the first camera a first feature vector for a first frame in the video data;
transmitting with the network device in the first camera the first feature vector to the processing station;
identifying with the feature extraction processor in the first camera a second feature vector for a second frame in the video data, the second frame occurring after the first frame; and
transmitting with the network device in the first camera a sparse feature vector corresponding to only a portion of the second feature vector that is different than the first feature vector to the processing station.
15. The method of claim 14 further comprising:
receiving with the event processor in the processing station the first feature vector from the first camera;
receiving with the event processor in the processing station another sparse feature vector from the second camera, the other sparse feature vector including only portions of a feature vector generated in the second camera that have changed since generation of an earlier feature vector in the second camera, the earlier feature vector being generated approximately concurrently to the first feature vector from the first camera; and
generating with the event processor in the processing station a full feature vector corresponding to the other sparse feature vector from the second camera with reference to the first feature vector from the first camera.
16. The method of claim 10 , the identification of the event further comprising:
identifying with the event processor in the processing station occurrence or non-occurrence of an event in the first plurality of feature vectors and the second plurality of feature vectors individually with reference to the first plurality of feature vectors corresponding to a plurality of key-frames and the second plurality of feature vectors corresponding to the plurality of key-frames corresponding to the motion of the object during a single time period; and
identifying the event in response to an identification of an occurrence of the event from one or both of the first camera and second camera.
17. The method of claim 10 , the identification of the event further comprising:
receiving with the event processor in the processing station the first plurality of feature vectors from the first camera corresponding to a first frame of the first video data;
identifying with the event processor that the first plurality of feature vectors correspond to a first key-frame for an event in the predetermined database of events;
receiving with the event processor the second plurality of feature vectors from the second camera corresponding to a second frame of the second video data; and
identifying with the event processor that the second plurality of feature vectors correspond to a second key-frame for the event; and
generating with the event processor in the processing station a score corresponding to a deformation between the first plurality of feature vectors in the first frame and the second plurality of feature vectors in the second frame; and
identifying with the event processor in the processing station occurrence or non-occurrence of the event with reference to the score.
18. The method of claim 17, the generation of the score further comprising:
identifying with the event processor in the processing station a first location of an object corresponding to the first plurality of feature vectors that are extracted from the first frame of video data;
identifying with the event processor in the processing station a second location of the object corresponding to the second plurality of feature vectors that are extracted from the second frame of video data;
performing with the event processor in the processing station a homographic transformation to identify a distance between the first location of the object and the second location of the object with reference to a ground plane that is present in both the first frame of video data and the second frame of video data; and
generating with the event processor in the processing station the score corresponding to the deformation between the first plurality of feature vectors in the first frame and the second plurality of feature vectors in the second frame with reference to the distance between the first location and the second location.
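The homographic ground-plane comparison of claim 18 can be illustrated with a small sketch. The homographies `H1` and `H2` (mapping each camera's image plane to a shared ground plane) are assumed pre-calibrated, and the exponential form and `scale` parameter of the score are hypothetical, since the claims do not specify a formula:

```python
import numpy as np

def ground_plane_distance(p1, p2, H1, H2):
    """Map each camera's image-plane object location onto a common
    ground plane via its homography, then measure their separation.

    p1, p2: (x, y) object locations in the first and second frames.
    H1, H2: 3x3 image-to-ground-plane homographies for each camera.
    """
    def to_ground(p, H):
        v = H @ np.array([p[0], p[1], 1.0])  # homogeneous coordinates
        return v[:2] / v[2]                  # perspective divide
    g1, g2 = to_ground(p1, H1), to_ground(p2, H2)
    return float(np.linalg.norm(g1 - g2))

def deformation_score(distance, scale=1.0):
    """Convert the ground-plane distance into a deformation score:
    identical locations score 1.0 and the score decays as the two
    cameras' object locations diverge (scale is a tuning parameter)."""
    return float(np.exp(-distance / scale))
```

If both views agree on where the object sits on the ground plane, the distance is near zero and the score is high, supporting identification of the event; a large deformation argues for non-occurrence.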
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/273,653 US9665777B2 (en) | 2013-05-10 | 2014-05-09 | System and method for object and event identification using multiple cameras |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361822051P | 2013-05-10 | 2013-05-10 | |
US14/273,653 US9665777B2 (en) | 2013-05-10 | 2014-05-09 | System and method for object and event identification using multiple cameras |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140333775A1 true US20140333775A1 (en) | 2014-11-13 |
US9665777B2 US9665777B2 (en) | 2017-05-30 |
Family
ID=51864512
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/273,653 Active 2035-06-13 US9665777B2 (en) | 2013-05-10 | 2014-05-09 | System and method for object and event identification using multiple cameras |
Country Status (4)
Country | Link |
---|---|
US (1) | US9665777B2 (en) |
EP (1) | EP2995079A4 (en) |
CN (1) | CN105531995B (en) |
WO (1) | WO2014183004A1 (en) |
Cited By (109)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140270358A1 (en) * | 2013-03-15 | 2014-09-18 | Pelco, Inc. | Online Learning Method for People Detection and Counting for Retail Stores |
US20150227778A1 (en) * | 2014-02-07 | 2015-08-13 | International Business Machines Corporation | Intelligent glasses for the visually impaired |
US9158974B1 (en) | 2014-07-07 | 2015-10-13 | Google Inc. | Method and system for motion vector-based video monitoring and event categorization |
US9170707B1 (en) | 2014-09-30 | 2015-10-27 | Google Inc. | Method and system for generating a smart time-lapse video clip |
US20160019427A1 (en) * | 2013-03-11 | 2016-01-21 | Michael Scott Martin | Video surveillence system for detecting firearms |
US20160125267A1 (en) * | 2013-06-04 | 2016-05-05 | Elbit Systems Land And C4I Ltd. | Method and system for coordinating between image sensors |
US9363353B1 (en) * | 2014-12-04 | 2016-06-07 | Hon Man Ashley Chik | Mobile phone docks with multiple circulating phone connectors |
US9367733B2 (en) | 2012-11-21 | 2016-06-14 | Pelco, Inc. | Method and apparatus for detecting people by a surveillance system |
CN105939464A (en) * | 2016-06-15 | 2016-09-14 | 童迎伟 | Intelligent vehicle-mounted monitoring system and safety monitoring method thereof |
US9449229B1 (en) | 2014-07-07 | 2016-09-20 | Google Inc. | Systems and methods for categorizing motion event candidates |
US9454884B1 (en) * | 2015-09-28 | 2016-09-27 | International Business Machines Corporation | Discovering object pathways in a camera network |
US20160286171A1 (en) * | 2015-03-23 | 2016-09-29 | Fred Cheng | Motion data extraction and vectorization |
US9501915B1 (en) | 2014-07-07 | 2016-11-22 | Google Inc. | Systems and methods for analyzing a video stream |
US20170061214A1 (en) * | 2015-08-31 | 2017-03-02 | General Electric Company | Controlling bandwith utilization of video transmissions for quality and scalability |
US20170068921A1 (en) * | 2015-09-04 | 2017-03-09 | International Business Machines Corporation | Summarization of a recording for quality control |
USD782495S1 (en) | 2014-10-07 | 2017-03-28 | Google Inc. | Display screen or portion thereof with graphical user interface |
WO2017060894A1 (en) * | 2015-10-06 | 2017-04-13 | Agent Video Intelligence Ltd. | Method and system for classifying objects from a stream of images |
US20170150093A1 (en) * | 2015-11-20 | 2017-05-25 | Vivotek Inc. | Video file playback system capable of previewing image, method thereof, and computer program product |
US20170148183A1 (en) * | 2015-11-25 | 2017-05-25 | Behavioral Recognition Systems, Inc. | Image driver that samples high-resolution image data |
CN106982347A (en) * | 2016-01-16 | 2017-07-25 | 阔展科技(深圳)有限公司 | Has the intelligent mobile monitor of extraction and analysis data capability |
WO2017131723A1 (en) * | 2016-01-29 | 2017-08-03 | Hewlett Packard Enterprise Development Lp | Generating a test case for a recorded stream of events |
WO2017130187A1 (en) * | 2016-01-26 | 2017-08-03 | Coral Detection Systems Ltd. | Methods and systems for drowning detection |
US20170223261A1 (en) * | 2014-07-31 | 2017-08-03 | Hitachi Maxell, Ltd. | Image pickup device and method of tracking subject thereof |
US9779276B2 (en) | 2014-10-10 | 2017-10-03 | Hand Held Products, Inc. | Depth sensor based auto-focus system for an indicia scanner |
US9779546B2 (en) | 2012-05-04 | 2017-10-03 | Intermec Ip Corp. | Volume dimensioning systems and methods |
CN107409198A (en) * | 2015-03-04 | 2017-11-28 | 株式会社日立系统 | Situation based on camera image data confirms system and control device and the situation confirmation method based on camera image data |
US9835486B2 (en) | 2015-07-07 | 2017-12-05 | Hand Held Products, Inc. | Mobile dimensioner apparatus for use in commerce |
US9906704B2 (en) * | 2015-09-17 | 2018-02-27 | Qualcomm Incorporated | Managing crowd sourced photography in a wireless network |
US20180061064A1 (en) * | 2014-10-15 | 2018-03-01 | Comcast Cable Communications, Llc | Generation of event video frames for content |
US9940721B2 (en) | 2016-06-10 | 2018-04-10 | Hand Held Products, Inc. | Scene change detection in a dimensioner |
US9939259B2 (en) | 2012-10-04 | 2018-04-10 | Hand Held Products, Inc. | Measuring object dimensions using mobile computer |
CN108073857A (en) * | 2016-11-14 | 2018-05-25 | 北京三星通信技术研究有限公司 | The method and device of dynamic visual sensor DVS event handlings |
WO2018102919A1 (en) | 2016-12-05 | 2018-06-14 | Avigilon Corporation | System and method for appearance search |
US10009579B2 (en) | 2012-11-21 | 2018-06-26 | Pelco, Inc. | Method and system for counting people using depth sensor |
US10025314B2 (en) | 2016-01-27 | 2018-07-17 | Hand Held Products, Inc. | Vehicle positioning and object avoidance |
US10066982B2 (en) | 2015-06-16 | 2018-09-04 | Hand Held Products, Inc. | Calibrating a volume dimensioner |
US10083359B2 (en) * | 2016-04-28 | 2018-09-25 | Motorola Solutions, Inc. | Method and device for incident situation prediction |
WO2018175968A1 (en) * | 2017-03-24 | 2018-09-27 | Numenta, Inc. | Location processor for inferencing and learning based on sensorimotor input data |
US10094650B2 (en) | 2015-07-16 | 2018-10-09 | Hand Held Products, Inc. | Dimensioning and imaging items |
US10127783B2 (en) | 2014-07-07 | 2018-11-13 | Google Llc | Method and device for processing motion events |
US10134120B2 (en) | 2014-10-10 | 2018-11-20 | Hand Held Products, Inc. | Image-stitching for dimensioning |
US10140827B2 (en) | 2014-07-07 | 2018-11-27 | Google Llc | Method and system for processing motion event notifications |
US10163216B2 (en) | 2016-06-15 | 2018-12-25 | Hand Held Products, Inc. | Automatic mode switching in a volume dimensioner |
US10203402B2 (en) | 2013-06-07 | 2019-02-12 | Hand Held Products, Inc. | Method of error correction for 3D imaging device |
US10218964B2 (en) | 2014-10-21 | 2019-02-26 | Hand Held Products, Inc. | Dimensioning system with feedback |
US10225544B2 (en) | 2015-11-19 | 2019-03-05 | Hand Held Products, Inc. | High resolution dot pattern |
US10223591B1 (en) * | 2017-03-30 | 2019-03-05 | Amazon Technologies, Inc. | Multi-video annotation |
US20190080575A1 (en) * | 2016-04-07 | 2019-03-14 | Hanwha Techwin Co., Ltd. | Surveillance system and control method thereof |
US10240914B2 (en) | 2014-08-06 | 2019-03-26 | Hand Held Products, Inc. | Dimensioning system with guided alignment |
US10242035B1 (en) * | 2018-04-02 | 2019-03-26 | Pond5, Inc. | Method and system for image searching |
US10249030B2 (en) | 2015-10-30 | 2019-04-02 | Hand Held Products, Inc. | Image transformation for indicia reading |
US10247547B2 (en) | 2015-06-23 | 2019-04-02 | Hand Held Products, Inc. | Optical pattern projector |
US20190156496A1 (en) * | 2017-11-21 | 2019-05-23 | Reliance Core Consulting LLC | Methods, systems, apparatuses and devices for facilitating motion analysis in an environment |
US10339352B2 (en) | 2016-06-03 | 2019-07-02 | Hand Held Products, Inc. | Wearable metrological apparatus |
US10346461B1 (en) * | 2018-04-02 | 2019-07-09 | Pond5 Inc. | Method and system for image searching by color |
US20190213855A1 (en) * | 2015-09-02 | 2019-07-11 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
US10354169B1 (en) * | 2017-12-22 | 2019-07-16 | Motorola Solutions, Inc. | Method, device, and system for adaptive training of machine learning models via detected in-field contextual sensor events and associated located and retrieved digital audio and/or video imaging |
EP3311334A4 (en) * | 2015-06-18 | 2019-08-07 | Wizr | Cloud platform with multi camera synchronization |
US10393506B2 (en) | 2015-07-15 | 2019-08-27 | Hand Held Products, Inc. | Method for a mobile dimensioning device to use a dynamic accuracy compatible with NIST standard |
US10393508B2 (en) | 2014-10-21 | 2019-08-27 | Hand Held Products, Inc. | Handheld dimensioning system with measurement-conformance feedback |
EP3405889A4 (en) * | 2016-01-21 | 2019-08-28 | Wizr LLC | Cloud platform with multi camera synchronization |
US10417882B2 (en) | 2017-10-24 | 2019-09-17 | The Chamberlain Group, Inc. | Direction sensitive motion detector camera |
US20190304273A1 (en) * | 2018-03-28 | 2019-10-03 | Hon Hai Precision Industry Co., Ltd. | Image surveillance device and method of processing images |
US10452713B2 (en) * | 2014-09-30 | 2019-10-22 | Apple Inc. | Video analysis techniques for improved editing, navigation, and summarization |
EP3561722A1 (en) * | 2018-04-24 | 2019-10-30 | Toshiba Tec Kabushiki Kaisha | Video image analysis apparatus and video image analysis method |
US10489660B2 (en) | 2016-01-21 | 2019-11-26 | Wizr Llc | Video processing with object identification |
US20200013273A1 (en) * | 2018-07-04 | 2020-01-09 | Arm Ip Limited | Event entity monitoring network and method |
US10584962B2 (en) | 2018-05-01 | 2020-03-10 | Hand Held Products, Inc | System and method for validating physical-item security |
US10593130B2 (en) | 2015-05-19 | 2020-03-17 | Hand Held Products, Inc. | Evaluating image values |
JP2020047259A (en) * | 2019-07-11 | 2020-03-26 | パナソニックi−PROセンシングソリューションズ株式会社 | Person search system and person search method |
US10635922B2 (en) | 2012-05-15 | 2020-04-28 | Hand Held Products, Inc. | Terminals and methods for dimensioning objects |
US10657382B2 (en) | 2016-07-11 | 2020-05-19 | Google Llc | Methods and systems for person detection in a video feed |
US10708673B2 (en) | 2015-09-25 | 2020-07-07 | Qualcomm Incorporated | Systems and methods for video processing |
US10733748B2 (en) | 2017-07-24 | 2020-08-04 | Hand Held Products, Inc. | Dual-pattern optical 3D dimensioning |
WO2020159386A1 (en) * | 2019-02-01 | 2020-08-06 | Andersen Terje N | Method and system for extracting metadata from an observed scene |
WO2020168286A1 (en) * | 2019-02-14 | 2020-08-20 | University Of Washington | Systems and methods for improved nanopore-based analysis of nucleic acids |
US10775165B2 (en) | 2014-10-10 | 2020-09-15 | Hand Held Products, Inc. | Methods for improving the accuracy of dimensioning-system measurements |
WO2020240212A1 (en) * | 2019-05-30 | 2020-12-03 | Seequestor Ltd | Control system and method |
US20200394589A1 (en) * | 2019-06-12 | 2020-12-17 | Shoppertrak Rct Corporation | Methods and systems for monitoring workers in a retail environment |
US20200410227A1 (en) * | 2016-06-30 | 2020-12-31 | Snap Inc. | Object modeling and replacement in a video stream |
US10887561B2 (en) | 2015-09-02 | 2021-01-05 | Nec Corporation | Surveillance system, surveillance method, and program |
US10909708B2 (en) | 2016-12-09 | 2021-02-02 | Hand Held Products, Inc. | Calibrating a dimensioner using ratios of measurable parameters of optically-perceptible geometric elements |
US10908013B2 (en) | 2012-10-16 | 2021-02-02 | Hand Held Products, Inc. | Dimensioning system |
US20210042509A1 (en) * | 2019-08-05 | 2021-02-11 | Shoppertrak Rct Corporation | Methods and systems for monitoring potential losses in a retail environment |
US20210041886A1 (en) * | 2018-01-24 | 2021-02-11 | Zhuineng Robotics (Shanghai) Co., Ltd. | Multi-device visual navigation method and system in variable scene |
US10931923B2 (en) | 2015-09-02 | 2021-02-23 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
ES2821017A1 (en) * | 2019-10-23 | 2021-04-23 | Future Connections Holding B V | System of counting and control of capacity in facilities (Machine-translation by Google Translate, not legally binding) |
US11029762B2 (en) | 2015-07-16 | 2021-06-08 | Hand Held Products, Inc. | Adjusting dimensioning results using augmented reality |
US11047672B2 (en) | 2017-03-28 | 2021-06-29 | Hand Held Products, Inc. | System for optically dimensioning |
US11074791B2 (en) * | 2017-04-20 | 2021-07-27 | David Lee Selinger | Automatic threat detection based on video frame delta information in compressed video streams |
US11082701B2 (en) | 2016-05-27 | 2021-08-03 | Google Llc | Methods and devices for dynamic adaptation of encoding bitrate for video streaming |
US20210304457A1 (en) * | 2020-03-31 | 2021-09-30 | The Regents Of The University Of California | Using neural networks to estimate motion vectors for motion corrected pet image reconstruction |
US11176411B2 (en) | 2019-02-28 | 2021-11-16 | Stats Llc | System and method for player reidentification in broadcast video |
USD939980S1 (en) | 2019-06-17 | 2022-01-04 | Guard, Inc. | Data and sensor system hub |
US11277591B2 (en) | 2015-09-02 | 2022-03-15 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
US11321656B2 (en) * | 2019-07-23 | 2022-05-03 | Fanuc Corporation | Difference extracting device |
US20220222496A1 (en) * | 2021-01-13 | 2022-07-14 | Fotonation Limited | Image processing system |
US11516441B1 (en) * | 2021-03-16 | 2022-11-29 | Kanya Kamangu | 360 degree video recording and playback device |
US11527071B2 (en) | 2018-09-20 | 2022-12-13 | i-PRO Co., Ltd. | Person search system and person search method |
US11579904B2 (en) * | 2018-07-02 | 2023-02-14 | Panasonic Intellectual Property Management Co., Ltd. | Learning data collection device, learning data collection system, and learning data collection method |
US11599259B2 (en) | 2015-06-14 | 2023-03-07 | Google Llc | Methods and systems for presenting alert event indicators |
US11639846B2 (en) | 2019-09-27 | 2023-05-02 | Honeywell International Inc. | Dual-pattern optical 3D dimensioning |
US20230134663A1 (en) * | 2021-11-02 | 2023-05-04 | Steven Roskowski | Transforming Surveillance Sensor Data into Event Metadata, Bounding Boxes, Recognized Object Classes, Learning Density Patterns, Variation Trends, Normality, Projections, Topology; Determining Variances Out of Normal Range and Security Events; and Initiating Remediation and Actuating Physical Access Control Facilitation |
WO2023081047A3 (en) * | 2021-11-04 | 2023-06-15 | Op Solutions, Llc | Systems and methods for object and event detection and feature-based rate-distortion optimization for video coding |
US11710387B2 (en) | 2017-09-20 | 2023-07-25 | Google Llc | Systems and methods of detecting and responding to a visitor to a smart home environment |
US20230292709A1 (en) * | 2019-08-16 | 2023-09-21 | Stephanie Sujin CHOI | Method for clustering and identifying animals based on the shapes, relative positions and other features of body parts |
US11783010B2 (en) | 2017-05-30 | 2023-10-10 | Google Llc | Systems and methods of person recognition in video streams |
WO2023196778A1 (en) * | 2022-04-04 | 2023-10-12 | Agilysys Nv, Llc | System and method for synchronizing 2d camera data for item recognition in images |
US11935247B2 (en) | 2023-02-27 | 2024-03-19 | Stats Llc | System and method for calibrating moving cameras capturing broadcast video |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10110858B2 (en) * | 2015-02-06 | 2018-10-23 | Conduent Business Services, Llc | Computer-vision based process recognition of activity workflow of human performer |
US10129156B2 (en) * | 2015-03-31 | 2018-11-13 | At&T Intellectual Property I, L.P. | Dynamic creation and management of ephemeral coordinated feedback instances |
US10984355B2 (en) * | 2015-04-17 | 2021-04-20 | Xerox Corporation | Employee task verification to video system |
EP3456040B1 (en) * | 2016-05-09 | 2020-09-23 | Sony Corporation | Surveillance system and method for camera-based surveillance |
CN107786838B (en) * | 2016-08-26 | 2020-04-03 | 杭州海康威视系统技术有限公司 | Video monitoring method, device and system |
CN107886103B (en) * | 2016-09-29 | 2023-12-08 | 日本电气株式会社 | Method, device and system for identifying behavior patterns |
JP2018055607A (en) * | 2016-09-30 | 2018-04-05 | 富士通株式会社 | Event detection program, event detection device, and event detection method |
EP3839821A3 (en) * | 2016-10-25 | 2021-09-15 | Owl Cameras, Inc. | Video-based data collection, image capture and analysis configuration |
EP3340103A1 (en) * | 2016-12-21 | 2018-06-27 | Axis AB | Method for identifying events in a motion video |
CN106781014B (en) * | 2017-01-24 | 2018-05-18 | 广州市蚁道互联网有限公司 | Automatic vending machine and its operation method |
US10698068B2 (en) * | 2017-03-24 | 2020-06-30 | Samsung Electronics Co., Ltd. | System and method for synchronizing tracking points |
US20190057589A1 (en) * | 2017-08-18 | 2019-02-21 | Honeywell International Inc. | Cloud based systems and methods for locating a peace breaker |
US10546197B2 (en) | 2017-09-26 | 2020-01-28 | Ambient AI, Inc. | Systems and methods for intelligent and interpretive analysis of video image data using machine learning |
KR102437456B1 (en) * | 2017-11-14 | 2022-08-26 | 애플 인크. | Event camera-based deformable object tracking |
US10628706B2 (en) * | 2018-05-11 | 2020-04-21 | Ambient AI, Inc. | Systems and methods for intelligent and interpretive analysis of sensor data and generating spatial intelligence using machine learning |
CN110881093B (en) * | 2018-09-05 | 2022-03-11 | 华为技术有限公司 | Distributed camera |
US11087144B2 (en) * | 2018-10-10 | 2021-08-10 | Harman International Industries, Incorporated | System and method for determining vehicle data set familiarity |
CN111105109A (en) * | 2018-10-25 | 2020-05-05 | 玳能本股份有限公司 | Operation detection device, operation detection method, and operation detection system |
US11443515B2 (en) | 2018-12-21 | 2022-09-13 | Ambient AI, Inc. | Systems and methods for machine learning enhanced intelligent building access endpoint security monitoring and management |
US11195067B2 (en) | 2018-12-21 | 2021-12-07 | Ambient AI, Inc. | Systems and methods for machine learning-based site-specific threat modeling and threat detection |
RU2694140C1 (en) * | 2019-04-04 | 2019-07-09 | Общество с ограниченной ответственностью "Скайтрэк" (ООО "Скайтрэк") | Method of human identification in a mode of simultaneous operation of a group of video cameras |
US11151194B2 (en) | 2019-05-09 | 2021-10-19 | Sap Se | Data collection and integration system |
CN110717248A (en) * | 2019-09-11 | 2020-01-21 | 武汉光庭信息技术股份有限公司 | Method and system for generating automatic driving simulation scene, server and medium |
US11361543B2 (en) | 2019-12-10 | 2022-06-14 | Caterpillar Inc. | System and method for detecting objects |
CN110996066B (en) * | 2019-12-19 | 2021-12-24 | 浙江中控技术股份有限公司 | Accident backtracking method and device |
US20230377335A1 (en) * | 2020-11-10 | 2023-11-23 | Intel Corporation | Key person recognition in immersive video |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110019003A1 (en) * | 2009-07-22 | 2011-01-27 | Hitachi Kokusai Electric Inc. | Surveillance image retrieval apparatus and surveillance system |
US20120274776A1 (en) * | 2011-04-29 | 2012-11-01 | Canon Kabushiki Kaisha | Fault tolerant background modelling |
US20120300081A1 (en) * | 2011-05-24 | 2012-11-29 | Samsung Techwin Co., Ltd. | Surveillance system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4099973B2 (en) * | 2001-10-30 | 2008-06-11 | 松下電器産業株式会社 | Video data transmission method, video data reception method, and video surveillance system |
ATE364299T1 (en) * | 2002-07-05 | 2007-06-15 | Agent Video Intelligence Ltd | METHOD AND SYSTEM FOR EFFECTIVE EVENT RECOGNITION IN A LARGE NUMBER OF SIMULTANEOUS IMAGE SEQUENCES |
US7956889B2 (en) | 2003-06-04 | 2011-06-07 | Model Software Corporation | Video surveillance system |
CN101872477B (en) * | 2009-04-24 | 2014-07-16 | 索尼株式会社 | Method and device for detecting object in image and system containing device |
CN102457713B (en) * | 2010-10-29 | 2014-06-25 | 西门子公司 | Track-side fault detection system, and implementation method and implementation device of same |
US8665345B2 (en) * | 2011-05-18 | 2014-03-04 | Intellectual Ventures Fund 83 Llc | Video summary including a feature of interest |
2014
- 2014-05-09 CN CN201480039208.2A patent/CN105531995B/en active Active
- 2014-05-09 US US14/273,653 patent/US9665777B2/en active Active
- 2014-05-09 EP EP14795321.0A patent/EP2995079A4/en not_active Ceased
- 2014-05-09 WO PCT/US2014/037449 patent/WO2014183004A1/en active Application Filing
Cited By (212)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9779546B2 (en) | 2012-05-04 | 2017-10-03 | Intermec Ip Corp. | Volume dimensioning systems and methods |
US10467806B2 (en) | 2012-05-04 | 2019-11-05 | Intermec Ip Corp. | Volume dimensioning systems and methods |
US10635922B2 (en) | 2012-05-15 | 2020-04-28 | Hand Held Products, Inc. | Terminals and methods for dimensioning objects |
US9939259B2 (en) | 2012-10-04 | 2018-04-10 | Hand Held Products, Inc. | Measuring object dimensions using mobile computer |
US10908013B2 (en) | 2012-10-16 | 2021-02-02 | Hand Held Products, Inc. | Dimensioning system |
US10009579B2 (en) | 2012-11-21 | 2018-06-26 | Pelco, Inc. | Method and system for counting people using depth sensor |
US9367733B2 (en) | 2012-11-21 | 2016-06-14 | Pelco, Inc. | Method and apparatus for detecting people by a surveillance system |
US20160019427A1 (en) * | 2013-03-11 | 2016-01-21 | Michael Scott Martin | Video surveillence system for detecting firearms |
US9639747B2 (en) * | 2013-03-15 | 2017-05-02 | Pelco, Inc. | Online learning method for people detection and counting for retail stores |
US20140270358A1 (en) * | 2013-03-15 | 2014-09-18 | Pelco, Inc. | Online Learning Method for People Detection and Counting for Retail Stores |
US20160125267A1 (en) * | 2013-06-04 | 2016-05-05 | Elbit Systems Land And C4I Ltd. | Method and system for coordinating between image sensors |
US9466011B2 (en) * | 2013-06-04 | 2016-10-11 | Elbit Systems Land and C4l Ltd. | Method and system for coordinating between image sensors |
US10203402B2 (en) | 2013-06-07 | 2019-02-12 | Hand Held Products, Inc. | Method of error correction for 3D imaging device |
US10228452B2 (en) | 2013-06-07 | 2019-03-12 | Hand Held Products, Inc. | Method of error correction for 3D imaging device |
US9805619B2 (en) | 2014-02-07 | 2017-10-31 | International Business Machines Corporation | Intelligent glasses for the visually impaired |
US9488833B2 (en) * | 2014-02-07 | 2016-11-08 | International Business Machines Corporation | Intelligent glasses for the visually impaired |
US20150227778A1 (en) * | 2014-02-07 | 2015-08-13 | International Business Machines Corporation | Intelligent glasses for the visually impaired |
US9674570B2 (en) | 2014-07-07 | 2017-06-06 | Google Inc. | Method and system for detecting and presenting video feed |
US9420331B2 (en) * | 2014-07-07 | 2016-08-16 | Google Inc. | Method and system for categorizing detected motion events |
US9479822B2 (en) | 2014-07-07 | 2016-10-25 | Google Inc. | Method and system for categorizing detected motion events |
US9940523B2 (en) | 2014-07-07 | 2018-04-10 | Google Llc | Video monitoring user interface for displaying motion events feed |
US9501915B1 (en) | 2014-07-07 | 2016-11-22 | Google Inc. | Systems and methods for analyzing a video stream |
US10192120B2 (en) | 2014-07-07 | 2019-01-29 | Google Llc | Method and system for generating a smart time-lapse video clip |
US9544636B2 (en) | 2014-07-07 | 2017-01-10 | Google Inc. | Method and system for editing event categories |
US10789821B2 (en) | 2014-07-07 | 2020-09-29 | Google Llc | Methods and systems for camera-side cropping of a video feed |
US10180775B2 (en) | 2014-07-07 | 2019-01-15 | Google Llc | Method and system for displaying recorded and live video feeds |
US10140827B2 (en) | 2014-07-07 | 2018-11-27 | Google Llc | Method and system for processing motion event notifications |
US9602860B2 (en) | 2014-07-07 | 2017-03-21 | Google Inc. | Method and system for displaying recorded and live video feeds |
US9213903B1 (en) | 2014-07-07 | 2015-12-15 | Google Inc. | Method and system for cluster-based video monitoring and event categorization |
US9609380B2 (en) | 2014-07-07 | 2017-03-28 | Google Inc. | Method and system for detecting and presenting a new event in a video feed |
US10977918B2 (en) | 2014-07-07 | 2021-04-13 | Google Llc | Method and system for generating a smart time-lapse video clip |
US10467872B2 (en) | 2014-07-07 | 2019-11-05 | Google Llc | Methods and systems for updating an event timeline with event indicators |
US9449229B1 (en) | 2014-07-07 | 2016-09-20 | Google Inc. | Systems and methods for categorizing motion event candidates |
US10452921B2 (en) | 2014-07-07 | 2019-10-22 | Google Llc | Methods and systems for displaying video streams |
US11011035B2 (en) | 2014-07-07 | 2021-05-18 | Google Llc | Methods and systems for detecting persons in a smart home environment |
US9672427B2 (en) | 2014-07-07 | 2017-06-06 | Google Inc. | Systems and methods for categorizing motion events |
US9886161B2 (en) | 2014-07-07 | 2018-02-06 | Google Llc | Method and system for motion vector-based video monitoring and event categorization |
US10127783B2 (en) | 2014-07-07 | 2018-11-13 | Google Llc | Method and device for processing motion events |
US10108862B2 (en) | 2014-07-07 | 2018-10-23 | Google Llc | Methods and systems for displaying live video and recorded video |
US9224044B1 (en) | 2014-07-07 | 2015-12-29 | Google Inc. | Method and system for video zone monitoring |
US10867496B2 (en) | 2014-07-07 | 2020-12-15 | Google Llc | Methods and systems for presenting video feeds |
US11250679B2 (en) | 2014-07-07 | 2022-02-15 | Google Llc | Systems and methods for categorizing motion events |
US11062580B2 (en) | 2014-07-07 | 2021-07-13 | Google Llc | Methods and systems for updating an event timeline with event indicators |
US9489580B2 (en) | 2014-07-07 | 2016-11-08 | Google Inc. | Method and system for cluster-based video monitoring and event categorization |
US9158974B1 (en) | 2014-07-07 | 2015-10-13 | Google Inc. | Method and system for motion vector-based video monitoring and event categorization |
US9779307B2 (en) * | 2014-07-07 | 2017-10-03 | Google Inc. | Method and system for non-causal zone search in video monitoring |
US9354794B2 (en) | 2014-07-07 | 2016-05-31 | Google Inc. | Method and system for performing client-side zooming of a remote video feed |
US20170223261A1 (en) * | 2014-07-31 | 2017-08-03 | Hitachi Maxell, Ltd. | Image pickup device and method of tracking subject thereof |
US11860511B2 (en) | 2014-07-31 | 2024-01-02 | Maxell, Ltd. | Image pickup device and method of tracking subject thereof |
US10609273B2 (en) * | 2014-07-31 | 2020-03-31 | Maxell, Ltd. | Image pickup device and method of tracking subject thereof |
US10240914B2 (en) | 2014-08-06 | 2019-03-26 | Hand Held Products, Inc. | Dimensioning system with guided alignment |
US9170707B1 (en) | 2014-09-30 | 2015-10-27 | Google Inc. | Method and system for generating a smart time-lapse video clip |
US10452713B2 (en) * | 2014-09-30 | 2019-10-22 | Apple Inc. | Video analysis techniques for improved editing, navigation, and summarization |
USD782495S1 (en) | 2014-10-07 | 2017-03-28 | Google Inc. | Display screen or portion thereof with graphical user interface |
USD893508S1 (en) | 2014-10-07 | 2020-08-18 | Google Llc | Display screen or portion thereof with graphical user interface |
US9779276B2 (en) | 2014-10-10 | 2017-10-03 | Hand Held Products, Inc. | Depth sensor based auto-focus system for an indicia scanner |
US10134120B2 (en) | 2014-10-10 | 2018-11-20 | Hand Held Products, Inc. | Image-stitching for dimensioning |
US10775165B2 (en) | 2014-10-10 | 2020-09-15 | Hand Held Products, Inc. | Methods for improving the accuracy of dimensioning-system measurements |
US10810715B2 (en) | 2014-10-10 | 2020-10-20 | Hand Held Products, Inc | System and method for picking validation |
US10859375B2 (en) | 2014-10-10 | 2020-12-08 | Hand Held Products, Inc. | Methods for improving the accuracy of dimensioning-system measurements |
US10121039B2 (en) | 2014-10-10 | 2018-11-06 | Hand Held Products, Inc. | Depth sensor based auto-focus system for an indicia scanner |
US10402956B2 (en) | 2014-10-10 | 2019-09-03 | Hand Held Products, Inc. | Image-stitching for dimensioning |
US20180061064A1 (en) * | 2014-10-15 | 2018-03-01 | Comcast Cable Communications, Llc | Generation of event video frames for content |
US20230040708A1 (en) * | 2014-10-15 | 2023-02-09 | Comcast Cable Communications, Llc | Determining One or More Events in Content |
US11461904B2 (en) * | 2014-10-15 | 2022-10-04 | Comcast Cable Communications, Llc | Determining one or more events in content |
US10657653B2 (en) * | 2014-10-15 | 2020-05-19 | Comcast Cable Communications, Llc | Determining one or more events in content |
US10393508B2 (en) | 2014-10-21 | 2019-08-27 | Hand Held Products, Inc. | Handheld dimensioning system with measurement-conformance feedback |
US10218964B2 (en) | 2014-10-21 | 2019-02-26 | Hand Held Products, Inc. | Dimensioning system with feedback |
US9363353B1 (en) * | 2014-12-04 | 2016-06-07 | Hon Man Ashley Chik | Mobile phone docks with multiple circulating phone connectors |
CN107409198A (en) * | 2015-03-04 | 2017-11-28 | 株式会社日立系统 | Situation based on camera image data confirms system and control device and the situation confirmation method based on camera image data |
US10430668B2 (en) | 2015-03-04 | 2019-10-01 | Hitachi Systems, Ltd. | Situation ascertainment system using camera picture data, control device, and situation ascertainment method using camera picture data |
US20160286171A1 (en) * | 2015-03-23 | 2016-09-29 | Fred Cheng | Motion data extraction and vectorization |
US11523090B2 (en) * | 2015-03-23 | 2022-12-06 | The Chamberlain Group Llc | Motion data extraction and vectorization |
US11403887B2 (en) | 2015-05-19 | 2022-08-02 | Hand Held Products, Inc. | Evaluating image values |
US11906280B2 (en) | 2015-05-19 | 2024-02-20 | Hand Held Products, Inc. | Evaluating image values |
US10593130B2 (en) | 2015-05-19 | 2020-03-17 | Hand Held Products, Inc. | Evaluating image values |
US11599259B2 (en) | 2015-06-14 | 2023-03-07 | Google Llc | Methods and systems for presenting alert event indicators |
US10066982B2 (en) | 2015-06-16 | 2018-09-04 | Hand Held Products, Inc. | Calibrating a volume dimensioner |
EP3311334A4 (en) * | 2015-06-18 | 2019-08-07 | Wizr | Cloud platform with multi camera synchronization |
US10247547B2 (en) | 2015-06-23 | 2019-04-02 | Hand Held Products, Inc. | Optical pattern projector |
US10612958B2 (en) | 2015-07-07 | 2020-04-07 | Hand Held Products, Inc. | Mobile dimensioner apparatus to mitigate unfair charging practices in commerce |
US9835486B2 (en) | 2015-07-07 | 2017-12-05 | Hand Held Products, Inc. | Mobile dimensioner apparatus for use in commerce |
US10393506B2 (en) | 2015-07-15 | 2019-08-27 | Hand Held Products, Inc. | Method for a mobile dimensioning device to use a dynamic accuracy compatible with NIST standard |
US11353319B2 (en) | 2015-07-15 | 2022-06-07 | Hand Held Products, Inc. | Method for a mobile dimensioning device to use a dynamic accuracy compatible with NIST standard |
US11029762B2 (en) | 2015-07-16 | 2021-06-08 | Hand Held Products, Inc. | Adjusting dimensioning results using augmented reality |
US10094650B2 (en) | 2015-07-16 | 2018-10-09 | Hand Held Products, Inc. | Dimensioning and imaging items |
US20170061214A1 (en) * | 2015-08-31 | 2017-03-02 | General Electric Company | Controlling bandwidth utilization of video transmissions for quality and scalability |
US11134226B2 (en) | 2015-09-02 | 2021-09-28 | Nec Corporation | Surveillance system, surveillance method, and program |
US10977916B2 (en) * | 2015-09-02 | 2021-04-13 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
US20190213855A1 (en) * | 2015-09-02 | 2019-07-11 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
US11277591B2 (en) | 2015-09-02 | 2022-03-15 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
US10887561B2 (en) | 2015-09-02 | 2021-01-05 | Nec Corporation | Surveillance system, surveillance method, and program |
US10931923B2 (en) | 2015-09-02 | 2021-02-23 | Nec Corporation | Surveillance system, surveillance network construction method, and program |
US10972706B2 (en) | 2015-09-02 | 2021-04-06 | Nec Corporation | Surveillance system, surveillance method, and program |
US10984364B2 (en) * | 2015-09-04 | 2021-04-20 | International Business Machines Corporation | Summarization of a recording for quality control |
US20170068921A1 (en) * | 2015-09-04 | 2017-03-09 | International Business Machines Corporation | Summarization of a recording for quality control |
US10984363B2 (en) * | 2015-09-04 | 2021-04-20 | International Business Machines Corporation | Summarization of a recording for quality control |
US20170068920A1 (en) * | 2015-09-04 | 2017-03-09 | International Business Machines Corporation | Summarization of a recording for quality control |
US9906704B2 (en) * | 2015-09-17 | 2018-02-27 | Qualcomm Incorporated | Managing crowd sourced photography in a wireless network |
US10708673B2 (en) | 2015-09-25 | 2020-07-07 | Qualcomm Incorporated | Systems and methods for video processing |
US20170091563A1 (en) * | 2015-09-28 | 2017-03-30 | International Business Machines Corporation | Discovering object pathways in a camera network |
US9704046B2 (en) * | 2015-09-28 | 2017-07-11 | International Business Machines Corporation | Discovering object pathways in a camera network |
US9454884B1 (en) * | 2015-09-28 | 2016-09-27 | International Business Machines Corporation | Discovering object pathways in a camera network |
US9495763B1 (en) * | 2015-09-28 | 2016-11-15 | International Business Machines Corporation | Discovering object pathways in a camera network |
US9532012B1 (en) * | 2015-09-28 | 2016-12-27 | International Business Machines Corporation | Discovering object pathways in a camera network |
US20190073538A1 (en) * | 2015-10-06 | 2019-03-07 | Agent Video Intelligence Ltd. | Method and system for classifying objects from a stream of images |
WO2017060894A1 (en) * | 2015-10-06 | 2017-04-13 | Agent Video Intelligence Ltd. | Method and system for classifying objects from a stream of images |
US10249030B2 (en) | 2015-10-30 | 2019-04-02 | Hand Held Products, Inc. | Image transformation for indicia reading |
US10225544B2 (en) | 2015-11-19 | 2019-03-05 | Hand Held Products, Inc. | High resolution dot pattern |
US10382717B2 (en) * | 2015-11-20 | 2019-08-13 | Vivotek Inc. | Video file playback system capable of previewing image, method thereof, and computer program product |
US20170150093A1 (en) * | 2015-11-20 | 2017-05-25 | Vivotek Inc. | Video file playback system capable of previewing image, method thereof, and computer program product |
CN107040744A (en) * | 2015-11-20 | 2017-08-11 | 晶睿通讯股份有限公司 | Can preview screen video signal archives playback system and its method and computer program product |
US20170148183A1 (en) * | 2015-11-25 | 2017-05-25 | Behavioral Recognition Systems, Inc. | Image driver that samples high-resolution image data |
US10853961B1 (en) * | 2015-11-25 | 2020-12-01 | Intellective Ai, Inc. | Image driver that samples high-resolution image data |
US10102642B2 (en) * | 2015-11-25 | 2018-10-16 | Omni Ai, Inc. | Image driver that samples high-resolution image data |
CN106982347A (en) * | 2016-01-16 | 2017-07-25 | 阔展科技(深圳)有限公司 | Has the intelligent mobile monitor of extraction and analysis data capability |
US10489660B2 (en) | 2016-01-21 | 2019-11-26 | Wizr Llc | Video processing with object identification |
EP3405889A4 (en) * | 2016-01-21 | 2019-08-28 | Wizr LLC | Cloud platform with multi camera synchronization |
WO2017130187A1 (en) * | 2016-01-26 | 2017-08-03 | Coral Detection Systems Ltd. | Methods and systems for drowning detection |
US11216654B2 (en) * | 2016-01-26 | 2022-01-04 | Coral Detection Systems Ltd. | Methods and systems for drowning detection |
US10747227B2 (en) | 2016-01-27 | 2020-08-18 | Hand Held Products, Inc. | Vehicle positioning and object avoidance |
US10025314B2 (en) | 2016-01-27 | 2018-07-17 | Hand Held Products, Inc. | Vehicle positioning and object avoidance |
WO2017131723A1 (en) * | 2016-01-29 | 2017-08-03 | Hewlett Packard Enterprise Development Lp | Generating a test case for a recorded stream of events |
US20190080575A1 (en) * | 2016-04-07 | 2019-03-14 | Hanwha Techwin Co., Ltd. | Surveillance system and control method thereof |
US11538316B2 (en) * | 2016-04-07 | 2022-12-27 | Hanwha Techwin Co., Ltd. | Surveillance system and control method thereof |
US10083359B2 (en) * | 2016-04-28 | 2018-09-25 | Motorola Solutions, Inc. | Method and device for incident situation prediction |
US11082701B2 (en) | 2016-05-27 | 2021-08-03 | Google Llc | Methods and devices for dynamic adaptation of encoding bitrate for video streaming |
US10339352B2 (en) | 2016-06-03 | 2019-07-02 | Hand Held Products, Inc. | Wearable metrological apparatus |
US10872214B2 (en) | 2016-06-03 | 2020-12-22 | Hand Held Products, Inc. | Wearable metrological apparatus |
US9940721B2 (en) | 2016-06-10 | 2018-04-10 | Hand Held Products, Inc. | Scene change detection in a dimensioner |
US10163216B2 (en) | 2016-06-15 | 2018-12-25 | Hand Held Products, Inc. | Automatic mode switching in a volume dimensioner |
CN105939464A (en) * | 2016-06-15 | 2016-09-14 | 童迎伟 | Intelligent vehicle-mounted monitoring system and safety monitoring method thereof |
US10417769B2 (en) | 2016-06-15 | 2019-09-17 | Hand Held Products, Inc. | Automatic mode switching in a volume dimensioner |
US11676412B2 (en) * | 2016-06-30 | 2023-06-13 | Snap Inc. | Object modeling and replacement in a video stream |
US20200410227A1 (en) * | 2016-06-30 | 2020-12-31 | Snap Inc. | Object modeling and replacement in a video stream |
US11587320B2 (en) | 2016-07-11 | 2023-02-21 | Google Llc | Methods and systems for person detection in a video feed |
US10657382B2 (en) | 2016-07-11 | 2020-05-19 | Google Llc | Methods and systems for person detection in a video feed |
CN108073857A (en) * | 2016-11-14 | 2018-05-25 | 北京三星通信技术研究有限公司 | The method and device of dynamic visual sensor DVS event handlings |
KR20190099443A (en) * | 2016-12-05 | 2019-08-27 | 아비질론 코포레이션 | Systems and Methods for Appearance Navigation |
US11113587B2 (en) | 2016-12-05 | 2021-09-07 | Avigilon Corporation | System and method for appearance search |
JP2022023887A (en) * | 2016-12-05 | 2022-02-08 | アビジロン コーポレイション | Appearance search system and method |
JP2020503604A (en) * | 2016-12-05 | 2020-01-30 | アビギロン コーポレイションAvigilon Corporation | Appearance search system and method |
IL267115A (en) * | 2016-12-05 | 2019-08-29 | Avigilon Corp | System and method for appearance search |
AU2022252799B2 (en) * | 2016-12-05 | 2023-09-28 | Motorola Solutions, Inc. | System and method for appearance search |
EP3549063A4 (en) * | 2016-12-05 | 2020-06-24 | Avigilon Corporation | System and method for appearance search |
KR102560308B1 (en) * | 2016-12-05 | 2023-07-27 | 모토로라 솔루션즈, 인크. | System and method for exterior search |
JP7317919B2 (en) | 2016-12-05 | 2023-07-31 | モトローラ ソリューションズ インコーポレイテッド | Appearance search system and method |
WO2018102919A1 (en) | 2016-12-05 | 2018-06-14 | Avigilon Corporation | System and method for appearance search |
US10726312B2 (en) | 2016-12-05 | 2020-07-28 | Avigilon Corporation | System and method for appearance search |
AU2017372905B2 (en) * | 2016-12-05 | 2022-08-11 | Motorola Solutions, Inc. | System and method for appearance search |
US10909708B2 (en) | 2016-12-09 | 2021-02-02 | Hand Held Products, Inc. | Calibrating a dimensioner using ratios of measurable parameters of optically-perceptible geometric elements |
US11657278B2 (en) | 2017-03-24 | 2023-05-23 | Numenta, Inc. | Location processor for inferencing and learning based on sensorimotor input data |
WO2018175968A1 (en) * | 2017-03-24 | 2018-09-27 | Numenta, Inc. | Location processor for inferencing and learning based on sensorimotor input data |
US10733436B2 (en) * | 2017-03-24 | 2020-08-04 | Numenta, Inc. | Location processor for inferencing and learning based on sensorimotor input data |
US20180276464A1 (en) * | 2017-03-24 | 2018-09-27 | Numenta, Inc. | Location processor for inferencing and learning based on sensorimotor input data |
US11047672B2 (en) | 2017-03-28 | 2021-06-29 | Hand Held Products, Inc. | System for optically dimensioning |
US10733450B1 (en) | 2017-03-30 | 2020-08-04 | Amazon Technologies, Inc. | Multi-video annotation |
US11393207B1 (en) | 2017-03-30 | 2022-07-19 | Amazon Technologies, Inc. | Multi-video annotation |
US10223591B1 (en) * | 2017-03-30 | 2019-03-05 | Amazon Technologies, Inc. | Multi-video annotation |
US11074791B2 (en) * | 2017-04-20 | 2021-07-27 | David Lee Selinger | Automatic threat detection based on video frame delta information in compressed video streams |
US11783010B2 (en) | 2017-05-30 | 2023-10-10 | Google Llc | Systems and methods of person recognition in video streams |
US10733748B2 (en) | 2017-07-24 | 2020-08-04 | Hand Held Products, Inc. | Dual-pattern optical 3D dimensioning |
US11710387B2 (en) | 2017-09-20 | 2023-07-25 | Google Llc | Systems and methods of detecting and responding to a visitor to a smart home environment |
US10417882B2 (en) | 2017-10-24 | 2019-09-17 | The Chamberlain Group, Inc. | Direction sensitive motion detector camera |
US10679476B2 (en) | 2017-10-24 | 2020-06-09 | The Chamberlain Group, Inc. | Method of using a camera to detect direction of motion |
US20190156496A1 (en) * | 2017-11-21 | 2019-05-23 | Reliance Core Consulting LLC | Methods, systems, apparatuses and devices for facilitating motion analysis in an environment |
US10867398B2 (en) * | 2017-11-21 | 2020-12-15 | Reliance Core Consulting LLC | Methods, systems, apparatuses and devices for facilitating motion analysis in an environment |
US10354169B1 (en) * | 2017-12-22 | 2019-07-16 | Motorola Solutions, Inc. | Method, device, and system for adaptive training of machine learning models via detected in-field contextual sensor events and associated located and retrieved digital audio and/or video imaging |
US20210041886A1 (en) * | 2018-01-24 | 2021-02-11 | Zhuineng Robotics (Shanghai) Co., Ltd. | Multi-device visual navigation method and system in variable scene |
US20190304273A1 (en) * | 2018-03-28 | 2019-10-03 | Hon Hai Precision Industry Co., Ltd. | Image surveillance device and method of processing images |
US10242035B1 (en) * | 2018-04-02 | 2019-03-26 | Pond5, Inc. | Method and system for image searching |
US10346461B1 (en) * | 2018-04-02 | 2019-07-09 | Pond5 Inc. | Method and system for image searching by color |
EP3561722A1 (en) * | 2018-04-24 | 2019-10-30 | Toshiba Tec Kabushiki Kaisha | Video image analysis apparatus and video image analysis method |
US10584962B2 (en) | 2018-05-01 | 2020-03-10 | Hand Held Products, Inc | System and method for validating physical-item security |
US11579904B2 (en) * | 2018-07-02 | 2023-02-14 | Panasonic Intellectual Property Management Co., Ltd. | Learning data collection device, learning data collection system, and learning data collection method |
US11069214B2 (en) * | 2018-07-04 | 2021-07-20 | Seechange Technologies Limited | Event entity monitoring network and method |
US20200013273A1 (en) * | 2018-07-04 | 2020-01-09 | Arm Ip Limited | Event entity monitoring network and method |
US11527071B2 (en) | 2018-09-20 | 2022-12-13 | i-PRO Co., Ltd. | Person search system and person search method |
WO2020159386A1 (en) * | 2019-02-01 | 2020-08-06 | Andersen Terje N | Method and system for extracting metadata from an observed scene |
US20220108549A1 (en) * | 2019-02-01 | 2022-04-07 | Terje N. Andersen | Method and System for Extracting Metadata From an Observed Scene |
NO345328B1 (en) * | 2019-02-01 | 2020-12-14 | Roest Bernt Erik | Method and system for extracting metadata from an observed scene. |
US11676251B2 (en) * | 2019-02-01 | 2023-06-13 | Terje N. Andersen | Method and system for extracting metadata from an observed scene |
WO2020168286A1 (en) * | 2019-02-14 | 2020-08-20 | University Of Washington | Systems and methods for improved nanopore-based analysis of nucleic acids |
US11176411B2 (en) | 2019-02-28 | 2021-11-16 | Stats Llc | System and method for player reidentification in broadcast video |
US11861848B2 (en) | 2019-02-28 | 2024-01-02 | Stats Llc | System and method for generating trackable video frames from broadcast video |
US11379683B2 (en) | 2019-02-28 | 2022-07-05 | Stats Llc | System and method for generating trackable video frames from broadcast video |
US11586840B2 (en) | 2019-02-28 | 2023-02-21 | Stats Llc | System and method for player reidentification in broadcast video |
US11182642B2 (en) | 2019-02-28 | 2021-11-23 | Stats Llc | System and method for generating player tracking data from broadcast video |
US11593581B2 (en) * | 2019-02-28 | 2023-02-28 | Stats Llc | System and method for calibrating moving camera capturing broadcast video |
US11830202B2 (en) | 2019-02-28 | 2023-11-28 | Stats Llc | System and method for generating player tracking data from broadcast video |
US11861850B2 (en) | 2019-02-28 | 2024-01-02 | Stats Llc | System and method for player reidentification in broadcast video |
WO2020240212A1 (en) * | 2019-05-30 | 2020-12-03 | Seequestor Ltd | Control system and method |
US11868940B2 (en) * | 2019-06-12 | 2024-01-09 | Shoppertrak Rct Llc | Methods and systems for monitoring workers in a retail environment |
US20200394589A1 (en) * | 2019-06-12 | 2020-12-17 | Shoppertrak Rct Corporation | Methods and systems for monitoring workers in a retail environment |
USD939980S1 (en) | 2019-06-17 | 2022-01-04 | Guard, Inc. | Data and sensor system hub |
USD957966S1 (en) | 2019-06-17 | 2022-07-19 | Guard, Inc. | Tile sensor unit |
JP7235612B2 (en) | 2019-07-11 | 2023-03-08 | i-PRO Co., Ltd. | Person search system and person search method |
JP2020047259A (en) * | 2019-07-11 | 2020-03-26 | Panasonic i-PRO Sensing Solutions Co., Ltd. | Person search system and person search method |
US11321656B2 (en) * | 2019-07-23 | 2022-05-03 | Fanuc Corporation | Difference extracting device |
US20210042509A1 (en) * | 2019-08-05 | 2021-02-11 | Shoppertrak Rct Corporation | Methods and systems for monitoring potential losses in a retail environment |
US11055518B2 (en) * | 2019-08-05 | 2021-07-06 | Sensormatic Electronics, LLC | Methods and systems for monitoring potential losses in a retail environment |
US20230292709A1 (en) * | 2019-08-16 | 2023-09-21 | Stephanie Sujin CHOI | Method for clustering and identifying animals based on the shapes, relative positions and other features of body parts |
US11639846B2 (en) | 2019-09-27 | 2023-05-02 | Honeywell International Inc. | Dual-pattern optical 3D dimensioning |
ES2821017A1 (en) * | 2019-10-23 | 2021-04-23 | Future Connections Holding B V | System of counting and control of capacity in facilities (Machine-translation by Google Translate, not legally binding) |
US20210304457A1 (en) * | 2020-03-31 | 2021-09-30 | The Regents Of The University Of California | Using neural networks to estimate motion vectors for motion corrected pet image reconstruction |
US11768919B2 (en) * | 2021-01-13 | 2023-09-26 | Fotonation Limited | Image processing system |
US20220222496A1 (en) * | 2021-01-13 | 2022-07-14 | Fotonation Limited | Image processing system |
US11516441B1 (en) * | 2021-03-16 | 2022-11-29 | Kanya Kamangu | 360 degree video recording and playback device |
US20230134663A1 (en) * | 2021-11-02 | 2023-05-04 | Steven Roskowski | Transforming Surveillance Sensor Data into Event Metadata, Bounding Boxes, Recognized Object Classes, Learning Density Patterns, Variation Trends, Normality, Projections, Topology; Determining Variances Out of Normal Range and Security Events; and Initiating Remediation and Actuating Physical Access Control Facilitation |
WO2023081047A3 (en) * | 2021-11-04 | 2023-06-15 | Op Solutions, Llc | Systems and methods for object and event detection and feature-based rate-distortion optimization for video coding |
WO2023196778A1 (en) * | 2022-04-04 | 2023-10-12 | Agilysys Nv, Llc | System and method for synchronizing 2d camera data for item recognition in images |
US11935247B2 (en) | 2023-02-27 | 2024-03-19 | Stats Llc | System and method for calibrating moving cameras capturing broadcast video |
Also Published As
Publication number | Publication date |
---|---|
WO2014183004A1 (en) | 2014-11-13 |
EP2995079A4 (en) | 2017-08-23 |
CN105531995B (en) | 2019-01-08 |
EP2995079A1 (en) | 2016-03-16 |
CN105531995A (en) | 2016-04-27 |
US9665777B2 (en) | 2017-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9665777B2 (en) | System and method for object and event identification using multiple cameras | |
Elharrouss et al. | A combined multiple action recognition and summarization for surveillance video sequences | |
Wang et al. | Human fall detection in surveillance video based on PCANet | |
Vishwakarma | A two-fold transformation model for human action recognition using decisive pose | |
Sharma et al. | Performance analysis of moving object detection using BGS techniques in visual surveillance | |
Hakeem et al. | Video analytics for business intelligence | |
US11527000B2 (en) | System and method for re-identifying target object based on location information of CCTV and movement information of object | |
Arivazhagan et al. | Human action recognition from RGB-D data using complete local binary pattern | |
Khan et al. | Spatio-temporal adversarial learning for detecting unseen falls | |
D'Orazio et al. | A survey of automatic event detection in multi-camera third generation surveillance systems | |
Jagadeesh et al. | Video based action detection and recognition human using optical flow and SVM classifier | |
Afsar et al. | Automatic human action recognition from video using hidden markov model | |
Raval et al. | Survey and analysis of human activity recognition in surveillance videos | |
Kushwaha et al. | Multiview human activity recognition system based on spatiotemporal template for video surveillance system | |
Yadav et al. | Human Illegal Activity Recognition Based on Deep Learning Techniques | |
Ladjailia et al. | Encoding human motion for automated activity recognition in surveillance applications | |
Verma et al. | Intensifying security with smart video surveillance | |
Gagvani | Challenges in video analytics | |
Bagga et al. | Person re-identification in multi-camera environment | |
Arivazhagan | Versatile loitering detection based on non-verbal cues using dense trajectory descriptors | |
Naikal et al. | Joint detection and recognition of human actions in wireless surveillance camera networks | |
Silva | Human action recognition based on spatiotemporal features from videos | |
Huang et al. | Learning-based Human Fall Detection using RGB-D cameras. | |
CN113743339B (en) | Indoor falling detection method and system based on scene recognition | |
Al-obaidi | Privacy aware human action recognition: an exploration of temporal salience modelling and neuromorphic vision sensing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ROBERT BOSCH GMBH, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NAIKAL, NIKHIL;LAJEVARDI, PEDRAM;SIGNING DATES FROM 20140512 TO 20140610;REEL/FRAME:033375/0469 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |