IBM Books

Image, Audio, and Video Extenders Administration and Programming

Shot detection data structures

Data related to shot detection is stored in structures that are defined in the shot detection header file, dmbshot.h. Many of the shot-detection APIs require a pointer to one or more of these structures. Some of these structures contain data that the Video Extender uses as input; for example, the shot control structure contains information that controls shot detection. Most of the structures are used by the Video Extender to store data that it retrieves from a video; for example, the video frame data structure contains the pixel content of a frame.

The structures used for shot detection are DBvIOType, DBvShotControl, DBvShotType, DBvFrameData, and DBvStoryboardCtrl.

DBvIOType

The DBvIOType data structure contains data about a video, such as its format, dimensions, and number of frames. The data structure is defined as follows:

typedef struct {
 
   FILE *hFile;                    /* file handle for the video */
   char vhandle[255];              /* video handle (if from database) */
   char vtable[255];               /* video table name (if from database) */
   char vcolumn[255];              /* video column name (if from database) */
   char vFile[255];                /* name of video file */
   char idxFile[255];              /* name of index file */
   char isIdx;                     /* 1 if the index exists, 0 otherwise */
   char isInDb;                    /* 1 if from DB, 0 if from file */
   int format;                     /* Format of the video */
   unsigned long dx, dy;           /* Dimensions of the video */
   unsigned long totalFrames;      /* Total number of frames in the video */
   unsigned long markFrame;        /* used by shot detection */
   unsigned long currentFrame;     /* The current video frame */
   DBvFrameData fd;                /* Frame data for current frame */
   DBvDCFrameData fdDc;            /* Frame data for DC images */
   unsigned char BGRValid;         /* reserved */
   unsigned short usDeviceID;      /* reserved */
   unsigned long hwnd;             /* reserved */
   int videoReset;                 /* Flag if video is opened or seeked */
   int firstshot;                  /* Used internally to indicate the first call */
   void *reserved;                 /* reserved */
 
} DBvIOType;
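To illustrate how the source-identification fields of DBvIOType relate to each other, the following sketch uses a reduced, hypothetical copy of the structure (VideoInfo) and a hypothetical helper, describe_video, that summarizes where a video comes from. The isInDb flag selects between the file fields and the database fields, and isIdx reports whether an index file exists. Neither name is part of the Video Extender API; real programs should include dmbshot.h and use DBvIOType itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced copy of DBvIOType: only the fields used below. */
typedef struct {
    char vhandle[255];
    char vtable[255];
    char vcolumn[255];
    char vFile[255];
    char idxFile[255];
    char isIdx;                     /* 1 if the index exists, 0 otherwise */
    char isInDb;                    /* 1 if from DB, 0 if from file */
    unsigned long dx, dy;           /* frame dimensions */
    unsigned long totalFrames;
} VideoInfo;

/* Writes a one-line summary of the video source into buf.
   Returns the snprintf result (number of characters that would be written). */
static int describe_video(const VideoInfo *v, char *buf, size_t size)
{
    if (v->isInDb)                  /* database source: table and column */
        return snprintf(buf, size, "db %s.%s (%lux%lu, %lu frames)",
                        v->vtable, v->vcolumn, v->dx, v->dy, v->totalFrames);
    /* file source: file name, plus a marker if an index file exists */
    return snprintf(buf, size, "file %s%s (%lux%lu, %lu frames)",
                    v->vFile, v->isIdx ? " [indexed]" : "",
                    v->dx, v->dy, v->totalFrames);
}
```

For a 352x240 file-based video with an index, the summary reads "file clip.mpg [indexed] (352x240, 1500 frames)".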

DBvShotControl

The DBvShotControl data structure contains information that is used to control shot detection, such as the detection method. The data structure is defined as follows:

typedef struct {
 
  unsigned long reserved;
  unsigned long method;           /* detection method */
 
    #define DETECT_CORRELATION  0x00000001
    #define DETECT_HISTOGRAM    0x00000002
    #define DETECT_CORRHIST     0x00000003
    #define DETECT_CORRHISTDISS 0x00000004
 
 
  int normalCorrValue;            /* Correlation threshold */
  int sceneCutSkipXY;             /* reserved */
  int CorrHistThresh;             /* Histogram threshold */
  int DissThresh;                 /* Dissolve threshold */
  int DissCacheSize;              /* Dissolve cache size */
  int DissNumCaches;              /* Dissolve cache number */
  int minShotSize;                /* Minimum frames in a shot */
 
} DBvShotControl;

The table below describes each field in DBvShotControl and its allowed and default settings. To initialize these fields to default values, use the DBvInitShotControl API as described in Initializing values in shot detection data structures.

DBvShotControl settings depend on the type of video: Scene changes in digitized video vary greatly depending on the content and format of the video. Also, the accuracy of the scene change algorithms varies depending on the video. Clearly defined scene changes with obvious differences in overall frame appearance are detected more accurately than more subtle types of changes, or changes where the overall color content remains the same. Although the default DBvShotControl field settings work well for most applications, you might need to adjust these settings to reduce instances of false or missed detection.

Table 10. DBvShotControl fields
Field Meaning
method Identifies the method that the Video Extender uses to detect a scene change. You can choose one of the following methods:

DETECT_CORRELATION. Compare pixels in two successive frames. If the difference exceeds the correlation threshold, detect a scene change.

DETECT_HISTOGRAM. Compare the histogram values of two successive frames. The histogram value measures the distribution of colors in the frame. If the difference exceeds the histogram threshold, detect a scene change.

DETECT_CORRHIST. Use the correlation method to identify possible scene changes, then use the histogram method for the frames marked as possible scene changes. If the histogram threshold is exceeded, detect a scene change.

DETECT_CORRHISTDISS. Same as for DETECT_CORRHIST, but examine additional frames for dissolves.

The default method is DETECT_CORRHIST.

normalCorrValue An integer value of 0 to 100 that specifies the correlation threshold. This gives the minimum value of the correlation coefficient between pixels in two frames. A value of 0 means always detect a scene change for the next frame. A value of 100 means detect a scene change only if all the pixels change from one frame to the next frame. The default value is 60.
sceneCutSkipXY Reserved.
CorrHistThresh An integer value of 0 to 100 that specifies the histogram threshold. This measures the difference between the histogram values of successive frames. A value of 0 means detect a scene change only if the histogram values are fully different from one frame to the next. A value of 100 means always detect a scene change for the next frame. The default value is 10.
DissThresh An integer value of 0 to 100 that specifies the dissolve test threshold. This measures the percentage of pixels in a frame that must pass a dissolve test for a dissolve to be detected. A value of 0 means always detect a dissolve for the frame. A value of 100 means detect a dissolve only if all pixels in the frame pass the dissolve test. The default value is 15.
DissCacheSize An integer value that specifies the number of frames used in the slope portion of the dissolve test. The default value is 4.
DissNumCaches An integer value that specifies the number of frames used in the consistency portion of the dissolve test. The default value is 7.
minShotSize An integer value that specifies the minimum number of frames for a shot. For a shot to be detected, it must have at least as many frames as the minimum. The default value is 5.
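The metrics that the Video Extender computes internally are not documented here. As a rough illustration of the two-stage DETECT_CORRHIST cascade described in Table 10, the sketch below applies two stand-in tests to the luminance planes of two successive frames: the percentage of pixels that changed (in place of the correlation test) and the percentage difference between the two luminance histograms. The function names, the pixel tolerance of 16, and both metrics are assumptions, and the threshold scales do not reproduce the exact Table 10 semantics.

```c
#include <stdlib.h>

#define HIST_BINS 256

/* Stand-in for the correlation test: percentage (0-100) of pixels whose
   luminance differs by more than tol between frames a and b. */
static int pixel_change_pct(const unsigned char *a, const unsigned char *b,
                            size_t n, int tol)
{
    size_t changed = 0;
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (abs((int)a[i] - (int)b[i]) > tol)
            changed++;
    return (int)(changed * 100 / n);
}

/* Stand-in for the histogram test: 0 = identical luminance distributions,
   100 = no luminance value in common. */
static int histogram_diff_pct(const unsigned char *a, const unsigned char *b,
                              size_t n)
{
    size_t ha[HIST_BINS] = {0}, hb[HIST_BINS] = {0}, sum = 0;
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        ha[a[i]]++;
        hb[b[i]]++;
    }
    for (int k = 0; k < HIST_BINS; k++)
        sum += ha[k] > hb[k] ? ha[k] - hb[k] : hb[k] - ha[k];
    return (int)(sum * 100 / (2 * n));      /* sum ranges from 0 to 2n */
}

/* Cascade modeled on DETECT_CORRHIST: the pixel test nominates a candidate,
   and only candidates are checked with the histogram test. */
static int is_scene_change(const unsigned char *a, const unsigned char *b,
                           size_t n, int corrThresh, int histThresh)
{
    if (pixel_change_pct(a, b, n, 16) < corrThresh)  /* tolerance is arbitrary */
        return 0;                                     /* not a candidate */
    return histogram_diff_pct(a, b, n) >= histThresh; /* confirm the cut */
}
```

With thresholds of 60 and 10, a frame pair in which every pixel changes but the overall luminance distribution stays the same (for example, a pattern shifted by one pixel) passes the first stage but is rejected by the second. Filtering that kind of false candidate is the reason for combining the two methods.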

DBvShotType

The DBvShotType data structure contains information about a shot, such as its starting, ending, and representative frame numbers, and the frame data (pixel planes) of the representative frame. The data structure is defined as follows:

typedef struct {
 
  unsigned long startFrame;       /* starting frame number */
  unsigned long endFrame;         /* ending frame number */
  unsigned long repFrame;         /* representative frame number */
  DBvFrameData fd;                /* data for representative frame */
  unsigned long dx;               /* frame data width in pixels */
  unsigned long dy;               /* frame data height in pixels */
  char *comment;                  /* shot remark */
 
} DBvShotType;
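The sketch below shows one way that per-frame scene-change decisions could be turned into shot ranges like those in DBvShotType, honoring a minimum shot size such as minShotSize. It is a hypothetical illustration, not the Video Extender's algorithm: Shot is a reduced copy of DBvShotType, segment_shots is an invented name, endFrame is assumed to be inclusive, and the representative frame is assumed to be the middle frame.

```c
#include <stddef.h>

typedef struct {                    /* hypothetical subset of DBvShotType */
    unsigned long startFrame;
    unsigned long endFrame;         /* assumed inclusive */
    unsigned long repFrame;
} Shot;

/* cut[i] != 0 means a scene change was detected at frame i (frame i starts
   a new shot). A cut is ignored if it would close a shot shorter than
   minShotSize frames. Returns the number of shots written (<= maxShots). */
static size_t segment_shots(const unsigned char *cut, unsigned long totalFrames,
                            unsigned long minShotSize,
                            Shot *out, size_t maxShots)
{
    size_t count = 0;
    unsigned long start = 0;

    if (totalFrames == 0 || maxShots == 0)
        return 0;
    for (unsigned long i = 1; i < totalFrames && count < maxShots; i++) {
        if (cut[i] && i - start >= minShotSize) {
            out[count].startFrame = start;
            out[count].endFrame = i - 1;
            out[count].repFrame = start + (i - 1 - start) / 2;   /* middle */
            count++;
            start = i;
        }
    }
    if (count < maxShots) {         /* close the final shot */
        out[count].startFrame = start;
        out[count].endFrame = totalFrames - 1;
        out[count].repFrame = start + (totalFrames - 1 - start) / 2;
        count++;
    }
    return count;
}
```

For a 20-frame video with detected cuts at frames 3 and 10 and a minimum of 5 frames, the cut at frame 3 is ignored and the result is two shots, frames 0 to 9 and 10 to 19. In this sketch the final shot is not checked against the minimum.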

DBvFrameData

The DBvFrameData data structure contains the pixel content of a frame. The data structure is defined as follows:

typedef struct                     /* video frame data */
{
  /* MPEG 1 pixels */
  unsigned char *luminance;        /* Luminance pixel plane (black and white) */
  unsigned char *Cr;               /* Cr pixel plane */
  unsigned char *Cb;               /* Cb pixel plane */
  unsigned char *reserved;
 
  } DBvFrameData;
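This section does not state how large each pixel plane is or how pixels are ordered within it. MPEG-1 video uses 4:2:0 chroma sampling, so a plausible layout is a full-resolution luminance plane with Cr and Cb planes at half the width and half the height. The following sketch allocates and reads planes under that assumption and under an assumed row-major pixel order. FrameData, frame_alloc, luma_at, and the other helpers are hypothetical; check the actual plane sizes before relying on them.

```c
#include <stdlib.h>

typedef struct {                    /* local copy of the DBvFrameData layout */
    unsigned char *luminance;
    unsigned char *Cr;
    unsigned char *Cb;
    unsigned char *reserved;
} FrameData;

/* Assumed full-resolution luminance plane. */
static size_t luma_plane_size(unsigned long dx, unsigned long dy)
{
    return (size_t)dx * dy;
}

/* Assumed 4:2:0 chroma plane: half width and half height, rounded up. */
static size_t chroma_plane_size(unsigned long dx, unsigned long dy)
{
    return (size_t)((dx + 1) / 2) * ((dy + 1) / 2);
}

static void frame_free(FrameData *fd)
{
    free(fd->luminance);
    free(fd->Cr);
    free(fd->Cb);
    fd->luminance = fd->Cr = fd->Cb = NULL;
}

/* Allocates all three planes for a dx-by-dy frame; returns 0 on success. */
static int frame_alloc(FrameData *fd, unsigned long dx, unsigned long dy)
{
    fd->luminance = malloc(luma_plane_size(dx, dy));
    fd->Cr = malloc(chroma_plane_size(dx, dy));
    fd->Cb = malloc(chroma_plane_size(dx, dy));
    fd->reserved = NULL;
    if (!fd->luminance || !fd->Cr || !fd->Cb) {
        frame_free(fd);             /* free(NULL) is a no-op */
        return -1;
    }
    return 0;
}

/* Luminance of pixel (x, y), assuming row-major order with dx pixels per row. */
static unsigned char luma_at(const FrameData *fd, unsigned long dx,
                             unsigned long x, unsigned long y)
{
    return fd->luminance[(size_t)y * dx + x];
}
```

Under these assumptions, a 352x240 frame has an 84480-byte luminance plane and two 21120-byte chroma planes.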

DBvStoryboardCtrl

The DBvStoryboardCtrl data structure contains values that control which, and how many, representative frames for a shot are stored in a video catalog. See Building a storyboard for a description of how these values are used. The data structure is defined as follows:

typedef struct {
 
  int thresh1;                    /* threshold for small to medium scenes */
  int thresh2;                    /* threshold for medium to large scenes */
  int delta;                      /* offset used for representative frames */
 
} DBvStoryboardCtrl;

The table below describes each field in DBvStoryboardCtrl and its default settings. To initialize these fields to default values, use the DBvInitStoryboardCtrl API as described in Initializing values in shot detection data structures.

DBvStoryboardCtrl settings depend on the type of video: which representative frames, and how many, are optimal for a storyboard can differ between types of videos. Although the default DBvStoryboardCtrl field settings work well for many types of videos, you might want to try them on a test subset of your videos first. You can then tune the settings as appropriate before building storyboards for a wider set of videos.

Table 11. DBvStoryboardCtrl fields
Field Meaning
thresh1 Identifies the threshold for short shots. Shots having fewer frames than the value of thresh1 are short shots. If cataloged, the information for a short shot includes one representative frame (the middle frame).

The default value is 90. If thresh1 is set to -1, every shot is considered a short shot, regardless of its actual length.

thresh2 Identifies the threshold for medium to long shots. Shots having at least as many frames as the value of thresh1, but no more frames than the value of thresh2, are medium shots. If cataloged, the information for a medium shot includes two representative frames, whose positions are controlled by the value of the delta field. Shots having more frames than the value of thresh2 are long shots. If cataloged, the information for a long shot includes three representative frames. The positions of the first and last representative frames are controlled by the value of the delta field; the second frame is the middle frame.

The default value is 150. If thresh2 is set to -1, every shot is considered a short shot, regardless of its actual length.

delta Identifies the offset used for representative frames. For medium and long shots, the first representative frame is offset from the beginning of the shot by the number of frames in delta. The last representative frame is offset from the end of the shot by the number of frames in delta.

The default value is 5.
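The rules in Table 11 can be expressed as a small selection function. The sketch below is an illustration under stated assumptions rather than the Video Extender's implementation: StoryboardCtrl is a hypothetical copy of DBvStoryboardCtrl, pick_rep_frames is an invented name, shot length is taken as end - start + 1, the middle frame is taken as start + (end - start) / 2, and delta is assumed to be small relative to the shot length.

```c
typedef struct {                    /* hypothetical copy of DBvStoryboardCtrl */
    int thresh1;
    int thresh2;
    int delta;
} StoryboardCtrl;

/* Writes the representative frame numbers for the shot [start, end] into
   out (room for 3) and returns how many were written. */
static int pick_rep_frames(const StoryboardCtrl *sb, unsigned long start,
                           unsigned long end, unsigned long out[3])
{
    long len = (long)(end - start + 1);             /* assumed inclusive */
    unsigned long mid = start + (end - start) / 2;

    /* -1 in either threshold: every shot is treated as a short shot. */
    if (sb->thresh1 == -1 || sb->thresh2 == -1 || len < sb->thresh1) {
        out[0] = mid;                               /* short: middle frame */
        return 1;
    }
    out[0] = start + sb->delta;                     /* delta from the start */
    if (len <= sb->thresh2) {
        out[1] = end - sb->delta;                   /* medium: two frames */
        return 2;
    }
    out[1] = mid;                                   /* long: three frames */
    out[2] = end - sb->delta;                       /* delta from the end */
    return 3;
}
```

With the defaults (90, 150, 5), a 50-frame shot starting at frame 0 yields frame 24, a 100-frame shot yields frames 5 and 94, and a 200-frame shot yields frames 5, 99, and 194.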

Initializing values in shot detection data structures

The values in the DBvShotControl data structure control shot detection. The values in the DBvStoryboardCtrl data structure control the building of a storyboard. You can explicitly specify values for the fields in these data structures. In addition, you can initialize the values in these structures to default values. See Table 10 for the default values in the DBvShotControl data structure. See Table 11 for the default values in the DBvStoryboardCtrl data structure.

Use the DBvInitShotControl API to initialize the values in the DBvShotControl data structure. When you use the API, you need to specify the shot control structure. For example, the following statement initializes the fields in the DBvShotControl structure to default values:

DBvShotControl    shotCtrl;
 
rc=DBvInitShotControl(
              &shotCtrl);         /* pointer to shot control structure */
              

Use the DBvInitStoryboardCtrl API to initialize the values in the DBvStoryboardCtrl data structure. When you use the API, you need to specify the storyboard control structure. For example, the following statement initializes the fields in the DBvStoryboardCtrl structure to default values:

DBvStoryboardCtrl    sbCtrl;
 
rc=DBvInitStoryboardCtrl(
              &sbCtrl);         /* pointer to storyboard control structure */
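A common pattern is to initialize the structure to defaults and then override only the fields you want to tune. Because dmbshot.h is not available here, the sketch below uses a hypothetical copy of DBvShotControl (ShotControl) and a stand-in for DBvInitShotControl (init_shot_control) that fills in the default values listed in Table 10. The validity check, including the requirement that minShotSize be at least 1, is an assumption added for illustration.

```c
/* Method codes as defined for DBvShotControl. */
#define DETECT_CORRELATION  0x00000001
#define DETECT_HISTOGRAM    0x00000002
#define DETECT_CORRHIST     0x00000003
#define DETECT_CORRHISTDISS 0x00000004

typedef struct {                    /* hypothetical copy of DBvShotControl */
    unsigned long reserved;
    unsigned long method;
    int normalCorrValue;
    int sceneCutSkipXY;
    int CorrHistThresh;
    int DissThresh;
    int DissCacheSize;
    int DissNumCaches;
    int minShotSize;
} ShotControl;

/* Stand-in for DBvInitShotControl: sets the defaults listed in Table 10. */
static void init_shot_control(ShotControl *sc)
{
    sc->reserved = 0;
    sc->method = DETECT_CORRHIST;
    sc->normalCorrValue = 60;
    sc->sceneCutSkipXY = 0;         /* reserved; 0 is a guess */
    sc->CorrHistThresh = 10;
    sc->DissThresh = 15;
    sc->DissCacheSize = 4;
    sc->DissNumCaches = 7;
    sc->minShotSize = 5;
}

/* Returns 1 if the method is known and the 0-100 thresholds are in range. */
static int shot_control_valid(const ShotControl *sc)
{
    return sc->method >= DETECT_CORRELATION &&
           sc->method <= DETECT_CORRHISTDISS &&
           sc->normalCorrValue >= 0 && sc->normalCorrValue <= 100 &&
           sc->CorrHistThresh >= 0 && sc->CorrHistThresh <= 100 &&
           sc->DissThresh >= 0 && sc->DissThresh <= 100 &&
           sc->minShotSize >= 1;    /* assumed lower bound */
}
```

For example, after initializing, an application could switch to DETECT_CORRHISTDISS for videos that contain dissolves and raise minShotSize to suppress very short shots, then check the result before use.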
              

