TRECVID 2004 Keyframes & Transcripts

Item Name: TRECVID 2004 Keyframes & Transcripts
Author(s): Paul Over, Georges Quenot, Kevin Walker
LDC Catalog No.: LDC2010V01
ISBN: 1-58563-551-0
ISLRN: 778-679-274-442-1
Release Date: June 16, 2010
Member Year(s): 2010
DCMI Type(s): MovingImage
Data Source(s): broadcast news
Project(s): TREC
Application(s): content-based retrieval, event detection, information extraction
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2010V01 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Over, Paul, Georges Quenot, and Kevin Walker. TRECVID 2004 Keyframes & Transcripts LDC2010V01. Web Download. Philadelphia: Linguistic Data Consortium, 2010.


TRECVID 2004 Keyframes and Transcripts was developed as a collaborative effort among researchers at the Linguistic Data Consortium (LDC), NIST, LIMSI-CNRS, and Dublin City University.

TREC Video Retrieval Evaluation (TRECVID) was sponsored by the National Institute of Standards and Technology (NIST) to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. The keyframes in this release were extracted for use in the NIST TRECVID 2004 Evaluation.

TRECVID was a laboratory-style evaluation that attempted to model real world situations or significant component tasks involved in such situations. In 2004 there were four main tasks with associated tests:

  • shot boundary determination
  • story segmentation
  • high-level feature extraction
  • search (interactive and manual)

For a detailed description of the TRECVID Evaluation Tasks, please refer to the NIST TRECVID 2004 Evaluation Description.


The source data includes approximately 70 hours of English language broadcast programming collected by LDC in 1998 from ABC ("World News Tonight") and CNN ("CNN Headline News").

Shots are fundamental units of video, useful for higher-level processing. To create the master list of shots, the video was segmented. The results of this pass are called subshots. Because the master shot reference is designed for use in manual assessment, a second pass over the segmentation was made to create the master shots of at least 2 seconds in length. These master shots were the ones used in submitting results for the feature and search tasks in the evaluation. In the second pass, starting at the beginning of each file, the subshots were aggregated, if necessary, until the current shot was at least 2 seconds in duration, at which point the aggregation began anew with the next subshot.
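The second pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used to build the master shot reference: the function name `aggregate_subshots`, the representation of a subshot as a `(start, end)` pair in seconds, and the decision to return any trailing subshots that never reach the 2-second minimum as a separate list are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (start_seconds, end_seconds), contiguous and in file order.
Shot = Tuple[float, float]


def aggregate_subshots(
    subshots: List[Shot], min_duration: float = 2.0
) -> Tuple[List[Shot], List[Shot]]:
    """Greedily merge consecutive subshots into master shots of at least min_duration."""
    masters: List[Shot] = []
    pending: List[Shot] = []  # subshots collected for the master shot being built

    for subshot in subshots:
        pending.append(subshot)
        start, end = pending[0][0], pending[-1][1]
        if end - start >= min_duration:
            # The current shot is long enough: emit it and begin anew.
            masters.append((start, end))
            pending = []

    # Assumption: subshots left over at the end of a file are returned separately,
    # since the description does not say how they were handled.
    return masters, pending


subshots = [(0.0, 0.5), (0.5, 1.2), (1.2, 2.5), (2.5, 6.0), (6.0, 6.8)]
masters, leftover = aggregate_subshots(subshots)
print(masters)   # master shots of at least 2 seconds
print(leftover)  # trailing subshots shorter than the minimum
```

In this example the first three subshots merge into one master shot, the fourth is long enough on its own, and the last is left over.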

The keyframes were selected by going to the middle frame between the shot boundaries, then searching left and right of that frame to locate the nearest I-Frame. That I-Frame became the keyframe and was extracted. Keyframes have been provided at both the subshot (NRKF) and master shot (RKF) levels.

In a small number of cases (all of them subshots) there was no I-Frame within the subshot boundaries. When this occurred, the middle frame was selected. There is one anomaly: at the end of the first video in the test collection, a subshot occurs outside a master shot.
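The selection rule, including the fallback to the middle frame, might look like the following Python sketch. The function name `select_keyframe`, the use of inclusive integer frame indices, and the tie-breaking choice (prefer the earlier I-Frame when two are equally close) are assumptions for illustration; the description does not specify them.

```python
from typing import Iterable


def select_keyframe(first_frame: int, last_frame: int, i_frames: Iterable[int]) -> int:
    """Return the I-Frame nearest the middle of the shot, or the middle frame if none exists."""
    middle = (first_frame + last_frame) // 2

    # Only I-Frames that fall inside the shot boundaries are candidates.
    candidates = [f for f in i_frames if first_frame <= f <= last_frame]
    if not candidates:
        # No I-Frame within the boundaries: fall back to the middle frame.
        return middle

    # Nearest to the middle; ties go to the earlier frame (an assumption).
    return min(candidates, key=lambda f: (abs(f - middle), f))


print(select_keyframe(100, 160, [90, 120, 135, 150]))  # nearest I-Frame to frame 130
print(select_keyframe(10, 20, [0, 30]))                # no I-Frame inside the shot
```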

The emphasis in the common shot boundary reference is on the shots, not the transitions. The shots are contiguous: there are no gaps between them and they do not overlap. The media time format is based on the ISO 8601 (Gregorian date and time of day) standard. Fractions of a second are expressed by counting pre-specified fractional units of a second.
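As an illustration of how such fractional time values can be converted to seconds, the sketch below parses a time point written as `Thh:mm:ss:nFN`, where `n` is a count of fractions and `N` is the pre-specified number of fractions per second. This layout follows the general MPEG-7 media time point pattern and is an assumption here; the exact strings in the corpus metadata should be checked before relying on it. The function name `parse_time_point` is hypothetical.

```python
import re
from fractions import Fraction

# Assumed layout: T<hh>:<mm>:<ss>:<count>F<fractions per second>
_TIME_POINT = re.compile(r"^T(\d{2}):(\d{2}):(\d{2}):(\d+)F(\d+)$")


def parse_time_point(text: str) -> Fraction:
    """Convert an assumed 'Thh:mm:ss:nFN' time point to exact seconds."""
    match = _TIME_POINT.match(text)
    if match is None:
        raise ValueError(f"unrecognized time point: {text!r}")
    hours, minutes, seconds, count, per_second = map(int, match.groups())
    if per_second == 0:
        raise ValueError("fractions per second must be positive")
    # Whole seconds plus the counted fractions of a second.
    return Fraction(hours * 3600 + minutes * 60 + seconds) + Fraction(count, per_second)


offset = parse_time_point("T00:01:02:15000F30000")
print(float(offset))  # seconds from the start of the file
```

Using `Fraction` keeps the arithmetic exact, which avoids rounding drift when many shot offsets and durations are added together.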


Samples of data available in this corpus:

  • Keyframe (video still)
  • Shots metadata (mp7 markup)
  • Subshot metadata
  • Transcript
  • Tokenized transcript


No updates are available at this time.
