Common README for reference segmentation and keyframes.

The segmentation was performed using a slightly improved version of the system used by CLIPS-IMAG for the TREC 2003 shot boundary determination task.

I. Shot merging

It has been decided that no shot should have a duration of less than 2 seconds (or 60 frames) so that assessors can confidently evaluate them. In order to satisfy this constraint, short shots have been merged with their neighbors. This means that many shots actually contain subshots (the miss rate appears significantly higher than it was before shot merging) and that there are actually two levels of segmentation. This is very important to understand, especially for the use of keyframes.

The first level of segmentation is the original segmentation as produced by the CLIPS-IMAG system, ignoring the 2-second constraint. This segmentation is more accurate than the second level one. The segmentation system also produces shots with gaps between them if they are separated by gradual transitions (i.e. the gradual transitions are kept out of the shots). Consistently, a single keyframe is extracted for each first level shot (later called "subshot").

The second level of segmentation is obtained from the first level by merging subshots into second level shots (later called "shots") at the weakest transitions in order to meet the minimal duration constraint. Additionally, these shots are extended if necessary so that they do not have gaps between them even if they are separated by gradual transitions (i.e. the gradual transitions are split and included in the shots). For each shot, a single keyframe is selected from the subshot keyframes (when several subshots have been merged into the shot): the one corresponding to the longest subshot. Keyframes from other subshots (if any) are also associated with the shot, but as "non representative" ones, while the main one is called "representative".
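The keyframe selection rule described above can be illustrated with a small Python sketch. It is not the CLIPS-IMAG code: the Subshot class, its field names and the split_keyframes function are hypothetical, frame ranges are assumed to be inclusive, and the tie-breaking rule (earliest subshot wins when two subshots have the same length) is an assumption since this README does not specify one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subshot:
    """A first level shot; all names in this class are hypothetical."""
    start_frame: int  # first frame of the subshot (assumed inclusive)
    end_frame: int    # last frame of the subshot (assumed inclusive)
    keyframe: str     # name of the single keyframe extracted for it

    @property
    def length(self) -> int:
        return self.end_frame - self.start_frame + 1


def split_keyframes(subshots: List[Subshot]) -> Tuple[str, List[str]]:
    """Return (representative keyframe, non representative keyframes)."""
    if not subshots:
        raise ValueError("a shot contains at least one subshot")
    # max() keeps the first maximal element, so ties go to the
    # earliest subshot (an assumption, not stated in the README).
    main = max(subshots, key=lambda s: s.length)
    others = [s.keyframe for s in subshots if s is not main]
    return main.keyframe, others


# A composite shot made of three merged subshots (made-up values).
example = [
    Subshot(0, 29, "a.jpg"),
    Subshot(30, 119, "b.jpg"),
    Subshot(120, 139, "c.jpg"),
]
representative, non_representative = split_keyframes(example)
```

In this example the second subshot is the longest (90 frames), so "b.jpg" is the representative keyframe and the two others are kept as non representative ones.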

This is quite complicated, but the minimum duration appeared really necessary for the assessors, and using a single keyframe for composite shots is likely to induce many misses in feature detection or search tasks. In the case of composite shots, the rule is that the (composite) shot is positive if and only if at least one subshot is positive. This means that, for keyframe-based detection or search, all keyframes associated with a shot should be considered and the detection or search results should be or'ed. In the case of detection or search on the whole shot, considering subshots and or'ing the output could also help.

II. File (type)s:

II.1 Collection definition:

- "collection.xml" : the video id is linked to the video file name and to its use for development or test (there is actually no file for development use in the TREC 2004 collection: both development and test files from the TREC 2003 collection can be used for development in TREC 2004).

II.2 Reference segmentation:

- "xxx.mp7.xml" : MPEG-7 output, where "xxx" is the video id instead of the video file name. This file also includes the keyframe reference and timing information. These files are in the "shots2004" directory and are in a format identical to the format used in TREC 2003. This is the official TREC 2004 reference segmentation, defining the actual units used for feature detection and search tasks.

- "filename.xml" : an alternate description of the reference segmentation, along with the keyframe reference, in a simpler format that additionally contains information about shots before merging. Shot and subshot boundaries as well as keyframe references are given in frame numbers. These files are in the "subshots2004" directory (they do not include the TREC 2004 shot ids but there is a one-to-one shot mapping).

II.3 Keyframes

Keyframe names associated with the "xxx.mp7.xml" description (in the "keyframe" directory):

- TRECVID2004_XXX/shotXXX_YYY_RKF.jpg : "representative" keyframes
- TRECVID2004_XXX/shotXXX_YYY_NRKF_Z.jpg : "non representative" keyframes

Keyframe names associated with the "filename.xml" description (in the "jpg" directory):

- filename/XXXXX.jpg : all keyframes

(There is a single set of keyframes, each keyframe having two names.)

III. Statistics

There are 33367 shots in the TREC 2004 test collection, of which:

  23239 contain a single subshot
   6947 contain 2 subshots
   2013 contain 3 subshots
    681 contain 4 subshots
    254 contain 5 subshots
    116 contain 6 subshots
     67 contain 7 subshots
     27 contain 8 subshots
      9 contain 9 subshots
      2 contain 10 subshots
      2 contain 11 subshots
      3 contain 12 subshots
      1 contains 13 subshots
      1 contains 14 subshots
      2 contain 15 subshots
      1 contains 17 subshots
      1 contains 18 subshots
      1 contains 20 subshots

for a total of 48818 subshots (this is relative to the first level of segmentation, which can itself already contain false positives and negatives).

The performance of the segmenter used, on the TREC 2003 SBD collection, is as follows (silence = 1 - recall and noise = 1 - precision).

At the first level of segmentation (subshots):

          silence   noise
  CUT:    0.082     0.087
  GRAD:   0.182     0.231
  ALL:    0.105     0.122

At the second level of segmentation (shots):

          silence   noise
  CUT:    0.357     0.037
  GRAD:   0.341     0.154
  ALL:    0.354     0.067

IV. A few small points related to the MPEG-7 output

- The media time format is based on the Gregorian day time (ISO 8601) norm. Fractions are defined by counting pre-specified fractions of a second. In our case, the frame rate is 29.97 frames per second. The duration of one frame is thus specified as "PT1001N30000F" (that is, 1001 fractions of 1/30000 of a second).

- The TREC video id has the format "XXX" and the shot id has the format "shotXXX_YYY". "XXX" is the sequence number of the video onto which the video file name is mapped; this mapping is given in the "collection.xml" file. "YYY" is the sequence number of the shot.

- There are two types of keyframes: one "representative keyframe" (with "RKF" in the keyframe name) corresponding to the main (or the single) subshot within the shot, and zero, one or more additional "non representative keyframes" (with "NRKF" in the keyframe name) corresponding to the other subshots (if any) within the shot. Subshots within a shot correspond to extracted shots merged together in order to enforce the 2-second minimum duration of a shot (see above).
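
As an illustration of how the keyframe names and the or'ing rule of section I fit together, here is a minimal Python sketch that groups per-keyframe decisions by shot id and or's them. It is only a sketch under assumptions: the function names are hypothetical, the regular expression assumes that XXX, YYY and Z are decimal numbers (this README does not say so explicitly), and the file names used in the example are made up.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumption: XXX, YYY and Z are decimal numbers.
KEYFRAME_RE = re.compile(r"(shot(\d+)_(\d+))_(RKF|NRKF_(\d+))\.jpg$")


def parse_keyframe_name(path: str) -> Tuple[str, bool]:
    """Return (shot id, is_representative) for a keyframe path."""
    match = KEYFRAME_RE.search(path)
    if match is None:
        raise ValueError(f"not a keyframe name: {path!r}")
    return match.group(1), match.group(4) == "RKF"


def shot_decisions(
    keyframe_results: Iterable[Tuple[str, bool]]
) -> Dict[str, bool]:
    """Or the per-keyframe decisions into one decision per shot."""
    decisions: Dict[str, bool] = defaultdict(bool)
    for path, positive in keyframe_results:
        shot_id, _ = parse_keyframe_name(path)
        # A shot is positive as soon as one of its keyframes is positive.
        decisions[shot_id] = decisions[shot_id] or positive
    return dict(decisions)


# Made-up detector output for two shots of a hypothetical video 12.
results = [
    ("TRECVID2004_12/shot12_7_RKF.jpg", False),
    ("TRECVID2004_12/shot12_7_NRKF_1.jpg", True),
    ("TRECVID2004_12/shot12_8_RKF.jpg", False),
]
per_shot = shot_decisions(results)
```

Here shot12_7 is reported positive because its non representative keyframe is positive, even though its representative keyframe is not; shot12_8 stays negative.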