Data Type: Text
Text Type: Journalistic (newswire services)
Domain: International news
Languages: French (France), German (Germany), Portuguese (Brazil)

General Description:

The Agence France Presse (AFP) newswire service provides articles in six languages (French, German, Spanish, Portuguese, Arabic and English), which are supplied on six separate data streams collected via a Dateno MKII satellite receiver and associated equipment at the Linguistic Data Consortium of the University of Pennsylvania. The AFP text data included in this corpus (French, German and Portuguese) were processed by Henry Thompson of HCRC at the University of Edinburgh.

At least one of the streams, the Portuguese language news service, actually includes some data in Spanish. Henry Thompson, who developed the software to transform the AFP data from transmission format to SGML/Latin1 format, incorporated a rudimentary check of language content into the process and applied SGML tagging to identify the language of each article. On the basis of this tagging, the Spanish articles have been filtered out of the collection presented on this CD-ROM. However, the language identification logic may have erred in some cases, so some Spanish text may have been mistakenly included.

In addition to the language identification problem in the AFP Portuguese collection, all AFP data were affected by intermittent transmission noise in each of the data streams, which corrupted the text content. Many of the symptoms of this corruption were identified and eliminated from the collection, but some forms of corruption may have gone undetected, such as random loss of characters from the stream or garbling of portions within articles, yielding "printable" but nonsensical content.
We are reasonably confident that these less detectable forms of corruption typically occurred in combination with the identifiable symptoms, so that filtering out those symptoms has removed most if not all of the data corruption.

Useful WWW Links: For more information open URL
Availability: CD-ROM
Related Corpora:
Institution of Origin: Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA 19104
Publisher and Place of Publication: Agence France Presse, 13 place de la Bourse, 75002 Paris, France
Collection Time Span: 1993-1996
File organization: one file per day. Because of occasional reception problems, a file may contain several days of material, shrinking or replacing the files for nearby dates. Also, the "day" does not always start precisely at midnight. The TRAILER fields should indicate transmission time fairly reliably, however.
Total Size (compressed): 212MB French, 122MB German, 51MB Portuguese

Tagging Description: The SGML DTD file 'ldcnewsw.dtd', which is referenced at the beginning of each AFP data file, is provided in this "doc" directory. One sample AFP article from each language is given below to illustrate the arrangement and content of the SGML markup.

------------ sample of AFP French ------------
o0018 Vietnam-Japon Le Japon accorde une aide de 30 millions de dollars au Vietnam

HANOI, 3 jan (AFP) - Le Japon vient d'accorder une aide non-remboursable de 3 milliards de yens (3 millions de dollars) destinée à la restructuration économique et à la réduction des difficultés budgétaires du Vietnam, a annoncé mardi "Le Courrier du Vietnam".

Cette aide fait partie du programme d'assistance officielle au développement (ODA) du gouvernement japonais au Vietnam pour l'exercice budgétaire 1994, sous le contrôle de l'Office international de coopération du Japon (OICJ), a ajouté le journal.

Le Japon est de loin le premier pays donateur du régime communiste de Hanoï, avec 529 millions de dollars d'aide au développement débloqués à la suite d'une visite officielle au Vietnam du Premier ministre Tomiichi Murayama en août dernier.

Le montant sera réservé à l'achat de carburant, de lubrifiants, d'engrais, de coton, de produits chimiques, de produits en plastique, d'acier, de papier, de chambres à air, de pneus, de camions et d'autobus, a précisé le quotidien francophone de l'agence vietnamienne d'information (AVI).

ltl/ab tp.pas

AFP 0503 GMT 95/01/03

------------ sample of AFP German ------------
0030 Nahost/Israel Mehr Anträge auf israelische Staatsbürgerschaft in Ostjerusalem - 400 Anfragen von Palästinensern in drei Monaten

Jerusalem, 3. Januar (AFP) - Die Zahl der Palästinenser in Ostjerusalem, die einen Antrag auf die israelische Staatsbürgerschaft stellen, ist in den vergangenen Monaten stark angestiegen. Eine Sprecherin des israelischen Innenministeriums sagte am Montag abend, von Oktober bis Dezember hätten 400 Palästinenser die Einbürgerung beantragt. Von 1988 bis 1993 seien insgesamt nur 2500 Bitten auf Einbürgerung geäußert worden. Der Großteil der Anträge werde bewilligt. Der Ostteil Jerusalems war 1967 von Israel annektiert worden. Ein Großteil der 160.000 Palästinenser in Ostjerusalem verfügt über einen jordanischen Paß, alle besitzen eine israelische Aufenthaltskarte, die den Bezug von sozialen Leistungen und die Stimmabgabe bei Kommunalwahlen ermöglicht. Die israelische Zeitung "Haaretz" nannte als einen Grund für die Antragsflug, es sei schwieriger geworden, die jordanischen Pässe zu verlängern.


AFP 0602 GMT 95/01/03

------------ sample of AFP Portuguese ------------
0010 09-26 Japão-Bolsa-dólar NIKKEI E DÓLAR EM BAIXA

TÓQUIO, 26 set (AFP) - A Bolsa de Valores de Tóquio operou esta segunda-feira em um clima tranqüilo, já que muitos investidores permaneceram passivos antes do fechamento das contas semestrais, na próxima sexta-feira, dia 30.

O Nikkei caiu 19,31 pontos (- 0,1 %) e estabeleceu-se em 19.814,36 pontos, depois de ter baixado 51,71 pontos durante o pregão de quinta-feira.

O Topix subiu 0,99 ponto e fechou a 1.585,21, após uma baixa de 1,74 ponto na quinta-feira. Os mercados financeiros permaneceram fechados na sexta-feira, feriado do equinócio de outono.

Foram negociados 250 milhões de títulos, contra 353,7 milhões na quinta-feira.

O dólar fechou cotado a 97,86 ienes, em baixa de 0,08 iene em relação ao fechamento de quinta-feira, e os investidores se mostraram preocupados com os rumores de uma possível intervenção do Banco do Japão para manter o dólar, segundo informaram os corretores.

Também preocupa - segundo estes últimos - o desfecho incerto das negociações americano-japonesas, ao aproximar-se a data de vencimento - 30 de setembro - fixada pelos Estados Unidos para começar a aplicar sanções comerciais ao Japão, caso a questão comercial entre os dois países não seja resolvida.
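
The file organization note says that the TRAILER fields indicate transmission time more reliably than the file date does. As a minimal sketch of how that time could be recovered, the Python function below parses a trailer string. The layout "AFP HHMM GMT YY/MM/DD" is an assumption inferred only from the French and German sample trailers shown in this document, and the names `TRAILER_RE` and `parse_trailer` are hypothetical, not part of any tool supplied with the corpus.

```python
import re
from datetime import datetime, timezone

# Assumed trailer layout, inferred only from the sample trailers in this
# document (e.g. "AFP 0503 GMT 95/01/03"): "AFP HHMM GMT YY/MM/DD".
TRAILER_RE = re.compile(r"AFP\s+(\d{2})(\d{2})\s+GMT\s+(\d{2})/(\d{2})/(\d{2})")


def parse_trailer(text):
    """Return the transmission time as a timezone-aware datetime, or None.

    None is returned when no trailer is found or when the digits do not
    form a valid date and time (for example after transmission noise).
    """
    match = TRAILER_RE.search(text)
    if match is None:
        return None
    hour, minute, year, month, day = (int(group) for group in match.groups())
    try:
        # Assumption: the collection spans 1993-1996, so YY means 19YY.
        # Assumption: GMT is treated as UTC.
        return datetime(1900 + year, month, day, hour, minute,
                        tzinfo=timezone.utc)
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_trailer("AFP 0503 GMT 95/01/03"))
```

Because a file may hold several days of material, grouping articles by the parsed trailer date rather than by file name is one way to rebuild per-day sets. Returning None for unparseable trailers lets a caller set possibly corrupted articles aside instead of silently assigning them a wrong date.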


