The
objective of the ACE program is to develop automatic content extraction technology
to support automatic processing of source language data. This includes classification, filtering, and
selection based on the language content of the source data, i.e., based on the
meaning conveyed by the data. Thus the
ACE program requires the development of technologies that automatically detect
and characterize this meaning.
Ultimately,
ACE applications will maintain a database of what is happening in the
world. Ideally, this will be in terms
of who is doing what, where, and when.
As information from source language data is accumulated over time, the
database will be updated and maintained.
In this way the database becomes a vehicle for tracking the information
we are interested in. The database
should also maintain pointers into the source data so as to ensure more
detailed examination of the information represented in the database.
The
ACE research objectives are viewed as the detection and characterization of
Entities, Relations, and Events. ACE
Phase 1 begins the technology R&D effort by focusing on entity
detection. This task is being defined
so as to support applications as well as to provide a basis for further
development in extracting relations and events.
The
Entity Detection task requires that selected types of entities mentioned in the
source data be detected, their sense disambiguated, and that selected
attributes of these entities be extracted and merged into a unified
representation for each entity.
Tracking of entities across document boundaries will be deferred until after
the initial phase.
This
document outlines the ACE Phase 1 annotation tasks (Entity Detection and
Tracking, Metonymy Annotation, and Generic/Specific Classification). It is intended to integrate section 6 of the
ACE Pilot Study Task Definition v 2.2, EDT Metonymy Annotation Guidelines v
2.4, and various addenda to both documents into up-to-date annotation
guidelines. Please refer to NIST's ACE
website (www.itl.nist.gov/iaui/894.01/tests/ace/index.htm) for the ACE task
definition and evaluation plan.
An
entity is an object or set of objects in the world. A mention is a reference to an entity. Entities may be referenced by their name, indicated by a common
noun or noun phrase, or represented by a pronoun. For example, the following are several mentions of a single
entity:
Name Mention: Amro
Mousa
عمرو موسی
Nominal Mention: the guy wearing a blue shirt
الرجل ذو القميص الازرق
Pronoun Mentions: he, him
هو / هي / ه / ها / هم
For
Phase 1 of ACE, entities are limited to the following five types:
· Person - Person entities are limited to humans. A person may be a single individual or a group.
· Organization - Organization entities are limited to corporations, agencies, and other groups of people defined by an established organizational structure.
· Facility - Facility entities are limited to buildings and other permanent man-made structures and real estate improvements.
· Location - Location entities are limited to geographical entities such as geographical areas and landmasses, bodies of water, and geological formations.
· GPE (Geo-political Entity) - GPE entities are geographical regions defined by political and/or social groups. A GPE entity subsumes and does not distinguish between a nation, its region, its government, or its people.
We
do not identify mentions of animals or most inanimate objects at this
time.
For
each entity, the annotation records the type of the entity (PER, ORG, GPE, LOC,
or FAC), its class (Generic/Specific), all of the mentions of the entity from
the text (Name, nominal, Pronoun), and the role of those mentions if applicable
(see section 4.1.5.3 GPE Mention Roles).
Only
material between <TEXT> and </TEXT> tags is to be annotated. In newswire documents, material in headlines
and slug sections is not to be tagged.
In broadcast news, only the transcribed speech is to be tagged; added
information, such as that within <TURN> tags or speaker identification
tags, is not to be tagged.
Each
distinct person or set of people mentioned in a document refers to [AM1] an
entity of type person. People may be
specified by name " نبيل اشقر ", occupation " اللحام
", family relation " الوالد
", pronoun " هو " , etc., or by some combination of these. Dead people and human remains are to be
recorded as entities of type person. So
are fictional human characters appearing in movies, TV, books, plays, etc.
There
are number of words that are ambiguous
as to their referent. For example,
nouns, which normally refer to animals or non-humans, can be used to describe
people. If it is clear to the annotator
that the noun refers to a person in a given context, it should be marked as a
person entity.
He is [a real
turkey]
هو اسد في المعركة
[The film
star]
نجم السينماء / النجم
السينمائي
She's known
as [the brain of the family]
هي معروفة انها عقل العائلة
He is [a
harmonic force]
هو القوة التناغمية
4.1.1.1
Saints and other religious figures
Religious titles
such as saint, prophet imam or archangel are to be treated as titles.
مار مارون قديس
الموارنة /الامام / الشيخ / القديس/ سان
References
to "God" will be taken to be the name of this entity for tagging
purposes. If it is used as a descriptor
rather than a name, it will be considered a nominal mention.
name
mention
الايمان بالله
Although he
felt like he was [a god], he... nominal
mention
شعر وكأنه اله
4.1.1.2
Fictional characters, names of animals, and names of fictional animals
Names of fictional
characters are to be tagged; however, character names used as TV show titles
will not be tagged when they refer to the show rather than the character name.
باتمان مسلسل
تلفزيوني شهير
زی آدم وست من
مسلسل باتمان
Names of animals
are not to be tagged, as they do not refer to person entities. The same is true for fictional animals and
non-human characters. These two
examples do not yield mentions.
Morris the
cat
القط شعبان
4.1.1.3
Groups of people
Groups of
people are to be considered an entity of type Person unless the group meets the
requirements of an organization or a GPE described below.
العائلة
النجار / السياسي
/ الجزار
اللغويون عند
الباب
4.1.1.3.1
Ethnic, Religious, and Political Groups
Ethnic
groups, religious groups and political groups are often referenced by the name
of the ethnicity, religion and political party, for example:
الاكراد / الارمن
السنة / الموارنة
الديموقراطيون
Those groups
that have an organizing body are name mentions of the organization. If a mention refers to the members of an
organization in general, we consider the mention to refer to the
organization.
الحزب الديموقراطي يۇيد الانفتاح
يصوم الكاثوليك
كل سنة عن اكل اللحوم قبل العيد الكبير
Democrats is
an organization name because it is used in a context describing the beliefs of
the greater organization of the Democratic Party. When a mention refers to an individual person, as in
محمد من الحزب
التقدمي
or to a
small group of individuals, as in
محمد وحسن كلاهما
من الحزب التقدمي
the mention
is a person nominal and is a mention of the same entity as the person to whom
the phrase is attributed.
Ethnic
groups do not generally have a formal organization associated with them. As a result, we mark these mentions as names
of a person entity.
{[PER-name]
Cuban Catholics} are expecting the Pontiff to preach about the value of
religious freedom, something they're just beginning to experience.
الشيعة اللبنانيون ينتظرون عودة الامام موسی الصدر
This example
would be tagged as a Person-Name.
When ethnic
designation is given to an individual person or a small group of individuals,
the mention is marked as a nominal mention of that person entity.
Joe is
{[PER-nominal] a Cuban Catholic}.
حسين شيعي ]PER-nominal [ من الجنوب
In this example,
the mentions "حسين" and "شيعي من الجنوب" refer to the same
entity.
4.1.1.3.2
Family Names
Family names
are to be tagged as Person.
عائلة كنيدي / بيت شاتيلا / الجرجسيين )not commonly used
in Arabic(
Please note
that the example contains two mentions
of the same entity: one name mention and one nominal mention.
4.1.2
Organizations
Each
organization or set of organizations mentioned in a document gives rise to an
entity of type organization. An
organization must have some formally established association and a persistent,
established existence. Typical examples
are businesses, government units, sports teams, and formally organized music
groups. Industrial sectors are also treated as organizations.
Sets of
people who are not formally organized into a unit are to be treated as a person
entity rather than an organization entity.
It is often difficult to tell the difference between organization
entities and collections of individuals tagged as person entities. Example organization-like nouns which are
not organizations are " العائلة "
and " الطاقم ," الموظفون”
In the latter two cases, although the members
of a company or crew may work together in an organized and even hierarchical
fashion, the groups are not organizations by themselves.
Some words
like "team,"فريق
"delegation" وفد and
"police" الشرطة achieve organizational status only in certain
contexts. "[The home team] flies
to Connecticut to meet the Huskies in Hartford"
عاد الفريق الوطني
من القاهرة حيث فاز علی الزمالك
clearly
refers to a named sports team and is thus taggable as an organization. However, the "[U.N. weapons inspection
team]"
مفتشی الامم المتحدة
is less
permanent and cohesive, and is thus a person entity rather than an
organization. The noun
"police" is a person entity in contexts like "[police]
outnumbered [demonstrators]"
زاد عدد رجال
الشرطة علی عدد المتظاهرين
but an organization entity in "[police
in East Timor] have arrested [two men]."
اعتقلت الشرطة في كراتشي رجلان مشتبه بهما
4.1.2.1
Organization Entities used in Person Contexts
Whenever an
organization takes an action, there are people within or in charge of the
organization that one presumes actually made the decision and then carried it
out. Thus many organization mentions could
be though of as metonymically referring to people within the organization. However, there seems to be little to be
gained in the usual case by thus "reaching inside the organization"
to posit a PER metonymy. It seems
better to adopt the view that organizations can be agentive, and take action on
their own. Only when something in the
context draws particular attention to the people within the organization should
a separate mention of a PER entity be marked.
4.1.2.2
First Person Pronouns Referring to Organizations
First person
plural pronouns are often used by representatives of an organization to refer
to that organization. Pronouns are
often used in this way by reporters representing a broadcasting station and
spokespeople representing organizations.
For example, in our top story, our refers to the broadcasting
organization.
في نشرتنا الساعة
السادسة
In these cases, annotators should mark first
person plural pronouns as ORG mentions, and not as PER mentions.
4.1.3
Locations
Locations
defined on a geographical or astronomical basis which are mentioned in a
document and do not constitute a political entity give rise to location
entities. These include, for example,
المريخ / قمة
جبل افرست / نهر النيل / المجموعه الشمسيه
/ القارات / الشرق الاقصي / وادي الموت
In general,
terrestrial locations must have some two-dimensional extent. Abstract coordinates ("31 S, 22(
W") and positions relative to a GPE or location ("30 degrees miles
east of Mount Fuji") are not themselves entities. Borders, considered as (one-dimensional)
boundaries between two regions, are not entities. Positions distinguished only by the occurrence of an event at
that position ("the scene of the murder", "the site of the rocket
launching") are not entities.
مكان الانفجار
/ الجريمة
4.1.3.1
Sub-parts of Locations and GPEs
Portions of
GPE entities or location entities, such as “ the center of the city” constitute
location entities in their own right.
وسط المدينة/ جنوب شرق آسيا/ ضواحي المدينة
When general
locative phrases like "top," "bottom," "edge,"
"periphery," "center," and "middle" are used to
pinpoint a portion of a markable location, they are markable locations.
قمه - قاع= اطراف
= وسط = داخل = خارج = محيط
محيط البلد
/ وسط المدينة / قمة الجبل
انهم يفضلون الضواحي
للسكن بدلا من وسط المدينة
4.1.3.2
Non-Locations
It is easy
to start interpreting all objects as locations. Every physical object implies a location because the space that
each physical object occupies is the "location" of that object. In addition, our language is full of
location modifiers (which are often prepositional phrases) that pinpoint
objects and activities, and even abstract concepts:
المعطف تحت الكرسي
الارنب خلف الصخرة
لدي فكرة في رأسي
اخذ المعطف من
الخزانة
وضع الكتاب في
مكانه
However,
none of these are taggable location expressions. They do not fall within any of the classes defined above for
taggable locations. The annotator must be careful not to fall down this
slippery slope.
Do not tag compass
points when they serve as adjectives or refer to directions, as in
اتجه الجيش نحو
الشمال
Compass points should only be tagged when
they refer to sections of a region, as in "the far west."
الشرق الاوسط
/ الشرق الادنی
4.1.4
Facilities
A facility is
a large, functional, primarily man-made structure. These include buildings, and similar facilities designed for
human habitation, such as houses, factories, stadiums, office buildings,
gymnasiums, prisons, museums, and space stations; objects of similar size
designed for storage, such as barns, parking garages and airplane hangars;
elements of transportation infrastructure, including streets, highways,
airports, ports, train stations, bridges, and tunnels. Roughly speaking, facilities are artifacts
falling under the domains of architecture and civil engineering.
الجسر / النفق
/ المطار / الطريق العام/ المنازل/ المصانع/ الاستاد/ المباني الاداريه/ الجمنزيوم/
السجون/ المتاحف/ محطات الفضاء/ مخازن الحبوب/ الجراجات/ حظائر الطائرات/ الشوارع/
الطرق السريعه/ المطارات/ محطات القطارات/ الكباري/ الانفاق
Individual
rooms of buildings are facilities, but other portions of buildings, such as
walls, windows, closets, or doors, are not facilities.
الحوائط/ النوافذ/
الحمامات/ الابواب
4.1.4.1
Facility Entities used in Organization Contexts
In some
cases, a facility name is used to refer to an organization (which, typically,
operates the facility) or a set of people (the people employed by that
organization).
1. The
museum is located on Fifth Avenue.
يقع المتحف في
وسط نيو يورك
2. I walked
into the museum.
مشيت الی المتحف
3. Mary
works for the museum.
هي تعمل في المتحف
4. The
museum insisted that the exhibition was not obscene.
اصر المتحف ان
العرض الجديد للصور له اثر ايجابي علی الجمهور
5. The
museum received a gift of $100,000.
تسلم المتحف هبة
مالية
Examples 1
and 2 clearly refer to the museum building.
Examples 3, 4, and 5 refer to the organization housed in or operating
the museum facility. In cases like
this, the annotation will reflect both the facility and organization
entities. Please see the Metonymy
section below for more information.
4.1.5
Geographical/Social/Political Entities (GPE)
Geo-Political
Entities are composite entities comprised of a population, a government, a
physical location, and a nation (or province, state, county, city, etc.). All mentions of these four aspects of a GPEs
will be marked GPE and coreferenced. In
this sentence,
رفض الشعب الفرنسي
زيادة الضرائب
الشعب الفرنسي
The mention
of the population of France is marked GPE, rather than PER.
Explicit
references to the government of a country (state, city, etc.) are to be treated
as references to the same entity evoked by the name of the country. Thus "the United States" and
"the United States Government" are mentions of the same entity.
الولايات المتحدة / حكومة
الولايات المتحدة
On the other
hand, references to a portion of the government ("the
Administration", "the Clinton Administration") are to be treated
as a separate entity (of type organization), even if it may be used in some
cases interchangeably with references to the entire government (compare
"the Clinton Administration signed a treaty" and "the United
States signed a treaty").
الادارة
/ ادارة كلنتون
Sometimes
the names of GPE entities may be used to refer to other things associated with
a region besides the government, people, or aggregate contents of the
region. The most common examples are
sports teams:
New York
defeated Boston 99-97 in overtime.
فاز فريق نيو
يورك علی فريق بوستون
These are to
be recorded as distinct entities, not as mentions of the GPE entity. Thus, in this example, both "New
York" and "Boston" would evoke organization entities.
4.1.5.1 GPE
Clusters to be treated as GPEs
Like GPEs,
clusters of GPEs consist of a populace, a well-defined physical territory, and
in some cases (like Europe) , have an organizing body (the European Union)
associated with it.
اوروبا / الاتحاد
الاوروبي
Because of
their similarities to GPEs, these entities appear in contexts similar to those
of GPEs. For example:
السعودية حالت
دون مشاركة اسرائيل في مؤتمر إعمار افغانستان
For this
annotation task, named geographical entities that are commonly referred to by
those names will be considered GPEs rather than Locations. Following is a non-exhaustive list of
entities that were Locations in the Pilot Study, but should be GPEs for this
task.
Asia,
Europe, Eastern Europe, Western Europe, EU, the Middle East, Palestine,
Southeast Asia, New England, South Africa, all continents.
آسيا / اوروبا / الاتحاد الاوروبي / افريقيا الجنوبية/
اوروبا الشرقيه/ اوروبا الغربيه/ الشرق الاوسط/ فلسطين/ جنوب شرق اسيا/ جنوب افريقيا/
بالاضافه الي جميع القارات
Other, more
incidental clusters of GPEs are still considered Locations. For example, the southern United States is a
Location.
جنوب لبنان /
جنوب الولايات المتحدة
On the other hand coalitions of governments,
as well as the UN, are organizational bodies and should be marked
Organization.
الامم المتحدة
4.1.5.2
Nested Region Names
A series of
nested region names, such as بيروت - لبنان
evokes one entity for each region.
This evokes one entity for the city
and a second one for the state (with mention لبنان)
4.1.5.3 GPE
Mention Roles
Annotators
need to decide for each entity mention in the text which role (Person,
Organization, Location, GPE) the context of that mention invokes. This judgment
typically depends on the relations that the entity enters into.
Organization Role
وقعت فرنسا اتفاقا
مع المانيا
Location
Role
اجتمع رۇساء الدول
الفرانكوفونية في فرنسا امس
In the examples
above, the name "France" refers to a range of concepts. Annotators must select the Role which
matches the function of the GPE mention.
The GPE role
may be used in contexts that highlight the nation (or state or province or
city, etc.) aspect of the GPE entity, as distinct from the government,
populace, and location, but that it may also be used in contexts referring to
an indistinct amalgam of more than one of the aspects of a GPE (government,
population, location, and nation).
GPE Role
(whole nation)
تنتج فرنسا افضل
نبيذ في العالم
GPE Role
(indistinct referent)
اهم تحف فرنسا
Even if more
than one aspect of the entity is invoked by the context, only one role should
be assigned. This usually occurs in the
case of conjoined predicates. For example,
Washington
is preparing for potentially massive demonstrations against the World Bank and
the International Monetary Fund as ministers from those organizations arrive
for Sunday's opening session.
تتهيأ واشنطن للتصدي لمظاهرات عنيفة ضد البنك الدولي و صندوق
النقد الدولي عند وصول وزراء من هاتان المنظمتان
للمؤتمر الافتتاحي يوم الاحد
In the above
example, it is the government of Washington (ORG) that is preparing for the
demonstrations, but ministers will arrive at the location Washington. In these cases, the annotator should assign
a role based on the closest local predicate.
In this example, only the ORG role should be assigned to Washington
because "preparing..." is the local predicate and invokes an ORG
reading.
The
following sections give particular guidelines for frequently encountered cases,
with examples.
GPEs
Modifying People and Artifacts
Pre-modifiers
are inherently vague and difficult to decompose. For this reason, all GPE pre-modifiers of people and artifacts
will be assigned the role GPE.GPE. For
the sake of consistency, the corresponding post-modifiers should also be marked
GPE.GPE. For example, [[GPE.GPE]
French} president should be marked in the same way as president of {[GPE.GPE]
France}.
الرئيس الفرنسي / رئيس فرنسا
More examples of GPEs modifying people
include:
{[GPE.GPE]
Israeli} troops
الجيش الاسرائيلي
{[GPE.GPE]
New York} policemen
شرطة نيو يورك
Prime
Minister of {[GPE.GPE] Britain}
رئيس وزراء بريطانيا
Joe Smith of
{[GPE.GPE] the United States}
وزير الدولة البريطاني
طوني بلير
{[GPE.GPE]
New York} attorney
محامي من نيو
يورك
{[GPE.GPE]
U.S.} Commander-in-Chief
القائد المصري
العظيم
GPEs
modifying artifacts should also be marked GPE.GPE. Common artifacts modified by GPEs include but are not limited to
vehicles, weapons, and flags. Some
examples follow:
{[GPE.GPE]
U.S.} surveillance aircraft
طائرة الاستطلاع
الاميركية
{[GPE.GPE]
Iraqi} flag
العلم العراقي
Activities
Associated with GPEs
Certain
activities are associated with GPEs and therefore invoke a GPE role. For example, in a pro-Iraq rally, Iraq is
assigned a GPE.GPE annotation. A rally
is generally concerned with a nation, rather than exclusively a location or
government.
منعت حكومة مبارك
التظاهرات المؤيدة للعراق ]GPE.GPE [
Military
Activity
Similarly,
military activities like invasions, military strikes, bombings, etc. are
considered to be acts carried out by and directed at entire nations (not
distinguishable from the government, people and location of that nation) and
therefore are associated with GPEs.
Both the aggressors and the victims in these cases are marked
GPE.GPE.
بدأ امس العدوان
الاميركي ]GPE.GPE [ علی العراق ]GPE.GPE
[
Political
Communication and Decision-making
On the other
hand, ORGs are responsible for decisions to take military actions. ORGs are also responsible for political
communication events such as announcements, agreements, statements, denials,
expressions of approval and disapproval, etc.
اتفقت الصين
]GPE.ORG [ . مع الحكومة التايوانية
Political
associations
وزير داخلية جزيرة
مالطا GPE.GPE.
قال ممثل ولاية
اوهايو ]GPE.GPE[ في مجلس الشيوخ
Embedding
GPE names
embedded in mentions of the government have a GPE role.
الحكومة اللبنانيةGPE.GPE.
This annotation conveys the relationship
between nation and government.
Similarly, in cases in which the embedded GPE conveys a political
relationship with the location, the GPE is assigned a GPE role, as in }.
المستوطنات الاسرائيلية
]GPE.GPE[
However, in
cases in which there is only a locative relationship between the GPE and the
LOC, the GPE is assigned a LOC role.
For example, in the heartland of America, America is a GPE.LOC because a
locative relation is conveyed.
صرحت وزيرة الخارجية
الاميركية انها ستلتقي مع وزير الدفاع في بلد ]GPE.LOC[ لم يسمی في الغرب الاميركي ]GPE.LOC[
في منطقة واشنطونGPE.LOC
Athletes,
Sports Teams, and GPEs
Athletes and
teams are associated with GPE.GPEs as in the example of Austria below.
فازت) فلانة
( من النمساGPE.GPE في سباق التزلج
However,
when a GPE name is used as a team name (as in Boston beat Philly), the entity
is marked as a metonymy, with the Literal mention being the city and the
Intended mention being the team.
{[GPE.GPE-Lit]
[ORG-Int] New York} had a shot to win but Chris Childs missed a three.
فاز فريق نيو
يورك ].GPE.GPE-Lit] [ORG-Int[ في مباراة كأس العالم
In addition,
because all GPEs are assigned a role, the Literal GPE mention is assigned a GPE
role.
GPEs
modifying organizations
In cases
where GPEs modify organizations, the organizations are considered to be located
in that GPE. Those GPEs should be
marked GPE.LOC.
اعلنت الشركة
اللبنانيةGPE.LOC عن تعيين جورج قرداحي
Governments
While the
entity type for governments is GPE, the role for governments should always be
GPE.ORG.
سيكون موقف )الحكومة
الروسية ] GPE. GPE ] [GPE. ORG [( حرجا اذا ما
تجاهلت واشنطن متطلباتها
(In that
particular example, Russian would also be marked, so that the full annotation
for that phrase would be {[GPE.ORG] the {[GPE.GPE] Russian} government}, and
the two GPE mentions would be coreferential.)
GPEs and
Government Organizations
GPEs
modifying government organizations, like New York police department
دائرة الشرطة
في نيو يورك / شرطة نيو يورك
قسم الاطفئية في الضفة الغربية / الدفاع المدني
اللبناني
reflect a relationship between the
organizations and the governmental aspect of the GPE, so they are assigned a
GPE.ORG markup.
سمح له ان يستأنف
للمحكمة العليا البلجيكية / ا لمحكمة العليا في بلجيكا
GPEs and
Populations
As stated
above, populations of a GPE are treated as GPE.PER. However, it is sometimes difficult to determine whether a
reference to people is a reference to the population.
يتحمل اليابانيون مسؤولية كبيرة عن حروب القرن الماضي
هذا المثل يخضع لعدة احتمالات كما هو موضح لاحقا
In this
example, the phrase the Japanese may be interpreted as the population of Japan,
or the government of Japan, or the Japanese military, or even some part of the
Japanese population. If the annotator
believes that the phrase in question refers to the population of the GPE, or
most of the population of a GPE, then the annotation should be GPE.PER and the
mention is a name mention. However, if
the annotator believes the phrase refers to a group of people, then PER is the
assigned annotation and the mention is nominal because it does not refer to the
name of a person. Examples:
ينتظر الفلسطينيون ]GPE.PER - name[ يوم التحرير بفارغ الصبر
تعتبر )]GPE.PER - nom [ الاكثرية الامريكية ] GPE.PER - name [( ان الرئيس كلنتون اخطأ بتصرفه
ابتكار فرنسي ]GPE.PER - name[ جديد
يقول بتلر ان
علی العراقيين ] GPE.GPE. name [ السماح للمفتشين بمتابعة عملهم
سترسو السفينة
الحربية میسوري ].GPE.GPE. name[ قرب النصب التذكاري للسفینة الاريزونا
و التي اغرقها اليابانيون
الاتراك ]PER - nom [ الوطنيون
4.2 Mentions
For each
entity, we record and coreference all mentions of the entity. Mentions are names, nominal phases, or
pronominal phrases that refer to or describe the entity. For each mention, we record its full extent
and its head.
Mentions
will frequently be nested; that is, they will contain mentions of other
entities. For example, the phrase
The
president of Ford
رئيس شركة فورد
is a mention
of an entity of type person, and contains the name "Ford", a mention
of an entity of type organization. It is
even possible for a noun phrase to contain an embedded mention of the same
entity. For instance, the phrase
The
historian who taught herself COBOL
المعلمة التي
علمت نفسها استعمال الكمبيوتر
evokes a
person entity with two mentions, the entire phrase and the word
"herself". نفسها
4.2.1
Mention Extent
The extent
of a mention consists of the entire nominal phrase. In case of structures where there is some irresolvable ambiguity
as to the attachment of modifiers, the extent annotated should be the maximal
extent. In the case of a discontinuous
constituent, the extent goes to the end of the constituent, even if that means
including tokens that are not part of the constituent. Thus, in
I met some
people yesterday who love chess.
تعرفت امس علی
اشخاص يحبون لعبة الشطرنج
the extent
of the mention is the entire phrase
[Some people
yesterday who love chess]
[اشخاص يحبون لعبة الشطرنج ]
The extent
includes all the modifiers of a nominal phrase, including prepositional
phrases, relative clauses, appositional phrases, etc. Thus the phrase
Fred Smith,
the noted general
الجنرال العظيم فرد سميث
constitutes
two mentions of one entity.
[Fred Smith,
the noted general]
]الجنرال العظيم
فرد سميث[
[the noted
general]
]الجنرال العظيم[
Generally speaking,
tokens are broken at white space, and each item of punctuation is treated as a
separate character. As a rule, we do
not include punctuation such as commas, periods, and quotation marks in the
extent of a mention unless words included within the extent continue on after
the punctuation mark. Extents must
begin at the beginning of a token and end at the end of a token.
4.2.2
Mention Head
In addition
to the extent of the nominal phrase, the head of the phrase must be marked. In
The
hurricane destroyed [the new glass-clad skyscraper].
دمر الاعصار ]ناطحة السحاب الشامخة الجديدة[
the full
mention is
The new
glass-clad skyscraper
and the head
is skyscraper ناطحة Except for proper nouns and adjectives, the
head is always a single token. If the syntactic head of the phrase is a
multi-token item, the first token is marked.
If the head is a proper name, however, then the whole extent of the name
is considered to be the head. In the following examples, the mention is
enclosed in brackets and the head is un derlined:
[Fred Smith]
became [the new prime minister].
اصبح ]فرد سميث[
] رئيسا للوزراء[ في الحكومة الجديدة
If the
phrase is "headless", as in the case of a partitive construction, the
last modifier of the empty head is to be marked:
A course in
linguistics for [the young] and [the restless]
مركز رياضي ]للصغار[
و]الكبار[
He was
introduced to [five of the analysts]
تعرف علی خمسة
من المحللين
Note that in
the last example, there is a second entity, whose full mention and head
is [analysts] [المحللين]
4.2.3
Markability
4.2.3.1
Plurals
A plural can
be an entity:
The injured
passengers
المسافرون الجرحی
Two distinct
sets produce separate entities, regardless of whether they have elements in
common; so, for example,
Ten
passengers were injured, six seriously
جرح عشرة ركاب ستة منهم بجروح خطرة
evokes two
entities, one for the ten passengers, one for the six. Distinct sets produce separate entities,
even if they have the same string, so
Five people
like vanilla, five people like chocolate
وصل خمسة اشخاص
وغادر اربعة آخرون
evokes two
entities (the five people who like vanilla and the five who like
chocolate). Furthermore, a set is a
distinct entity from each of its members;
Fred Smith
married Harriet Hope; they lived
happily for 6 weeks.
تزوج فرد سميث
وهاريت هوب وعاشا معا سنة كاملة
evokes three
entities, one for Fred Smith, one for Harriet Hope, and one for the set with
members Fred and Harriet. The only
mention of the set is the pronoun ending of the verb
4.2.3.2
Conjunctions
In conjoined
expressions, there should always be one and only one Nominal Entity per head
noun. Thus, conjoined noun phrases with
no elision of the head noun are to be tagged separately. If a pre-nominal modifier is present it gets
included only with the initial noun phrase of the conjunct, and if a
post-nominal modifier is present, it gets included only with the final noun
phrase of the conjunct.
[muslims]
and [croats]
المسلمون والمسيحيون
[many
streams] and [rivers]
انهار كثيرة وجبال
[almost
every serb], [croat] and [muslim in bosnia]
اكثرية الاكراد والاتراك
والارمن في لبنان
[bus
stations], [train stations], and [shopping areas throughout the country]
محطات القطار
والمطارات والاسواق في المدينة
Note: there
is no white space in arabic between “and”
, “و”
and the word immediately after.
4.2.4.3.3
Pronouns Referring to GPEs
Pronouns
that refer to GPEs are marked as mentions of the same entity as their
antecedent, but are assigned the role invoked by the context of the pronoun,
which may not be the same as the role of the antecedent GPE.
Composite
Example: The president flew to
{[GPE.LOC] Israel} to meet with {[GPE.GPE] its} Prime Minister.
غادر الرئيس الی
كندا لمقابلة رئيس وزرائها
Similarly,
in the case of classic metonymies (where two entities are created), pronoun
annotation is determined in part by the link to the antecedent and in part by
the context in which the pronoun appears.
If the antecedent is a classic metonymy, the pronoun will be a mention
of the same entity as either the literal mention or the intended mention of the
antecedent.
Metonymy
Example: Thousands of parochial school and college students are joining this
year's demonstration, including 1,500 high school students from across the
country who spent last night at {[ORG-Literal][FAC-Intended] Catholic
University}. {[FAC] It}'s in
Georgetown.
امضی طلاب الجامعة المتظاهرين
الليلة داخل حرم جامعة الازهر في القاهرة
In some
cases, the antecedent is not a metonymy but the context of the pronoun invokes
an entity with a type that is different from that of the antecedent. In such cases, in addition to the mention of
the new entity, the annotator should also mark the pronoun as a literal mention
of the antecedent entity. (This allows
us to maintain the connection between the pronoun and the antecedent.)
Metonymy
Example: {[FAC] The museum} is located
on 45th Street. {[FAC-Literal]
[ORG-Intended] They} just hired a new guard.
4.2.3.4
Elision
Where elision
of the head noun occurs in a conjunction, a single entity is delineated (these
could also be viewed as conjoined modifier phrases):
[british and
irish governments]
الحكومتان ]البريطانية
والایرلندية[
4.2.3.5
Range Expressions and Elision
Components
of range expressions are tagged separately if there is no elision of any head
noun:
from [the
foothills] to [the prairie]
من ]المحيط[ الی ]الخليج[
from [the
downtown area] to [the suburbs]
من ]وسط المدينة[
الی ]الضواحي[
However, in
examples like the following there is only a single head noun. In these cases we will treat the range
expression as a pre-modifier, so that it gets included in the maximum extent of
the entity:
ranging from
[five to six companies] per day
يتراوح
من ]خمسة الی ستة شركات[ يوميا
from [blue
collar to white collar workers]
من ] اصغر موظف
الی اكبرهم[
4.2.3.6
Predicate complements
Mentions
should include nominal predicate complements that are affirmatively asserted of
a reportable entity, since they describe the entity. Thus
Fred is a
real linguist.
رامز لغوي شهیر
evokes an
entity of type person with two mentions, "Fred" and "a real
linguist". On the other hand,
Fred is not
a real linguist.
رامز ليس بلغوي
شهیر
evokes two
entities: one of type person with only one mention, "Fred" and one of
type person that is generic with only one mention "a real linguist".
Similarly,
Fred is
studying to be a real linguist.
يريد رامز ان
يصبح لغويا مشهورا
evokes a
specific entity of type person with only one mention, "Fred" and a
generic entity of type person with one mention, "a real linguist",
because the text does not assert that Fred has been, is, or will be a real
linguist.
4.2.3.7
Apposition
Appositional
modifiers are treated like predicate complements: they are recorded as mentions of the head, without regard to the
criteria regarding generic usage. Thus
the phrase
Fred, a real
linguist, knows ten languages, none fluently
اللغوي المشهور
رامز يعرف عشرة لغات
evokes an
entity with mentions "Fred, a real linguist", and "a real
linguist".
المتحدث الرسمي
لوزارة الخارجية حسين متولي
حسين متولي متحدثاً
رسمياً لوزارة الخارجية
4.2.3.8
Proper adjectives
A proper
adjective is to be treated as a name mention of the noun from which it is
derived. Thus, if "France"
and "French" both appear in a single document, they are to be marked
as mentions of the same GPE entity (if only "French" appears in a
document, it evokes a GPE entity). The
adjective is marked as a name mention of the GPE entity.
A noun
indicating a national of a given country is a nominal mention of an entity of
type person. In many cases -
"Iranian", "American", "German", etc. - the same
word is used both as a proper adjective and as the name of a national. When used as an adjective
I love
Iranian caviar for breakfast.
الكافيار الايراني
/ الليرة اللبنانية / النبيذ الفرنسي
it is marked
as a name mention of the GPE entity; when used as a noun ("I met three
Iranians."), it is marked as a nominal mention of a person entity.
تعرفت علی ثلاث
ايرانيين
Similar
rules apply to adjectives derived from names of organizations. Thus, "Republican" in
"Republican platform" is a name mention of an organization entity,
الجمهوري / الحزب
الجمهوري
while in
"That Republican likes macaroni and cheese." it is a nominal mention of
a person entity.
4.2.3.9
Quantified and partitive phrases
A partitive
construction of the form quantifier of
ENP gives rise to two mentions: one for
the entire phrase, and one for the embedded noun phrase ENP that is the object
of "of". If the entire phrase
represents a subset of ENP, these will be mentions of distinct entities. Thus in
three of the
women
ثلاثة من النساء
evokes two
entities, for "the women" and "three of the women".
النساء / ثلاثة
من النساء
Similarly,
some of the
women
بعض النساء
evokes two
entities. On the other hand,
all of the
women
جميع النساء
has two
mentions of one entity: "the
women" and "all of the women" (the same set). This is also the case with the
partitive-like phrase
a team of
five experts
فريق من خمس اختصائيين
since the
team is identical to the set of five experts.
4.2.4 Types
of Mentions
We
distinguish between mentions with a named head (name-mentions), those with a
noun head (nominal or nom-mentions) and those with a pronominal head
(pro-mentions). Mentions with empty
heads ("five of the analysts") are classified as pro-mentions.
4.2.4.1
Names
For each
entity, we record the occurrences of names (if any) used to refer to this
entity in the document. For the
purposes of ACE, a name is a noun phrase headed by a proper noun. Often the proper noun head is also the full
extent of the noun phrase. We record
each occurrence of the name of a given entity.
If a name appears twice, both instances must be recorded.
Names are
atomic. This means that entity names wholly
contained within another name are not annotated. For example, in the following phrase only one entity is
referenced.
The New York
Times
النيو يورك تايمز
This phrase
references the organization of the newspaper.
It does not evoke a separate entity for the city of "New
York".
4.2.4.1.1
Head and Extent of Names
The
following are head and extent rules that are specific to Name mentions.
Definite
articles
When a
definite article is commonly associated with an entity name, it also must be
included in the head of the mention.
The Rolling
Stones
الرولنغ ستونز
الرحبانيه
الثلاثي المرح
Titles and
honorifics
Titles such
as "Mr." And role names such as "President" are not
considered part of a person name.
However, appositives such as "Jr.," "Sr." and
"III" are considered part of a person name.
Mr Harry
Schearer Secretary Robert
Mosbacher John Doe, Jr.
Titles,
honorifics, and determiners are all treated as modifiers, and are included in
the extent of the mention of the person entity.
السيد / الامام
/ الاستاذ / الاسقف / الدكتورة
قداسة البابا
شنوده الثالث
Multi-modifier
Expressions
A
single-name expression containing conjoined modifiers with no elision should be
marked as a single expression.
U.S. Fish
and Wildlife Service
مديرية شركة الكهرباء
العامة اللبنانية
The entire
string is to be treated as the name of the organization.
4.2.4.1.2
Markable Names
The
following are markability rules that apply specifically to name mentions.
Aliases and
Nicknames
Generally,
aliases for entities are to be tagged.
Taggable aliases will include the following forms of entity names:
Acronyms,
formed from the initial letter(s) or syllable(s) of successive or major parts
of a compound term. Note that speech examples
of acronyms may appear in a non-standard format. For example:
IBM
شركة آي بي ام
/ شركة باكتل / شركة اي تي اند تي
PACTEL
_a_t and _t
Nicknames
and other aliases are tagged as names when they are established alternate ways
of referring to an entity; if the annotator does not recognize the status of
the nickname, it may be possible to determine from context whether the nickname
is "established" or not.
The Big
Apple nickname
for New York City
المطربة صباح
; شحرورة الوادي
مصر \ القاهرة
; ام الدنيا
باريس \ عاصمة
النور
Entity Names
that Modify Persons/Titles
Entity names
modifying a person or their title/role are to be tagged.
Microsoft
founder Bill Gates
مؤسس مايكروسوفت
بل جيتس
The U.S.
Vice-President
نائب الرئيس الاميركي
Each of the
examples above gives us two mentions.
Please note that nominal mentions of entities, which modify a person or
their title, are not to be tagged.
company
chairman James Smith
رئيس ادارة الشركة
جيمس سميث
This example
yields only one mention.
"company" الشركة is not tagged.
4.2.4.2
Nominals
For the
purposes of the ACE project, a nominal is a noun phrase headed by a common
noun.
4.2.4.2.1
Nominal Left Modifiers
Nominal
adjectives and non-possessive common nouns directly modifying other nouns are
not markable mentions.
Markable:
I love
{French} food.
احب الاكل }العربي{
Not
Markable:
I love
{prison} food.
احب طعام السجن
4.2.4.3
Pronominals
A pronominal
is a word used as a substitute for a noun phrase. They refer to persons or things that are previously specified or
understood from the context.
Pronominals are marked whenever they reference a salient entity. When used as location pronouns, here and
there are markable. Demonstratives
this, that, these, and those are markable when they stand for a noun and not
markable when the simply modify a noun.
The various forms of he, she, and it are markable.
Here are
examples where the pronoun should be tagged.
هنا / هناك /
هذا / هذه /هؤولاء / التي / الذي
* Northern
Idaho is beautiful in the early summer. Motorcycle tourists love to come {here}
and ride along the snowmelt-rivers.
لبنان بلد جميل
جدا يقصده السواح ويتمتعوا هناك بالطبيعة الخلابة
* {this} is
my grandmother
هذه جدتي
* {those}
are the guys who stole my car.
هوؤلاء هم اللصوص
الذين سرقوا سيارتي
Here are
examples where the pronoun should NOT
be tagged.
هوؤلاء لصوص سرقوا
سيارتي
* The White
House and {its} surrounding area.
The
following are some additional rules that apply to pronominal mentions.
4.2.4.3.1
Headless Mentions
Mentions
with empty heads are classified as pro-mentions.
five of the
analysts
خمسة من المحللين
Please note
that this example also includes the nominal mention [the analysts].
المحللين
4.2.5
Coreference of Mentions
If two
mentions refer to the same underlying entity, we must indicate this by
coreferencing them. In most cases, this
is very straightforward. In an article
we want all mentions of the person to be lumped together in the same
entity and marked with the base type PER.
So, if the following sentences appeared in the same article, we would
want to include all the bold mentions in the person entity.
نفی وزير الدفاع
الايراني ان تكون بلاده قد ضبطت اي مواطن خليجي مشيرا الی الاوضاع مؤكدا في الوقت نفسه ان بلاده
تسيطر علی الحدود
Please note,
however, that we must coreference all mentions that refer to the entity that is
وزير الدفاع. This will include nominal mentions and pronominal mentions
]الهاء في بلاده و الميم في مشيرا و الميم في مؤكدا و الهاء في
بلاده [
5 Metonymy
Metonymy
occurs when a speaker uses a reference to one entity to refer to another entity
(or entities) related to it. For example, in the sentence below Beijing is a
capital city name that is used as a reference to the Chinese government:
Beijing will
not continue sales of anti-ship missiles to Iran.
عاودت بيجنغ بيع الاسلحة الی سوريا
Classic
metonymies make reference to two entities, one explicit and one indirect
reference. Common examples are cases of
capital city names standing in for national governments, as shown above. Other common examples involve facilities and
organizations, which are closely related in that organizations typically have
facilities, and facilities are typically owned and administered by
organizations. Thus when a facility is
mentioned, the organization is sometimes also referenced. So, in the museum announced its new exhibit,
the entity museum is a facility that houses artwork, but in this context it is
the organization running the museum that is doing the announcing.
اعلن المتحف
]FAC] [ORG[ عن معرضه الجديد
اعلن مكتب ]ORG]
[FAC[ وزارة السياحة عن فتح صالة عرض جديدة في الطابق الاول من المبنی
In cases
like this, where both entities are expressed by the same phrase, two entity
mentions should be marked, one for each of the corresponding references. If only one entity is expressed, then only
one entity mention is marked. In the above
example, the annotator would mark mentions of a FAC and an ORG entity for the
museum.
Classic
metonymies are to be annotated with two separate mentions, one for each of the
entities referred to. This naturally means that each of those mentions will
need to be linked appropriately to any other mentions of that entity in the
document. For example, there is a
building (a FAC) called the "Holocaust Memorial Museum" but the name
of this building is also often used to refer to the organization that runs its
business in that building. Thus, in a
sentence like the following, "the museum" would be marked as two
mentions, one associated with the FAC entity and the other associated with the
ORG entity.
But Lerman
also added that {[FAC][ORG] the museum} would not extend Arafat the formal
courtesies that are routine for other world leaders.
If,
elsewhere in the document, a mention of "the museum" occurred in the
context "New windows were ordered for the museum", that mention would
be marked as an additional mention of the same FAC entity referred to above,
but not as an additional mention of the ORG entity.
In cases
like the above, where two mentions are marked on the same text, annotators are
to specify which of the two mentions is the "literal" one and which
the "intended" metonymic one.
The Alembic Workbench will support this by allowing the properties
"literal" or "intended" to be added to mentions. In
examples in these guidelines, the literal mention will always be listed first.
Both the literal and the intended mentions, with the entities underlying them,
will be counted in the scoring.
The
remainder of this section outlines specific annotation guidelines for metonymy
in different contexts.
5.1 Capital
City for Governmental GPE
Cases in
which the capital city is used to refer to the nation's government are marked
as true metonyms. (Because two separate
GPEs are involved, this is not an exception to the general rule that GPEs are
marked as one entity with a role rather than as two entities.)
صرح وزير الدفاع
الاميركي وليم كوهن ان بيجنغ
] GPE.GPE][GPE.ORG[ الغت
بيع اسلحة جديدة لكوريا الشمالية
In this
example there are two mentions covering the word Beijing. The GPE.GPE is a mention of the city Beijing
and the GPE.ORG is a mention of China. The GPE.ORG mention is a mention of the
same China entity that would be referred to by other GPE mentions of
"China" that might be found elsewhere in the document. Also if there were a later mention of the
city of Beijing (for example, Cohen left the city this morning),
ثم غادركوهن المدینة
هذا الصباح
it would be
a GPE.LOC mention of the same Beijing entity referred to by the GPE.GPE mention
in the above example.
5.2
Metonymies Involving ORG Base Entities
There is a
table (see the Pilot Study task definition, Section 6.2.5) that specifies a
"base" type for various kinds of entities. Mentions of entities with
ORG base types like schools, restaurants, or churches are sometimes used to
refer to the organization itself, and sometimes used to refer to the facility
that houses that organization. Every mention of such an entity is to be marked
(at least) as a mention of an entity of its base type. A second mention of a
different type should also be marked if the context invokes a metonymic entity.
Thus a mention whose base type is ORG but that is used in a FAC context will
have mentions of both of those two entities associated with it.
Below are
some examples of ORGs that refer either to a single base type entity, or else
to both a base type and metonymic type entity.
Example 1
Universities
have an ORG base type so both mentions of the university in 1A and 1B invoke an
ORG entity. But 1B also invokes a FAC
entity because it refers to the site.
Lee Jung
Hoon, a political science professor at {[ORG-1] Yonsei University}... (From 9801.162)
استاذ العلوم
السياسية في الجامعة اللبنانية
Thousands of
parochial school and college students are joining this year's demonstration,
including 1,500 high school students from across the country who spent last
night at {[ORG-2] [FAC-3] Catholic University}. (From 9801.175)
امضی المتظاهرون
الليلة داخل حرم الجامعة اللبنانية
Example 2
Embassies
have an ORG base type so both 2A and 2B invoke an ORG entity. But 2A also invokes a FAC entity because
FACs, not ORGs have gates.
...a few
hundred ethnic Albanians laid a black wreath at the gate of {[ORG-4] [FAC-5]
Yugoslavian embassy}. (From APW19980308.0201)
تظاهر الطلاب
امام مبنی ] ORG ] [FAC[ السفارة الاميريكية
"Our Ministry of Defense is working very
hard with {[ORG-6] the U.S. Embassy in Bogota} to get the information
together," Cano said. (From
9801.382)
ارسلت السفارة
الاميريكية انذارا لرعاياها
5.3
Metonymies Involving FAC Base Entities
The same
approach used for ORG entity mentions that refer to an associated FAC should
also be used when a FAC entity mention refers to an associated ORG.
Here are two
examples from the same document (9801.266):
Competing
self-images of victim hood have long prevented Israelis and Arabs from
acknowledging the full weight of each other's historical tragedies, and many
Arab leaders have resisted efforts to lure them to {[FAC-7] the museum} and the
similar Yad Vashem memorial in Jerusalem.
Lerman,
reached at his New Jersey home, said the subject of Arafat and Israel's talks
with the Palestinian Authority still profoundly divided U.S. and world Jewry
and "we believe {[FAC-8] [ORG-9] the museum} should not get involved in a
political dispute where half of the people are for something and half are
against it."
Since
museums have a FAC base type, both examples A and B invoke a FAC entity. But example B also invokes an ORG entity
because it is the organization that should not get involved in the dispute.
Note in the
above examples that FAC mention 7 and FAC mention 8 refer to the same FAC
entity, as shown in the following table of entities and mentions:
Entity
1: {[FAC-7] the museum}, {[FAC-8] the
museum}
Entity
2: {[ORG-9] the museum}
Another
common class of FAC metonymies is found when named buildings are used to refer
to the organizations based there:
It is unlikely
{[FAC] [ORG] the White House} would nominate a successor who did not support
sampling, and equally unlikely Republican leaders would look favorably on such
a candidate.
ليس من المحتمل
ان يسمح البيت الابيض ]ORG ]
[FAC [ للمراسلين الصحفيين ان يطرحوا هذه الاسئلة
5.4 Special
Rule for Offices and Branches
Because the
term "office" is often used to refer to an organization, as in
"the Office of the Attorney General," the base type for offices will
be ORG.
مكتب المدعي العام
ORG
When the
context suggests a reference to the physical entity, the entity should be
marked both ORG and FAC. Examples that
are ambiguous as to whether a facility or an organization is intended should be
marked metonymically, with both an ORG and a FAC mention. Thus in the following example the office is
marked both ORG and FAC because it is unclear whether the context suggests that
the investigators are from the physical office or from the organization.
Investigators
from {[ORG-9] [FAC-10] the Kentucky state fire marshal's office}.
المحقق من مكتب مديرية الدفاع المدني في قطاع غزة
(In that
particular example, Kentucky would also be marked, so that the full annotation
for that phrase would be {[ORG-9] [FAC-10] the {[GPE.ORG] Kentucky} state fire
marshal's office}.)
The same
general guidelines apply to other facility terms like "branches" (as
in the local branch of a bank).
فرع البنك
5.5
Metonymies Involving LOC Base Entities
Entities
whose base type is LOC can also be used in metonymic senses. In the following
example, "the world" has literal type LOC but intended type PER, and
thus is annotated with two separate mentions:
{[LOC] [PER]
The whole world} was watching.
العالم كله ينتظر
مباراة كرة القدم
6 Entity
Class (Generic/Specific)
An entity is
generic when it does not refer to a particular object or particular set of
objects in the world. Every entity must
be designated as either generic or specific.
In some cases this distinction is difficult to make. This section will outline several tests that
will help differentiate between the two classes.
6.1
Definition of Generic and Specific
A given
common noun (girl, motorcycle, bookmark, semantic theory, etc.)
بنت / سيارة
/ نظرية فلسفية
denotes a set of objects, each of which is an
example of the noun in question. In such a system, "boy" would refer
to the set BOY whose membership would be precisely all the boys in the world
(or perhaps: in the Universe).
The manner
in which NPs refer can be easily explained relative to this backdrop:
1. Some NPs
are used to refer to a particular object in the world. The set X (the common
noun's referents) from which that object is drawn has little significance to
the audience, other than to help in the selection of the (particular) object in
question.
These NPs
say something like: there is a specific example of X, one that I have in mind,
that ... and are considered to be non-generic.
(Note that
we will use non-generic and specific interchangeably in the present set of
documents. The former is arguably more appropriate, since the annotation
conventions adopted here tag the feature GENERIC as either true or false, but
we will let the latter serve as form of shorthand notation.)
2. Other NPs
are used to refer to underspecified objects that may be an example of the set
(X) in question, but need not be particular. Here the set X has a greater
degree of significance, since the only constraint on the entity in question is
that it be drawn from that set.
These NPs
say something like:
"Any
member of the set X ..."; or
اي عضو / كل عضو
في المجموعة
"Each
member of the set X ..."
and are
considered to be generic.
In short, a
generic mention is used to refer to any member of the set in question rather
than some particular, identifiable member of that set (which would be picked
out by a Non-generic mention) and a formal definition seems altogether
impossible. As shall soon become clear, we can do little better in providing
this notion with a precise definition.
We have
therefore allowed the above informal (folk) definition --- together with the
following discussion of the phenomena; the subsequent taxonomy of common
generic-denoting mentions; and the concluding short list of (non-deterministic)
tests for the applicability of generic status to a given mention --- to serve
as the basis of our tagging decisions with regard to the attribution of generic
status.
The
(un-)reliability of syntactic or contextual tests here will become clear as the
discussion proceeds --- it is helpful to correspondingly consider each of the
examples which follow as having a (frequently secondary) role in illustrating
this fact, whether or not this expository role is explicitly stated.
6.2 Classes
of Mentions Frequently Associated with Generic Entities
We can make some
loose generalizations about the classes of NPs, which are likely to refer to
generic entities, but it is important to bear in mind the source of our
reluctance to offer such categorical (or syntactic) criteria for the assignment
of generic status to a given NP.
Typically,
generic entities include types of entity, suggested attributes of entities,
hypothetical entities, and generalizations across a set or sets of entities.
6.2.1 A Type
of Entity
{Mammals}
are live bearers.
الحيوانات البرمائية
تسبح في النهر
{Good
students} do all the reading.
الطلاب المجتهدون
هم الناجحون
{Typical
firemen} work hard all their lives in dangerous conditions.
يعمل الاطفائي
في ظروف خطرة
6.2.2 A
Suggested Attribute of an Entity
John seems
to be {a nice person}.
يبدو ان جان رجل
كريم
{Misfits}
are sometimes {the best employees}.
الصحفيون مجتهدون
6.2.3 A
Hypothetical Entity
If {a
person} steps over the line, {they} must be punished.
سوف يعاقب كل
من يخطو عن الخط الاحمر
Aides say
he's plotting a political comeback, even considering a run for president} in
two thousand.
يقال انه سيخوض
معركة الانتخابات للرئاسة
6.2.4 A
Generalization across a Set of Entities
{Outsiders}
think that New Jersey is a different country.
يری الغريب /
المسافر / المهاجر كل الامور بنظرة مختلفة
{Purple
houses} are really ugly.
البيوت البنفسجية
جميلة
Even if the
property or the set underlying the entity in question is extremely constrained
(i.e. such that there are very few possible members), that entity should still
be considered generic.
{People who
drive at night in red cars} are likely to get tickets.
كل من لا يقوم
بواجبه سوف يندم
The police
are looking for {a man who wears green suits and carries a purple briefcase}.
تبحث الشرطة عن
رجل يحمل محفظة خضراء ويرتدي معطف احمر
The first of
these examples falls into the Type of Entity category. The second is a
Hypothetical Entity. The man in the second example may or may not exist (even
though the police are looking for him).
Note that
this mention would not be generic if the context went on to say specific things
about the man wearing green suits. We have seen several examples of this case
above. This is only generic if it is unclear if such a person actually exists.
6.3 Tests
for Generic-hood
6.3.1 Words
that are commonly generic
'anyone',
'most Xs', 'more Xs' tend to be generic, even if the author has someone in
mind.
كل / كل من
/ اكثر / أغلب / بعض / اي
{Anyone who
carries a gun} is dangerous.
اي شخص يحمل مسدس يكون خطرا علی المجتمع
{Most
doctors} are just in it for the money.
اكثر الاطباء
/ المفتشين / المعلمين / السياسيين
6.3.2
Determiners
Generic noun
phrases of the type or bare plurals
(without the article) can be
distinguished using tests such as:
1. These
noun phrases in negated contexts are generic:
I didn't see
{gorillas} here. [generic]
لم اری اسدا هنا
\ لم اری اسودا هنا
I saw
gorillas {a gorilla} here. [specific]
رأيت الاسد
2. These
noun phrases in modal contexts (such as belief, desire, ...) are generic:
I want to
see {gorillas}.
اريد ان اری اسدا
ظننت اني سمعت
اسدا
I thought I
heard {a gorilla}.
3. These
noun phrases in questions are generic:
Have you
seen {a gorilla} walking by?
هل رأيت اسدا
يمر من هنا
Have you
seen {gorillas} wearing hats?
هل ريت اسودا
حمراء
Bare plurals
with individual-level predicates are generic. Individual-level predicates mark
characteristics of individual members of a set, e.g., "birds have
wings" means that each bird has wings. In contrast, stage-level predicates
("Gorillas are wrecking my garden", "Gorillas are
available") can be either generic or non-generic, depending on context.
Thus the
subjects are generic in the following sentences:
{Gorillas}
are intelligent
الاسود ذكية
{Linguists}
know French.
اللغويون يتكلمون
الفرنسية
{Birds} have
wings.
للطيور اجنحة
6.3.3
Positive Assertion Test
This test
applies to predications such as "X is Y" (as in the subsequent
example). If X is specific, then Y will be as well, because Y is positively
asserted of X. Y is assumed to be coreferential with X and therefore specific.
{Joe} is {a
nice guy}.
جو رجل طيب
If X is
generic and Y is positively asserted of X, then Y is also generic.
{Firemen}
are {nice guys}.
رجال الاطفاء
طيبون
This test is
less effective when someone other than the author of the story makes the
positive assertion. This is just an instance of the case in which a modal
context forces a generic reading
Mary says
that {Joe} is a {a nice guy}.
تعتقد ماري ان
جو رجل كريم
This sort of
statement falls into the pattern
person Z
says/said/thought/etc. that X is Y
This only
counts as a positive assertion if Y is not an attribute and person Z is a
trustworthy source of information. This case, however, is the exception rather
than the rule. Most modal contexts are entirely opaque, and the assertions
found inside will not generally hold "in the real world." This means
that even the entities at play in such assertions cannot be reliably anchored
in "reality;" that there is probably not a specific entity in the
world to which the beliefs/desires/assertions of the speaker are linked (via
the embedded proposition within which the mention intimating such an entity is
located). In the case of:
John
believes that a gorilla stole his lunch.
We must
assume that "any gorilla will do" (or, at least, that "it could
be the case that any gorilla will do").
1. Negated
pronouns are generic.
I saw no
one.
لم اری احدا
3. Negated
full NPs can be specific.
Neither
{Joe}, nor {Mary} said anything.
لا جو ولا ماري
قالا شيئا
4. Common
nouns modified by "neither" لا
ولا
and partitives with "neither" can be specific (depending on
coreference) because the negative properties of "neither" have scope
over more than just the NP.
لا هذا ولا ذاك
الشخص
NOTES
SPECIFIC TO ARABIC:
!) foreign names of sports teams, organizations,
newspapers ect are always identified as “team X” or “NYT Newspaper” etc. ,
whereas refernces to arabic organizations usually are given as the name without
explanation. For example,
“فاز الاهلي علی الزمالك حسب مصادر “النهار
2) Numbers
numbers can
be femimine, masculine, dual and multiple plural
ثلاث مرات
/ ثلاثة رجال / ثلاثين
/ ثلاثة عشر/ ثلاث و عشرون / ثلاثون
كتاب
3) Pronouns:
١) personal
pronouns are obvious
2) prounoun
endings (known as dependent personal suffixes)
appear at the end of words (nouns, prepositions, particles, verbs) as
listed here:
ي ني ك ه ها كما هما كم كن هم هن
كتابي my book اخوك your brother
بيتهم their house
منهم from them تحتي under me عليها on her
In the past
tense, pronouns are added to the end of the verb
اخذني he took me كتبوا لي they wrote to
me كلمها he talked to her
In the
present tense, both the beginning of the
verb and the end are added on to, for ex the verb to go:
اذهب تذهب
/ تذهبين يذهب تذهب
نذهب تذهبون يذهبوا
In the
future tense likewise
ساذهب ستذهب /ستذهبين سيذهب ستذهب سنذهب ستذهبون سيذهبوا