ULAPI  8.0
ULLanguageDataSource Class Reference (abstract)

ULLanguageDataSource is the abstract parent for classes that interface with single-language data stored somewhere like a .ulc file or a database.

#include <ullanguagedatasource.h>

Public Member Functions

virtual ~ULLanguageDataSource ()
 
virtual ULError attach (const ULString &dataSourceIdentifier)=0
 
virtual ULError detach ()=0
 
virtual ULError load ()=0
 
virtual ULError close ()=0
 
virtual ULString getDataSourceIdentifier ()=0
 
virtual ULDataSourceVersion getVersion ()=0
 
virtual const ULLanguage & getLanguage ()=0
 
virtual ULError getWords (const ULString &root, ULList< ULDerivation > &wordList, bool filterResults=false)=0
 
virtual ULError getVerbs (const ULString &infinitive, ULList< ULDerivation > &verbList, bool filterResults=false)=0
 
virtual ULError getNouns (const ULString &text, ULList< ULDerivation > &nounList)=0
 
virtual ULError getVerbModel (uluint32 verbClassID, ULVerbModel &model)=0
 
virtual ULError getMatchingRoots (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &rootList)=0
 
virtual ULError getMatchingNouns (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &nounList)=0
 
virtual ULError getMatchingInfinitives (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &infinitiveList)=0
 
virtual ULError getMatchingInfinitives (const ULString &prefix, uluint32 maxMatches, ULList< ULDerivation > &infinitiveList)=0
 
virtual ULError getVerbFormTypes (const ULDerivation &verb, ULList< ULPartOfSpeech > &verbFormTypes)=0
 
virtual ULError getTenses (const ULDerivation &v, ULList< ULTense > &tenseList, bool includeParticiples=false)=0
 
virtual ULError getTensesForClass (uluint32 classID, ULList< ULTense > &tenseList, bool includeParticiples=false)=0
 
virtual ULError getAllTenses (ULList< ULTense > &tenseList, bool includeParticiples=false)=0
 
virtual ULError getPersons (const ULDerivation &v, ULTense tense, ULList< ULPerson > &personList)=0
 
virtual ULError getPersonsForClass (const ULDerivation &v, uluint32 classID, ULTense tense, ULList< ULPerson > &personList)=0
 
virtual ULError getAllTaggingRules (ULList< ULTaggingRule > &ruleList)=0
 
virtual ULError getFeatureNameList (ULList< ULString > &featureNameList)=0
 
virtual ULError getInflectionRules (const ULDerivation &derivation, const ULPartOfSpeech &targetPartOfSpeech, ULList< ULInflectionRule > &ruleList)=0
 
virtual ULError getInflectionRulesForDissection (const ULDerivation &derivation, ULList< ULInflectionRule > &ruleList)=0
 
virtual ULError getSuccessors (const ULInflectionRule &rule, ULList< ULInflectionRule > &successorList)=0
 
virtual ULError getPredecessors (const ULInflectionRule &rule, ULList< ULInflectionRule > &predecessorList)=0
 
virtual bool hasStopWord (const ULString &word)=0
 
virtual ULError getClosedClassWordForPartOfSpeech (const ULPartOfSpeech &partOfSpeech, ULList< ULDerivation > &wordList)=0
 
virtual ULError getFrequencies (const ULString &word, ULList< ULFrequency > &frequencyList)=0
 
- Public Member Functions inherited from ULDataSource
virtual ~ULDataSource ()
 
- Public Member Functions inherited from ULLockable
 ULLockable ()
 
 ULLockable (const ULLockable &lockable)
 
virtual ~ULLockable ()
 
const ULLockable & operator= (const ULLockable &lockable)
 
void clear ()
 
ULLock * getLock ()
 
void setLock (ULLock *newLock)
 

Additional Inherited Members

- Protected Attributes inherited from ULLockable
ULLock * lock
 

Detailed Description

ULLanguageDataSource is the abstract parent for classes that interface with single-language data stored somewhere like a .ulc file or a database.

Warning: If you find yourself thinking about directly using one of the subclasses of this class, you should reconsider. It is much easier to use ULAPI's data sources correctly by working with a ULFactory and the associated higher-level tools such as ULConjugator or ULStemmer, which take care of the initialization and manipulation of the data sources for you.

Constructor & Destructor Documentation

virtual ULLanguageDataSource::~ULLanguageDataSource ( )
inline virtual

Member Function Documentation

virtual ULError ULLanguageDataSource::attach ( const ULString & dataSourceIdentifier )
pure virtual

Causes this ULDataSource object to be associated with the specified data source, and reads enough information from that data source to determine its language(s), etc.

The exact behavior of attach will be dependent on the nature of the data source. If the data source is a file, then attach will read header information from the file and then close the file to save memory until the data source is actually needed. On the other hand, if the data source is a remote database, then attach might open a connection, collect header information, and then close the connection.

Returns
ULError::NoError if the attachment is successful, or some other ULError value if not.
Parameters
[in] dataSourceIdentifier: A string describing the data source (e.g. a file name, a database connection string, a URL, etc.).

Implements ULDataSource.
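
As an illustration only, the life cycle that attach, load, close, and detach describe can be sketched with an in-memory stand-in. SketchError, SketchDataSource, isLoaded, lifecycleExample, and the identifier "french.ulc" are hypothetical names invented for this sketch; they are not part of ULAPI, and a real data source would read from a file or database.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ULError; the real type lives in ULAPI.
enum class SketchError { NoError, NotAttached };

// Hypothetical in-memory data source that mimics the documented life cycle.
class SketchDataSource {
public:
    // attach: remember the identifier and read only "header" information.
    SketchError attach(const std::string& identifier) {
        identifier_ = identifier;
        attached_ = true;
        loaded_ = false;  // the heavy data has not been read yet
        return SketchError::NoError;
    }
    // load: perform the one-time loading step explicitly.
    SketchError load() {
        if (!attached_) return SketchError::NotAttached;
        loaded_ = true;
        return SketchError::NoError;
    }
    // close: free loaded data but stay attached.
    SketchError close() {
        loaded_ = false;
        return SketchError::NoError;
    }
    // detach: drop the association with the data source entirely.
    SketchError detach() {
        loaded_ = false;
        attached_ = false;
        identifier_.clear();
        return SketchError::NoError;
    }
    // Empty string when nothing is attached.
    std::string getDataSourceIdentifier() const { return identifier_; }
    bool isLoaded() const { return loaded_; }

private:
    std::string identifier_;
    bool attached_ = false;
    bool loaded_ = false;
};

// Usage: walk through attach -> load -> close -> detach.
inline void lifecycleExample() {
    SketchDataSource source;
    assert(source.attach("french.ulc") == SketchError::NoError);
    assert(!source.isLoaded());                                // attached, not loaded
    assert(source.load() == SketchError::NoError);
    assert(source.isLoaded());
    assert(source.close() == SketchError::NoError);
    assert(source.getDataSourceIdentifier() == "french.ulc");  // still attached
    assert(source.detach() == SketchError::NoError);
    assert(source.getDataSourceIdentifier().empty());
}
```

In application code, a ULFactory and the higher-level tools such as ULConjugator or ULStemmer normally drive this life cycle for you.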

virtual ULError ULLanguageDataSource::close ( )
pure virtual

Frees dynamically allocated memory associated with this data source while keeping it attached to the file, db, etc. to which it was previously attached. Also closes any relevant files, db connections, etc.

Returns
ULError::NoError if the memory freeing was successful.

Implements ULDataSource.

virtual ULError ULLanguageDataSource::detach ( )
pure virtual

Releases the connection between this ULDataSource object and the data source specified in the previous open() or attach() call, closing any relevant files or network connections and freeing memory in the process.

Returns
ULError::NoError if the detachment is successful, or some other ULError value if not.

Implements ULDataSource.

virtual ULError ULLanguageDataSource::getAllTaggingRules ( ULList< ULTaggingRule > &  ruleList)
pure virtual

Retrieves all the part-of-speech tagging rules used in this data source's language.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[out] ruleList: The list of rules.

virtual ULError ULLanguageDataSource::getAllTenses ( ULList< ULTense > & tenseList, bool includeParticiples = false )
pure virtual

Finds all the tenses available for the language associated with this data source.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] includeParticiples: True if the tense list should include participles (e.g. infinitive, past participle, present participle, gerund...).

virtual ULError ULLanguageDataSource::getClosedClassWordForPartOfSpeech ( const ULPartOfSpeech & partOfSpeech, ULList< ULDerivation > & wordList )
pure virtual

ULLanguageDataSource objects may contain a list of closed class words with corresponding parts of speech. Typically, a data source will include articles, conjunctions, pronouns, and prepositions.

This method retrieves ULDerivation objects for every closed class word whose part of speech satisfies the category and features in the partOfSpeech parameter. If you want all the closed class words, just set partOfSpeech's category to ULPartOfSpeechCategory::Any, without any features.

Returns
ULError::NoError or ULError::NoMatch, depending on whether there are any matching words or not.
Parameters
[in] partOfSpeech: the part of speech for which we want closed class words.

virtual ULString ULLanguageDataSource::getDataSourceIdentifier ( )
pure virtual
Returns
the data source identifier for the data source attached to this ULDataSource object, or the empty string if no data source is attached.

Implements ULDataSource.

virtual ULError ULLanguageDataSource::getFeatureNameList ( ULList< ULString > &  featureNameList)
pure virtual

Retrieves the list of all feature names stored in this data source. These feature names will typically include some that refer to global features represented by subclasses of ULEnum (e.g. "pastparticiple"), and others that refer to features used only internally in the language data source.

Returns
ULError::NoMatch if there are no feature names in this data source, or ULError::NoError otherwise.
Parameters
[out] featureNameList: the desired feature names, or the empty list if an error occurs.

virtual ULError ULLanguageDataSource::getFrequencies ( const ULString & word, ULList< ULFrequency > & frequencyList )
pure virtual

ULLanguageDataSource objects may contain frequency data of the form (word, root, part-of-speech, count). These data come from manually tagged corpora similar to the American National Corpus or the Penn Treebank.

This method returns a list of frequency objects corresponding to the specified word. (For example, the word "chairs" might yield ("chairs", "chair", verb, 21), ("chairs", "chair", noun, 623), and ("chairs", "chair", unknown, 2).)

The method performs its search in a case-insensitive and accent-insensitive way.
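
The lookup and ordering described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: SketchFrequency, toLowerAscii, lookupFrequencies, and frequencyExample are hypothetical names, parts of speech are plain strings, and only ASCII case folding is attempted (no accent folding).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a frequency record: (word, root, part of speech, count).
struct SketchFrequency {
    std::string word;
    std::string root;
    std::string partOfSpeech;
    unsigned count;
};

// ASCII-only lowercase helper; accent-insensitivity is not attempted in this sketch.
inline std::string toLowerAscii(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Returns the records whose word matches, sorted in decreasing order of count.
inline std::vector<SketchFrequency> lookupFrequencies(
        const std::vector<SketchFrequency>& table, const std::string& word) {
    std::vector<SketchFrequency> result;
    const std::string key = toLowerAscii(word);
    for (const SketchFrequency& f : table) {
        if (toLowerAscii(f.word) == key) result.push_back(f);
    }
    std::sort(result.begin(), result.end(),
              [](const SketchFrequency& a, const SketchFrequency& b) {
                  return a.count > b.count;
              });
    return result;
}

// Usage: the "chairs" records from the text come back with the noun reading first.
inline void frequencyExample() {
    const std::vector<SketchFrequency> table = {
        {"chairs", "chair", "verb", 21u},
        {"chairs", "chair", "noun", 623u},
        {"chairs", "chair", "unknown", 2u},
    };
    const std::vector<SketchFrequency> found = lookupFrequencies(table, "Chairs");
    assert(found.size() == 3u);
    assert(found[0].partOfSpeech == "noun");
    assert(found[2].partOfSpeech == "unknown");
}
```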

Returns
ULError::NoMatch if there are no frequency records corresponding to the specified word, ULError::DataSourceOpenFailed if there was a problem with the data source, or ULError::NoError otherwise.
Parameters
[in] word: the word whose frequencies are sought.
[out] frequencyList: the corresponding frequencies, sorted in decreasing order of frequency.

virtual ULError ULLanguageDataSource::getInflectionRules ( const ULDerivation & derivation, const ULPartOfSpeech & targetPartOfSpeech, ULList< ULInflectionRule > & ruleList )
pure virtual

Retrieves the inflection rules in this data source that might contribute to a successful inflection from the specified derivation to the specified target part of speech. This method is an essential part of the inflection process coordinated by ULInflector.

Returns
ULError::DataSourceOpenFailed if there was a problem with the data source; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError otherwise.
Parameters
[in] derivation: the derivation so far, on which we wish to build. This derivation may simply consist of a root word and part of speech, or it may already have some inflection rules to which we're hoping to add.
[in] targetPartOfSpeech: the part of speech towards which the current inflection is being directed.
[out] ruleList: the desired inflection rules (if any), or the empty list if an error occurs.

virtual ULError ULLanguageDataSource::getInflectionRulesForDissection ( const ULDerivation & derivation, ULList< ULInflectionRule > & ruleList )
pure virtual

Retrieves the inflection rules in this data source that might contribute to a successful dissection by being inserted at the front of the specified derivation. This method is an essential part of the dissection/stemming process coordinated by ULDissector.

Returns
ULError::DataSourceOpenFailed if there was a problem with the data source; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError otherwise.
Parameters
[in] derivation: the derivation so far, on which we wish to build. This derivation may simply consist of a root word and part of speech, or it may already have some inflection rules in front of which we're hoping to add a rule.
[out] ruleList: the desired inflection rules (if any), or the empty list if an error occurs.

virtual const ULLanguage & ULLanguageDataSource::getLanguage ( )
pure virtual
Returns
the language for which this data source provides data.

virtual ULError ULLanguageDataSource::getMatchingInfinitives ( const ULString & prefix, uluint32 maxMatches, ULList< ULString > & infinitiveList )
pure virtual

Gets the list of verb infinitives contained in this data source that match (accent- and case-insensitively) the specified prefix. For example, the prefix "spri" might (depending on the specific English data source) yield an infinitive list of "spring", "sprinkle", and "sprint".
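
The prefix matching and the maxMatches convention (zero means "return everything") can be sketched as below. This is only an illustration over a plain vector: lowerAsciiCopy, matchPrefix, and prefixExample are hypothetical names, the word list is invented, and only ASCII case folding is performed (no accent folding).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// ASCII-only lowercase helper; accent-insensitivity is not attempted in this sketch.
inline std::string lowerAsciiCopy(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Collects the entries that start with `prefix`.
// maxMatches == 0 means "no limit", mirroring the documented parameter.
inline std::vector<std::string> matchPrefix(const std::vector<std::string>& infinitives,
                                            const std::string& prefix,
                                            unsigned maxMatches) {
    std::vector<std::string> matches;
    const std::string key = lowerAsciiCopy(prefix);
    for (const std::string& candidate : infinitives) {
        if (maxMatches != 0 && matches.size() >= maxMatches) break;
        const std::string lowered = lowerAsciiCopy(candidate);
        // compare() checks the first key.size() characters of the candidate.
        if (lowered.compare(0, key.size(), key) == 0) matches.push_back(candidate);
    }
    return matches;
}

// Usage: "spri" finds the three verbs from the text; a limit of 2 truncates the list.
inline void prefixExample() {
    const std::vector<std::string> verbs = {"speak", "spring", "sprinkle", "sprint", "stand"};
    assert(matchPrefix(verbs, "spri", 0).size() == 3u);
    assert(matchPrefix(verbs, "spri", 2).size() == 2u);
}
```

Passing a small non-zero maxMatches is the safer default for interactive use, since a one-letter prefix with maxMatches set to zero can return a very long list.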

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] prefix: The prefix to match.
[in] maxMatches: if non-zero, this is the maximum number of matches to return; if zero, the method returns all matches (which can be a very long list if, for example, prefix is one letter).
[out] infinitiveList: The list of matching infinitives.

virtual ULError ULLanguageDataSource::getMatchingInfinitives ( const ULString & prefix, uluint32 maxMatches, ULList< ULDerivation > & infinitiveList )
pure virtual

virtual ULError ULLanguageDataSource::getMatchingNouns ( const ULString & prefix, uluint32 maxMatches, ULList< ULString > & nounList )
pure virtual

Gets the list of nouns contained in this data source that match (accent- and case-insensitively) the specified prefix.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] prefix: The prefix to match.
[in] maxMatches: if non-zero, this is the maximum number of matches to return; if zero, the method returns all matches (which can be a very long list if, for example, prefix is one letter).
[out] nounList: The list of matching nouns.

virtual ULError ULLanguageDataSource::getMatchingRoots ( const ULString & prefix, uluint32 maxMatches, ULList< ULString > & rootList )
pure virtual

Gets the list of root words contained in this data source that match (accent- and case-insensitively) the specified prefix. For example, the prefix "spri" might (depending on the specific English data source) yield a root list of "spring", "sprinkle", and "sprint" among verbs, and "springy" etc. among adjectives. Typically, only verbs are returned this way for languages with simple noun and adjective inflection structures. But for languages like Russian, German, and Latin, this method will typically return verbs, nouns, and adjectives.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] prefix: The prefix to match.
[in] maxMatches: if non-zero, this is the maximum number of matches to return; if zero, the method returns all matches (which can be a very long list if, for example, prefix is one letter).
[out] rootList: The list of matching root words.

virtual ULError ULLanguageDataSource::getNouns ( const ULString & text, ULList< ULDerivation > & nounList )
pure virtual

Finds all the nouns in this data source whose root (typically singular) forms match (in an accent- and case-insensitive way) the specified text.

Returns
ULError::NoError, ULError::NoMatch, or any of the error codes related to failure to open or attach to a data source.
Parameters
[in] text: the search string.
[out] nounList: the matching nouns.

virtual ULError ULLanguageDataSource::getPersons ( const ULDerivation & v, ULTense tense, ULList< ULPerson > & personList )
pure virtual

Finds all the persons available for the specified verb in the specified tense. For most verb + tense combinations, the list of persons is the same. Occasionally there are irregular or defective verbs that have a different collection of persons. For example, the French verb "apparoir" ("to be evident") only takes the third person singular in the present tense.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] v: The verb whose persons are desired.
[in] tense: The tense for which the persons are desired.
[out] personList: The list of persons.

virtual ULError ULLanguageDataSource::getPersonsForClass ( const ULDerivation & v, uluint32 classID, ULTense tense, ULList< ULPerson > & personList )
pure virtual

Finds all the persons available for the specified verb in the specified tense, assuming temporarily that the verb falls in the specified verb model class. For most verb + tense combinations, the list of persons is the same. Occasionally there are irregular or defective verbs that have a different collection of persons. For example, the French verb "apparoir" ("to be evident") only takes the third person singular in the present tense.

In most cases, the classID parameter is redundant, because it is equal to v.getClassID(). But during the development of new conjugators, Ultralingua's data editors need to be able to try out different verb classes for each new verb to help them classify the verb correctly. In general, if you find yourself using this method, you should switch to getPersons, which does not have the classID parameter.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] v: The verb whose persons are desired.
[in] classID: The verb model class ID in which this verb is to be interpreted.
[in] tense: The tense for which the persons are desired.
[out] personList: The list of persons.

virtual ULError ULLanguageDataSource::getPredecessors ( const ULInflectionRule & rule, ULList< ULInflectionRule > & predecessorList )
pure virtual

Retrieves the list of predecessor rules for the specified inflection rule.

Returns
ULError::DataSourceOpenFailed if there was a problem with the data source; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError otherwise.
Parameters
[in] rule: the inflection rule whose predecessors are desired.
[out] predecessorList: the desired inflection rules (if any), or the empty list if an error occurs.

virtual ULError ULLanguageDataSource::getSuccessors ( const ULInflectionRule & rule, ULList< ULInflectionRule > & successorList )
pure virtual

Retrieves the list of successor rules for the specified inflection rule.

Returns
ULError::DataSourceOpenFailed if there was a problem with the data source; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError otherwise.
Parameters
[in] rule: the inflection rule whose successors are desired.
[out] successorList: the desired inflection rules (if any), or the empty list if an error occurs.

virtual ULError ULLanguageDataSource::getTenses ( const ULDerivation & v, ULList< ULTense > & tenseList, bool includeParticiples = false )
pure virtual

Finds all the tenses available for the specified verb. For most verbs in a given language, the list of tenses is the same. Occasionally there are irregular or defective verbs that have a different collection of tenses.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] v: The verb whose tenses are desired.
[out] tenseList: The tenses.
[in] includeParticiples: True if the tense list should include participles (e.g. infinitive, past participle, present participle, gerund...).

virtual ULError ULLanguageDataSource::getTensesForClass ( uluint32 classID, ULList< ULTense > & tenseList, bool includeParticiples = false )
pure virtual

Finds all the tenses available for the specified verb class. For most verb classes in a given language, the list of tenses is the same. Occasionally there are irregular or defective verb classes that have a different collection of tenses.

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] classID: The class ID whose tenses are desired.
[out] tenseList: The tenses.
[in] includeParticiples: True if the tense list should include participles (e.g. infinitive, past participle, present participle, gerund...).

virtual ULError ULLanguageDataSource::getVerbFormTypes ( const ULDerivation & verb, ULList< ULPartOfSpeech > & verbFormTypes )
pure virtual

Finds the combinations of tense, number, person, and any other relevant features that are used to specify a particular conjugated form for the specified verb. Though each type is stored in a ULPartOfSpeech object, that object should be read as representing something like "future perfect tense first person plural".

For most verbs in a given language, the list of allowed verb form types is the same. But some verbs are irregular or "defective" and have fewer or different types of conjugated forms. For example, depending on which grammar you consult, the verb "snow" may not have first person forms ("I snow" doesn't really make sense).

The list of form types is sorted by the language's canonical tense ordering, then by number, then by person.

This method should be used instead of the deprecated getTenses and getPersons methods.
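
The ordering described above (tense first, then number, then person) can be sketched with a lexicographic comparison. The enumerations and the SketchFormType struct below are hypothetical stand-ins, not ULAPI types; the real form types are ULPartOfSpeech objects, and the canonical tense order is language-specific rather than the declaration order assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical enumerations; declaration order stands in for the canonical ordering.
enum class SketchTense { Present, Past, Future };
enum class SketchNumber { Singular, Plural };
enum class SketchPerson { First, Second, Third };

// Hypothetical stand-in for one verb form type.
struct SketchFormType {
    SketchTense tense;
    SketchNumber number;
    SketchPerson person;
};

// Sorts by tense, then by number, then by person.
inline void sortFormTypes(std::vector<SketchFormType>& forms) {
    std::sort(forms.begin(), forms.end(),
              [](const SketchFormType& a, const SketchFormType& b) {
                  return std::tie(a.tense, a.number, a.person) <
                         std::tie(b.tense, b.number, b.person);
              });
}

// Usage: a shuffled list ends up with present singular first person at the front.
inline void formTypeExample() {
    std::vector<SketchFormType> forms = {
        {SketchTense::Past, SketchNumber::Plural, SketchPerson::First},
        {SketchTense::Present, SketchNumber::Plural, SketchPerson::Third},
        {SketchTense::Present, SketchNumber::Singular, SketchPerson::First},
    };
    sortFormTypes(forms);
    assert(forms[0].tense == SketchTense::Present);
    assert(forms[0].number == SketchNumber::Singular);
    assert(forms[2].tense == SketchTense::Past);
}
```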

Returns
ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
Parameters
[in] verb: the verb whose form types are desired.
[out] verbFormTypes: the admissible form types for the given verb.

virtual ULError ULLanguageDataSource::getVerbModel ( uluint32 verbClassID, ULVerbModel & model )
pure virtual

Gets the verb model associated with the specified ID.

Returns
ULError::NoError if the operation succeeds, ULError::InvalidID if the specified verb class ID is invalid, or an error associated with the failure to open or attach to a data source.
Parameters
[in] verbClassID: The ID of the desired verb model.
[out] model: The verb model.

virtual ULError ULLanguageDataSource::getVerbs ( const ULString & infinitive, ULList< ULDerivation > & verbList, bool filterResults = false )
pure virtual

Finds all the verbs in this data source whose infinitives match (in an accent- and case-insensitive way) the specified infinitive.

Returns
ULError::NoError, ULError::NoMatch, or any of the error codes related to failure to open or attach to a data source.
Parameters
[in] infinitive: the infinitive of the desired verbs.
[out] verbList: the matching verbs.

virtual ULDataSourceVersion ULLanguageDataSource::getVersion ( )
pure virtual
Returns
the ULDataSourceVersion associated with this data source.

Implements ULDataSource.

virtual ULError ULLanguageDataSource::getWords ( const ULString & root, ULList< ULDerivation > & wordList, bool filterResults = false )
pure virtual

Finds all the words in this data source whose roots (infinitive for verbs, singular for nouns and adjectives, etc.) match the specified root (in an accent- and case-insensitive way).

Returns
ULError::NoError, ULError::NoMatch, or any of the error codes related to failure to open or attach to a data source.
Parameters
[in] root: The root form of the desired words.
[out] wordList: The matching words.

virtual bool ULLanguageDataSource::hasStopWord ( const ULString & word )
pure virtual

ULLanguageDataSource objects may contain a list of "stop words": words that are very common and should be ignored in some search contexts. These words tend to be from closed linguistic classes such as articles, pronouns, and prepositions.
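
One typical use of a stop word test is to drop very common words from a search query before looking anything up. The sketch below illustrates that idea with a hypothetical SketchStopWordSource class and dropStopWords helper; neither is part of ULAPI, the word list is invented, and matching here is exact rather than case- or accent-insensitive.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in exposing only a hasStopWord-style query.
class SketchStopWordSource {
public:
    explicit SketchStopWordSource(std::set<std::string> words) : words_(std::move(words)) {}
    bool hasStopWord(const std::string& word) const { return words_.count(word) != 0; }

private:
    std::set<std::string> words_;
};

// Keeps only the query terms that are not stop words, preserving their order.
inline std::vector<std::string> dropStopWords(const SketchStopWordSource& source,
                                              const std::vector<std::string>& terms) {
    std::vector<std::string> kept;
    for (const std::string& term : terms) {
        if (!source.hasStopWord(term)) kept.push_back(term);
    }
    return kept;
}

// Usage: "the" and "of" are filtered out of a four-word query.
inline void stopWordExample() {
    const std::set<std::string> words = {"the", "of", "a"};
    const SketchStopWordSource source(words);
    const std::vector<std::string> query = {"the", "history", "of", "chairs"};
    const std::vector<std::string> kept = dropStopWords(source, query);
    assert(kept.size() == 2u);
    assert(kept[0] == "history");
    assert(kept[1] == "chairs");
}
```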

Returns
true if this language data source's stop word list includes the specified word.
Parameters
[in] word: the word we're testing.

virtual ULError ULLanguageDataSource::load ( )
pure virtual

Perform one-time opening and loading operations. Normally, such operations are performed lazily, when the data source is first queried. If you would prefer to control the time at which loading is performed, call this method.

Returns
ULError::NoError if the loading was successful.

Implements ULDataSource.
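
The difference between lazy loading and an explicit load call can be sketched with a small stand-in class. LazySketchSource, wordCount, loadCount, and the two sample words are hypothetical and exist only for this illustration; they do not describe how any ULAPI data source is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data source that loads its data at most once.
class LazySketchSource {
public:
    // Explicit load: lets the caller choose when the loading cost is paid.
    void load() {
        if (loaded_) return;  // one-time operation
        words_.push_back("spring");
        words_.push_back("sprint");
        loaded_ = true;
        ++loadCount_;
    }
    // A query triggers loading lazily if load() was never called.
    std::size_t wordCount() {
        load();
        return words_.size();
    }
    int loadCount() const { return loadCount_; }

private:
    std::vector<std::string> words_;
    bool loaded_ = false;
    int loadCount_ = 0;
};

// Usage: the first query loads the data; later queries reuse it.
inline void lazyLoadExample() {
    LazySketchSource source;
    assert(source.loadCount() == 0);
    assert(source.wordCount() == 2u);  // first query loads lazily
    assert(source.wordCount() == 2u);  // second query does not reload
    assert(source.loadCount() == 1);
}
```

Calling load up front moves the one-time cost to a moment the caller controls, for example during application startup rather than during the first user query.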

