Annual Release - Data from Lexicon

The following files are derived from the Lexicon and must be installed into the Lvg database first. All of these operations are performed under the "$LVG_Components/PreDataBase/" directory.
The complete set of these data files is stored in the "$LVG_Components/PreDataBase/data/{YEAR}/data/" directory.

  • infl.data
    • $LexBuild/Tools/Lexicon/GenerateInflVars generates $Lexicon/{YEAR}/tables/inflVars.data
    • Copy the above file to infl.data
    • The field format of infl.data is:
      Inflected form|Category (in number)|Inflection (in number)|EUI|Base form|Citation form

  • synonyms.data
    • Copy synonyms.dm from the previous year into ./data/{YEAR}/dataOrg/ and update it manually.
    • Run ModifySynonym to reformat the above file and write the output to ./data/{YEAR}/data/synonyms.data
    • The new field format of synonyms.data is:
      baseNpLc|base|category|synonym|category

  • acronym.data
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRABR; put this file in ./data/{YEAR}/dataOrg/.
    • Run ModifyAcronym to reformat the above file and write the output to ./data/{YEAR}/data/acronym.data
    • The new field format of acronym.data is:
      expNpLc|exp|type|acrNpLc|acr

  • properNoun.data
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRPRP; put this file in ./data/{YEAR}/dataOrg/.
    • grep "|noun|proper|" ${LRPRP} | flds 2 | sort -u > ${TAR_DIR}proper
    • Copy proper to ./data/{YEAR}/data/properNoun.data
    • The new format of properNoun.data is a single field:
      proper noun

  • derivation.data
    • Copy the following files from the previous year into ./data/{YEAR}/dataOrg/ and update them manually:
      1. dm.fct
      2. etc.fct
      3. convers.fct
      4. nomiz.fct
      5. pd.fct
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRNOM; put this file in ./data/{YEAR}/dataOrg/.
    • Run ./bin/GetDerivations to extract derivations from the above files and write the output to ./data/{YEAR}/data/derivations.data
    • The new field format of derivation.data is:
      term 1|Category 1|term 2|Category 2

  • nominalization.data
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRNOM; put this file in ./data/{YEAR}/dataOrg/.
    • Run ModifyNominalization to reformat the above file and write the output to ./data/{YEAR}/data/nominalization.data
    • The new field format of nominalization.data is:
      EUI 1|term 1|Category 1|EUI 2|term 2|Category 2
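
The properNoun.data step above relies on `flds`, which may not be available outside the build environment. The following is a minimal sketch of the same extraction using standard tools. It assumes that LRPRP is pipe-delimited and that `flds 2` selects the second field; the function name extract_proper_nouns is hypothetical and not part of the Lvg tools.

```shell
# Sketch only: assumes LRPRP is pipe-delimited and that `flds 2` in the
# original command selects the second field (both are assumptions).
extract_proper_nouns() {
    lrprp_file=$1   # path to the LRPRP table
    # -F treats the pattern as a fixed string, so "|" is matched literally.
    grep -F '|noun|proper|' "$lrprp_file" | cut -d'|' -f2 | LC_ALL=C sort -u
}

# Hypothetical usage; the paths follow the layout described above:
# extract_proper_nouns ./data/{YEAR}/dataOrg/LRPRP > ./data/{YEAR}/data/properNoun.data
```

LC_ALL=C is set only so that the sort order does not depend on the local locale; drop it if the original ordering matters downstream.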

The six files above can be generated with the following commands:

  • Copy $Lexicon/data/{YEAR}/tables/inflVars.data, LRABR, and LRPRP to ./data/{YEAR}/dataOrg/ (./bin/LoadLexiconFiles)
  • cd bin
  • make_acronym.sh
  • make_proper.sh
  • GenerateLexiconFiles
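
The commands above can be chained so that a failure in one step stops the rest. The following is a rough sketch of such a wrapper. The function run_lexicon_steps is hypothetical, and it assumes each step is an executable file in the given bin directory.

```shell
# Sketch only: runs the given steps in order and stops at the first failure.
# The wrapper is hypothetical; only the step names come from the list above.
run_lexicon_steps() {
    bin_dir=$1   # e.g. "$LVG_Components/PreDataBase/bin"
    shift
    for step in "$@"; do
        echo "Running $step"
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$bin_dir" && "./$step" ) || {
            echo "Step failed: $step" >&2
            return 1
        }
    done
}

# Hypothetical usage with the program names from the list above:
# run_lexicon_steps "$LVG_Components/PreDataBase/bin" \
#     LoadLexiconFiles make_acronym.sh make_proper.sh GenerateLexiconFiles
```

Stopping early avoids generating data files from a stale or incomplete ./data/{YEAR}/dataOrg/ directory.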

After the files above are properly generated, follow the steps below:

  • Copy the above files to "$LVG_DIR/data/tables/" (./bin/MoveLexiconFiles)
  • Run the Analyze* programs to check the maximum size of each field
    • java AnalyzeInflection
    • java AnalyzeAcronym
    • java AnalyzeDerivation
    • java AnalyzeProperNoun
    • java AnalyzeSynonym
  • Load these data into the Idb and MySql databases
    • cd $LVG_DIR/loadDb/bin
    • LoadLexiconToMyIdb
    • LoadLexiconToMySql
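
As a quick sanity check before loading, the maximum width of each field can also be estimated from the shell. The following is a rough sketch, not a replacement for the Java Analyze* programs. It assumes the data files are pipe-delimited, and the function name max_field_widths is hypothetical.

```shell
# Sketch only: reports the longest value seen in each pipe-delimited field.
# This is an assumed stand-in for illustration, not the Analyze* programs.
max_field_widths() {
    data_file=$1
    awk -F'|' '
        {
            for (i = 1; i <= NF; i++)
                if (length($i) > max[i]) max[i] = length($i)
            if (NF > nf) nf = NF   # remember the widest record
        }
        END {
            for (i = 1; i <= nf; i++)
                printf "field %d: %d\n", i, max[i]
        }
    ' "$data_file"
}

# Hypothetical usage:
# max_field_widths "$LVG_DIR/data/tables/infl.data"
```

Whether length() counts bytes or characters depends on the awk implementation and locale, so treat the numbers as approximate for non-ASCII data.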