#
# COMPONENT_NAME: austext
#
# FUNCTIONS: none
#
# ORIGINS: 27
#
# (C) COPYRIGHT International Business Machines Corp. 1993,1996
# All Rights Reserved
# Licensed Materials - Property of IBM
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
#***************** ENG.SFX *******************
# $XConsortium: eng.sfx /main/3 1996/10/29 20:12:24 cde-ibm $
# Paice Stemmer Suffix Removal Rules, Ascii English
# July 1993.
#
# File Format:
# One rule per line.
# Empty lines and lines beginning with punctuation are comments.
# Lines must be sorted lexicographically by FIRST CHAR only ('A' - 'Z').
# Within a char section, rules sorted sequentially as applied.
# Token #1: Required, UPPERCASE suffix string, reading backwards.
# Token #2: Optional, single asterisk (*). Rule is applied only
#     if original word "is intact", i.e. this is the first rule applied.
# Token #3: Required, 'remove' count. How much of suffix to remove.
#     Zero is permissible and terminates stemming.
# Token #4: Optional, append string, reading correctly. Applied
#     after suffix is removed.
# Token #5: Required, continuation symbol '>' or '$'.
#     If '$', stemming terminates, else continues.
#
# $Log$
# Revision 2.3 1996/02/01 19:02:05 miker
# Restored some rules inadvertently deleted.
#
# Revision 2.2 1996/02/01 18:50:18 miker
# AusText 2.1.11, DtSearch 0.3: Changed .sfx format so certain
# values are not hardcoded in lang.c.
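#
# Illustration only, not part of the rule set: a minimal Python sketch of how
# one rule line might be split into the five tokens described above. The names
# Rule and parse_rule are hypothetical; the actual parser in lang.c is not
# reproduced here, and stripping leading whitespace before the comment check
# is an assumption.

```python
import string
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    suffix: str        # token 1, un-reversed so it reads forwards
    intact_only: bool  # token 2, True when the '*' flag is present
    remove: int        # token 3, number of trailing characters to remove
    append: str        # token 4, empty string when omitted
    terminal: bool     # token 5, True for '$', False for '>'


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one line; return None for comment or empty lines."""
    text = line.strip()
    # Empty lines and lines beginning with punctuation are comments.
    if not text or text[0] in string.punctuation:
        return None
    tokens = text.split()
    suffix = tokens.pop(0)[::-1]          # stored backwards in the file
    intact_only = tokens[0] == "*"
    if intact_only:
        tokens.pop(0)
    remove = int(tokens.pop(0))
    # Anything before the continuation symbol is the append string.
    append = tokens.pop(0) if tokens[0] not in (">", "$") else ""
    terminal = tokens[0] == "$"
    return Rule(suffix, intact_only, remove, append, terminal)
```

# For example, parse_rule("CITY 3 S $") yields the suffix YTIC, a remove count
# of 3, the append string S, and a terminating continuation symbol.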
#
AI * 2 $
A * 1 $
BB 1 $
CITY 3 S $
CI 2 >
CN 1 T >
DD 1 $
DEI 3 Y >
DEEC 2 SS $
DEE 1 $
DE 2 >
DOOH 4 >
E 1 >
FEIL 1 V $
FI 2 >
GNI 3 >
GAI 3 Y $
GANAM 0 $
GA 2 >
GG 1 $
HT * 2 $
HSIUG 5 CT $
HSI 3 >
I * 1 $
I 1 Y >
JI 1 D $
JUF 1 S $
JU 1 D $
JO 1 D $
JEH 1 R $
JREV 1 T $
JSIM 2 T $
JN 1 D $
J 1 S $
LBAIFI 6 $
LBAI 4 Y $
LBA 3 >
LBI 3 $
LIB 2 L >
LC 1 $
LUFI 4 Y $
LUF 3 >
LU 2 $
LAI 3 >
LAU 3 >
LA 2 >
LL 1 $
MUI 3 $
MU * 2 $
MSI 3 >
MM 1 $
NOIS 4 J >
NOIX 4 CT $
NOI 3 >
NAI 3 >
NA 2 >
NEE 0 $
NE 2 >
NN 1 $
PIHS 4 >
PP 1 $
RE 2 >
RAE 0 $
RA 2 $
RO 2 >
RU 2 >
RR 1 $
RT 1 >
REI 3 Y >
SEI 3 Y >
SIS 2 $
SI 2 >
SSEN 4 >
SS 0 $
SUO 3 >
SU * 2 $
S * 1 >
S 0 $
TACILP 4 Y $
TA 2 >
TNEM 4 >
TNE 3 >
TNA 3 >
TPIR 2 B $
TPRO 2 B $
TCUD 1 $
TPMUS 2 $
TPEC 2 IV $
TULO * 2 OLV $
TSIS 0 $
TSI 3 >
TT 1 $
UQI 3 $
UGO 1 $
VIS 3 J >
VIE 0 $
VI 2 >
YLB 1 >
YLI 3 Y >
YLP 0 $
YL 2 >
YGO 1 $
YHP 1 $
YMO 1 $
YPO 1 $
YTISOR 6 $
YTISO 5 >
YTI 3 >
YTE 3 >
YTL 2 $
YRTSI 5 $
YRA 3 >
YRO 3 >
YFI 3 $
YCN 2 T >
YCA 3 >
Y * 1 $
Y 1 $
ZI 2 >
ZY 1 S $
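#
# Illustration only, not part of the rule set: a hedged Python sketch of how
# rules in this format could be applied to an uppercase word. The stem function
# and the RULES list are hypothetical, and RULES is a hand-copied subset of five
# rules from the table above. Assumptions: rules are tried in file order, the
# scan restarts after each '>' rule, and the acceptability checks that Paice
# stemmer implementations commonly add (such as a minimum stem length) are
# omitted because the file header does not describe them.

```python
# Each tuple mirrors one rule line:
# (reversed suffix, intact-only flag, remove count, append string, symbol).
RULES = [
    ("CITY", False, 3, "S", "$"),
    ("GNI", False, 3, "", ">"),
    ("NN", False, 1, "", "$"),
    ("SS", False, 0, "", "$"),
    ("S", True, 1, "", ">"),
]


def stem(word, rules=RULES):
    """Apply suffix rules until a terminating rule fires or none match."""
    intact = True  # becomes False once any rule has been applied
    while True:
        for rev_suffix, intact_only, remove, append, symbol in rules:
            if intact_only and not intact:
                continue  # '*' rules apply only to the original word
            if not word.endswith(rev_suffix[::-1]):
                continue  # suffix is stored backwards in the table
            if remove:
                word = word[:-remove]
            word += append
            intact = False
            # '$' and a zero remove count both terminate stemming.
            if symbol == "$" or remove == 0:
                return word
            break  # '>' rule: rescan the rules against the new word
        else:
            return word  # no rule matched


for w in ("RUNNING", "ANALYTIC", "GLASS", "DOGS"):
    print(w, "->", stem(w))
```

# With this subset, RUNNING becomes RUN (GNI, then NN), ANALYTIC becomes ANALYS
# (CITY), GLASS is left unchanged by the zero-count SS rule, and DOGS becomes
# DOG through the intact-only S rule.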