Scripts/Stopwords

Remove StopWords

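Tested. This script ran more efficiently than the other scripts I've tried. Note: why does it call /usr/xpg4/bin/grep instead of plain grep? That path is the POSIX-conformant grep on Solaris; the likely reason is that the default /usr/bin/grep there does not accept the -q option the script relies on.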

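#!/bin/bash

# Document to filter (first argument) and the stop list, one stopword per line
DOC="$1"
STOPLIST="stopwords.list"
BASENAME=$(basename "$DOC")

# Disable filename expansion so a literal '*' in the document stays a word
set -f

# Scan through the document, one whitespace-separated word at a time
for word in $(cat "$DOC")
do

        # skip '*'
        if [ "$word" = "*" ]; then
                continue
        fi

        # Look up the current word in the stop list:
        # -q quiet, -w whole-word match, -F fixed string (no regex),
        # -e protects words that start with '-'
        /usr/xpg4/bin/grep -q -w -F -e "$word" "$STOPLIST"

        # grep exits with 1 on "no match": the word is not a stopword, so write it
        if [ $? -eq 1 ]; then
                echo "$word " >> "$BASENAME.stop"
        fi
done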
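
The loop above starts one grep process per word, which gets slow on large documents. As a rough sketch only, not part of the tested script, the same filtering can be done in a single pass. It assumes the stop list holds exactly one stopword per line, and `remove_stopwords` is a hypothetical helper name. Unlike the script, it prints one word per line to standard output with no trailing space, and it does not skip '*'.

```shell
# Hypothetical helper: print every word of a document that is not a stopword.
# $1 = document, $2 = stop list (assumed: one stopword per line)
remove_stopwords() {
    # 1. tr puts each whitespace-separated word on its own line
    # 2. sed drops the empty line left by leading whitespace, if any
    # 3. grep -v drops lines that equal a stop-list entry exactly (-x),
    #    treating the entries as fixed strings (-F) read from a file (-f)
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | grep -v -x -F -f "$2"
}
```

It could be called as `remove_stopwords document.txt stopwords.list > document.txt.stop`, where both file names are placeholders. On Solaris, plain `grep` here may need to be replaced with /usr/xpg4/bin/grep, as in the script.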