Scripts/Stopwords
Remove StopWords
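Tested. This is more efficient than the other scripts I have tried. Note: it calls /usr/xpg4/bin/grep rather than plain grep, presumably because on Solaris the default /usr/bin/grep does not support the -q option, whereas the XPG4 version does.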
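#!/bin/bash
# Usage: pass the document to filter as the first argument
DOC="$1"
STOPLIST="stopwords.list"
BASENAME=$(basename "$DOC")
# Disable filename expansion so a literal '*' in the document is not globbed
set -f
# Scan through the document, one whitespace-separated word at a time
for word in $(cat "$DOC")
do
    # Skip '*'
    if [ "$word" = "*" ]; then
        continue
    fi
    # Look up the current word in the stop list (-F: fixed string, not a regex)
    /usr/xpg4/bin/grep -q -w -F -e "$word" "$STOPLIST"
    # If the word is not in the stop list (grep exit status 1), write it
    if [ $? -eq 1 ]; then
        echo "$word " >> "$BASENAME.stop"
    fi
done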
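
The script above starts one grep process per word, which gets slow on long documents. As a rough sketch of an alternative (not part of the original script), the stop list could be loaded once into an awk array and the document filtered in a single pass. The function name remove_stopwords is hypothetical, and the sketch assumes the stop list holds whitespace-separated words and that a word counts as a stop word only on an exact, case-sensitive match, which is slightly stricter than grep -w. On Solaris, a POSIX awk such as /usr/xpg4/bin/awk or nawk would probably be needed in place of plain awk.

```shell
#!/bin/bash
# Hypothetical helper: print every word of a document that is not in the stop list.
# $1 = stop list file (assumed: whitespace-separated words), $2 = document
remove_stopwords() {
    awk -v stoplist="$1" '
        BEGIN {
            # Load the stop list once into an associative array
            while ((getline line < stoplist) > 0) {
                n = split(line, w, " ")
                for (i = 1; i <= n; i++) stop[w[i]] = 1
            }
            close(stoplist)
        }
        {
            # Emit each remaining word on its own line, followed by a space,
            # mirroring the output format of the original script
            for (i = 1; i <= NF; i++)
                if (!($i in stop)) print $i " "
        }
    ' "$2"
}
```

To reproduce the original behaviour, the output can be appended to the same file, for example: remove_stopwords stopwords.list "$1" >> "$(basename "$1").stop". Because awk splits fields itself, no filename expansion takes place and a literal '*' needs no special handling.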

