lsm (1) Страница руководства Mac OS X

Эта страница руководства для версии 10.9 Mac OS X

Если Вы выполняете различную версию Mac OS X, просматриваете документацию локально:

В Терминале, с помощью человека (1) команда

Читать страницы руководства

Страницы руководства предназначаются как справочник для людей, уже понимающих технологию.

Чтобы изучить, как руководство организовано или узнать о синтаксисе команды, прочитайте страницу руководства для страниц справочника (5).
Для получения дополнительной информации об этой технологии, ищите другую документацию в Библиотеке Разработчика Apple.
Для получения общей информации о записи сценариев оболочки, считайте Shell, Пишущий сценарий Учебника для начинающих.

LSM(1) Latent Semantic Mapping LSM(1)

NAME
lsm - Latent Semantic Mapping tool

SYNOPSIS
lsm lsm_command [command_options] map_file [input_files]

DESCRIPTION
The Latent Semantic Mapping framework is a language independent, Unicode based technology that builds
maps and uses them to classify texts into one of a number of categories.

lsm is a tool to create, manipulate, test, and dump Latent Semantic Mapping maps. It is designed to
provide access to a large subset of the functionality of the Latent Semantic Mapping API, mainly for
rapid prototyping and diagnostic purposes, but possibly also for simple shell script based
applications of Latent Semantic Mapping.

COMMANDS
lsm provides a variety of commands (lsm_command in the Synopsis), each of which often has a wealth of
options (see the Command Options below). Command names may be abbreviated to unambiguous prefixes.

lsm create map_file input_files
Create a new LSM map from the specified input_files.

lsm update map_file input_files
Add the specified input_files to an existing LSM map.

lsm evaluate map_file input_files
Classify the specified input_files into the categories of the LSM map.

lsm cluster [--k-means=N | --agglomerative=N] [--apply]
Compute clusters for the map, and, if the --apply option is specified, transform the map
accordingly. Multiple levels of clustering may be applied for faster performance on large maps,
e.g.

lsm cluster --k-means=100 --each --agglomerative=100 --agglomerative=1000 my.map

first computes 100 clusters using (fast) k-means clustering, computes 100 subclusters for each
first stage cluster using agglomerative clustering, and finally reduces those 10000 clusters to
1000 using agglomerative clustering.

lsm dump map_file [input_files]
Without input_files, dumps all words in the map with their counts. With input_files, dump, for
each file, the words that appear in the map, their counts in the map, and their relative
frequencies in the input file.

lsm info map_file
Bypass the Latent Semantic Mapping framework to extract and print information about the file and
perform a number of consistency checks on it. (NOT IMPLEMENTED YET)

COMMAND OPTIONS
This section describes the command_options that are available for the lsm commands. Not all commands
support all of these options; each option is only supported for commands where it makes sense.
However, when a command has one of these options you can count on the same meaning for the option as
in other commands.

--append-categories
Directs the update command to put the data into new categories appended after the existing ones,
instead of adding the data to the existing categories.

--categories count
Directs the evaluate command to only list the top count categories.

--category-delimiter delimiter
Specify the delimiter to be used to between categories in the input_files passed to the create
and update commands.

group Categories are separated by a `;' argument.

file Each input_file represents a separate category. This is the default if the
--category-delimiter option is not given.

line Each line represents a separate category.

string Categories are separated by the specified string.

--clobber
When creating a map, overwrite an existing file at the path, even if it's not an LSM map. By
default, create will only overwrite an existing file if it's believed to be an LSM map, which
guards against frequent operator errors such as:

lsm create /usr/include/*.h

--dimensions dim
Direct the create and update commands to use the given number of dimensions for computing the map
(Defaults to the number of categories). This option is useful to manage the size and
computational overhead of maps with large number of categories.

--discard-counts
Direct the create and update commands to omit the raw word / token counts when writing the map.
This results in a map that is more compact, but cannot be updated any further.

--hash
Direct the create and update commands to write the map in a format that is not human readable
with default file manipulation tools like cat or hexdump. This is useful in applications such as
junk mail filtering, where input data may contain naughty words and where the contents of the map
may tip off spammers what words to avoid.

--help
List an overview of the options available for a command. Available for all commands.

--html
Strip HTML codes from the input_files. Useful for mail and web input. Available for the create,
update, evaluate, and dump commands.

--junk-mail
When parsing the input files, apply heuristics to counteract common methods used by spammers to
disguise incriminating words such as:

Zer0 1nt3rest l0ans Substituting letters with digits
W E A L T H Adding spaces between letters
m.o.r.t.g.a.g.e Adding punctuation between letters

Available for the create, update, evaluate, and dump commands.

--pairs
If specified with the create command when building the map, store counts for pairs of words as
well as the words themselves. This can increase accuracy for certain classes of problems, but
will generate unreasonably large maps unless the vocabulary is fairly limited.

--stop-words stop_word_file
If specified with the create command, stop_word_file is parsed and all words found are excluded
from texts evaluated against the map. This is useful for excluding frequent, semantically
meaningless words.

--sweep-cutoff threshold
--sweep-frequency days
Available for the create and update commands. Every specified number of days (by default 7), scan
the map and remove from it any entries that have been in the map for at least 2 previous scans
and whose total counts are smaller than threshold. threshold defaults to 0, so by default the
map is not scanned.

--text-delimiter delimiter
Specify the delimiter to be used to between texts in the input_files passed to the create,
update, evaluate, and dump commands.

file Each input_file represents a separate text. This is the default if the --text-delimiter
option is not given.

line Each line represents a separate text.

string Texts are separated by the specified string.

--triplets
If specified with the create command when building the map, store counts for triplets and pairs
of words as well as the words themselves. This can increase accuracy for certain classes of
problems, but will generate unreasonably large maps unless the vocabulary is fairly limited.

--weight weight
Scale counts of input words for the create and update commands by the specified weight, which may
be a positive or negative floating point number.

--words
Directs the evaluate or cluster commands to apply to words, instead of categories.

--words=count
Directs the evaluate command to list the top count words, instead of categories.

EXAMPLES
"lsm evaluate --html --junk-mail ~/Library/Mail/V2/MailData/LSMMap2 msg*.txt"
Simulate the Mail.app junk mail filter by evaluating the specified files (assumed to each hold
the raw text of one mail message) against the user's junk mail map.

"lsm dump ~/Library/Mail/V2/MailData/LSMMap2"
Dump the words accumulated in the junk mail map and their counts.

"lsm create --category-delimiter=group c_vs_h *.c ';' *.h"
Create an LSM map trained to distinguish C header files from C source files.

"lsm update --weight 2.0 --cat=group c_vs_h ';' ../xy/*.h"
Add some additional header files with an increased weight to the training.

"lsm create --help"
List the options available for the lsm create command.

1.0 2011-11-07 LSM(1)

Сообщение о проблемах

Способ сообщить о проблеме с этой страницей руководства зависит от типа проблемы:

Ошибки содержания: Ошибки отчета в содержании этой документации со ссылками на отзыв ниже.
Отчеты об ошибках: Сообщите об ошибках в функциональности описанного инструмента или API через Генератор отчетов Ошибки.
Форматирование проблем: Отчет, форматирующий ошибки в интерактивной версии этих страниц со ссылками на отзыв ниже.