easytexminer.model_zoo

bert

class easytexminer.model_zoo.models.bert.modeling_bert.BertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, pad_token_id=0, gradient_checkpointing=False, position_embedding_type='absolute', use_cache=True, **kwargs)[source]

This is the configuration class to store the configuration of a BertModel or a TFBertModel. It is used to instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the BERT bert-base-uncased architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Parameters
  • vocab_size (int, optional, defaults to 30522) -- Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the input_ids passed when calling BertModel or TFBertModel.

  • hidden_size (int, optional, defaults to 768) -- Dimensionality of the encoder layers and the pooler layer.

  • num_hidden_layers (int, optional, defaults to 12) -- Number of hidden layers in the Transformer encoder.

  • num_attention_heads (int, optional, defaults to 12) -- Number of attention heads for each attention layer in the Transformer encoder.

  • intermediate_size (int, optional, defaults to 3072) -- Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.

  • hidden_act (str or Callable, optional, defaults to "gelu") -- The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

  • hidden_dropout_prob (float, optional, defaults to 0.1) -- The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

  • attention_probs_dropout_prob (float, optional, defaults to 0.1) -- The dropout ratio for the attention probabilities.

  • max_position_embeddings (int, optional, defaults to 512) -- The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

  • type_vocab_size (int, optional, defaults to 2) -- The vocabulary size of the token_type_ids passed when calling BertModel or TFBertModel.

  • initializer_range (float, optional, defaults to 0.02) -- The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

  • layer_norm_eps (float, optional, defaults to 1e-12) -- The epsilon used by the layer normalization layers.

  • gradient_checkpointing (bool, optional, defaults to False) -- If True, use gradient checkpointing to save memory at the expense of slower backward pass.

  • position_embedding_type (str, optional, defaults to "absolute") -- Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).

  • use_cache (bool, optional, defaults to True) -- Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True.

Examples:

>>> from easytexminer.model_zoo.models.bert.modeling_bert import BertModel, BertConfig

>>> # Initializing a BERT bert-base-uncased style configuration
>>> configuration = BertConfig()

>>> # Initializing a model from the bert-base-uncased style configuration
>>> model = BertModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
model_type: str = 'bert'
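
Any of the parameters above can be overridden at construction time by passing them as keyword arguments. A minimal sketch (the values below are illustrative, roughly matching a bert-large style setup, and are not a documented preset):

>>> # Overriding the defaults to build a larger configuration
>>> custom_configuration = BertConfig(hidden_size=1024, num_hidden_layers=24, num_attention_heads=16, intermediate_size=4096)
>>> custom_configuration.hidden_size
1024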
class easytexminer.model_zoo.models.bert.modeling_bert.BertPreTrainedModel(config: easytexminer.model_zoo.configuration_utils.PretrainedConfig, *inputs, **kwargs)[source]

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

config_class

alias of easytexminer.model_zoo.models.bert.configuration_bert.BertConfig

load_tf_weights(config, tf_checkpoint_path)

Load TensorFlow checkpoints into a PyTorch model.

base_model_prefix = 'bert'
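
The documented load_tf_weights(config, tf_checkpoint_path) entry point can be used to populate a PyTorch model from a TensorFlow checkpoint. A minimal sketch, assuming it is invoked on a model instance with the documented signature; the checkpoint path below is hypothetical:

>>> from easytexminer.model_zoo.models.bert.modeling_bert import BertConfig, BertModel
>>> configuration = BertConfig()
>>> model = BertModel(configuration)
>>> # "./bert_model.ckpt" is a hypothetical local TensorFlow checkpoint path
>>> model.load_tf_weights(configuration, "./bert_model.ckpt")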

cnn

class easytexminer.model_zoo.models.cnn.TextCNNConfig(conv_dim=100, kernel_sizes='1,2,3,4', linear_hidden_size=512, embed_size=300, vocab_size=30522, sequence_length=128, **kwargs)[source]

This is the configuration class to store the configuration of a CNNTextClassify model. It is used to instantiate a CNN model according to the specified arguments, defining the model architecture.

Parameters
  • conv_dim (int, optional, defaults to 100) -- The output dimension of each convolution layer.

  • kernel_sizes (str, optional, defaults to "1,2,3,4") -- Specifies the number of convolutional layers and the kernel size of each layer.

  • linear_hidden_size (int, optional, defaults to 512) -- The number of neurons in the feed-forward layers after each convolutional layer.

  • embed_size (int, optional, defaults to 300) -- The embedding dimension of the input tokens.

  • vocab_size (int, optional, defaults to 30522) -- Vocabulary size of the CNN model. The default setting uses BertTokenizer, so the vocabulary size is 30522 for English tasks.

  • sequence_length (int, optional, defaults to 128) -- The maximum sequence length of the input text.

Examples:

>>> from easytexminer.model_zoo.models.cnn import TextCNNConfig
>>> from easytexminer.applications.classification import CNNTextClassify

>>> # Initializing a TextCNN configuration
>>> configuration = TextCNNConfig()

>>> # Initializing a model from the TextCNN configuration
>>> model = CNNTextClassify(configuration)
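
The same configuration can also be built with every parameter spelled out; the values below simply restate the documented defaults (kernel_sizes is assumed to be passed as a comma-separated string, following its documented default of 1,2,3,4):

>>> # Explicitly passing the documented defaults
>>> configuration = TextCNNConfig(conv_dim=100, kernel_sizes="1,2,3,4", linear_hidden_size=512, embed_size=300, vocab_size=30522, sequence_length=128)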
class easytexminer.model_zoo.models.cnn.TextCNNEncoder(config)[source]

This is the abstract class for CNN encoders.

Parameters
  • config (TextCNNConfig) -- The configuration of the TextCNN encoder.

Examples:

>>> from easytexminer.model_zoo.models.cnn import TextCNNConfig, TextCNNEncoder

>>> # Initializing a cnn configuration
>>> configuration = TextCNNConfig()

>>> # Initializing an encoder from the configuration
>>> model = TextCNNEncoder(configuration)
forward(fact_inputs)[source]
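
A minimal sketch of running the encoder forward, assuming fact_inputs is a batch of token ids of shape (batch_size, sequence_length); the input shape and dtype are assumptions, not documented behavior:

>>> import torch
>>> from easytexminer.model_zoo.models.cnn import TextCNNConfig, TextCNNEncoder
>>> configuration = TextCNNConfig()
>>> encoder = TextCNNEncoder(configuration)
>>> # Dummy batch: 2 sequences of 128 token ids drawn from the default vocabulary
>>> fact_inputs = torch.randint(0, 30522, (2, 128))
>>> features = encoder(fact_inputs)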