easytexminer.data
dataset for classification
- class easytexminer.data.cls_dataset.BertClassificationDataset(pretrained_model_name_or_path, data_file, max_seq_length, input_schema, first_sequence, label_name=None, second_sequence=None, label_enumerate_values=None, multi_label=False, *args, **kwargs)
- property eval_metrics
Returns the evaluation metrics.
- property label_enumerate_values
Returns the label enumerate values.
- convert_single_row_to_example(row)
Convert a single data row into token indices.
- Parameters
row -- a data row containing the sequence(s) and the label.
text_a -- the first sequence in row.
text_b -- the second sequence in row, used only if self.second_sequence is set.
label -- the label token, used only if self.label_name is set.
- Returns
encoding -- a single example containing the token indices.
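As a rough illustration of what "converting a row to token indices" means, the sketch below builds fixed-length index lists from two text fields. It does not call easytexminer: the names `TOY_VOCAB` and `toy_convert_row`, the whitespace tokenizer, and the output keys are all assumptions made for this example, and the library's actual encoding may differ.

```python
from typing import Dict, Optional

# Hypothetical toy vocabulary; a real dataset would take it from the pretrained model.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "good": 4, "movie": 5, "bad": 6, "plot": 7}


def toy_convert_row(text_a: str, text_b: Optional[str], label: Optional[str],
                    label_map: Dict[str, int], max_seq_length: int) -> Dict[str, object]:
    """Turn one row into fixed-length index lists (illustrative only)."""
    tokens = ["[CLS]"] + text_a.lower().split() + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if text_b is not None:
        b_tokens = text_b.lower().split() + ["[SEP]"]
        tokens += b_tokens
        segment_ids += [1] * len(b_tokens)

    # Naive truncation by slicing; this may drop the final [SEP] on long inputs.
    tokens = tokens[:max_seq_length]
    segment_ids = segment_ids[:max_seq_length]

    # Map tokens to indices, then pad everything to max_seq_length.
    input_ids = [TOY_VOCAB.get(t, TOY_VOCAB["[UNK]"]) for t in tokens]
    input_mask = [1] * len(input_ids)
    pad = max_seq_length - len(input_ids)
    input_ids += [TOY_VOCAB["[PAD]"]] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad

    return {
        "input_ids": input_ids,
        "input_mask": input_mask,
        "segment_ids": segment_ids,
        "label_id": label_map[label] if label is not None else None,
    }


encoding = toy_convert_row("good movie", "bad plot", "pos", {"neg": 0, "pos": 1}, 8)
```

Here `text_b` and `label` are optional, mirroring the way the second sequence and the label are only used when the corresponding column names are configured.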
- class easytexminer.data.cls_dataset.GLUEDataset(pretrained_model_name_or_path, data_file, max_seq_length, task_name, **kwargs)
- property eval_metrics
- property label_enumerate_values
- class easytexminer.data.cls_dataset.CNNClassificationDataset(pretrained_model_name_or_path, data_file, max_seq_length, input_schema, first_sequence, label_name=None, second_sequence=None, label_enumerate_values=None, multi_label=False, *args, **kwargs)
- property eval_metrics
Returns the evaluation metrics.
- property label_enumerate_values
Returns the label enumerate values.
- convert_single_row_to_example(row)
Convert a single data row into token indices.
- Parameters
row -- a data row containing the sequence(s) and the label.
text_a -- the first sequence in row.
text_b -- the second sequence in row, used only if self.second_sequence is set.
label -- the label token, used only if self.label_name is set.
- Returns
encoding -- a single example containing the token indices.
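The classification datasets accept `label_enumerate_values` and a `multi_label` flag. One plausible way such values could be turned into training targets is sketched below. This is an assumption-laden illustration, not the library's code: `build_label_map`, `encode_label`, the list-based label input, and the comma separator for multi-label strings are all hypothetical.

```python
from typing import Dict, List, Union


def build_label_map(label_values: List[str]) -> Dict[str, int]:
    """Assign each label value an index in the order given."""
    return {value: idx for idx, value in enumerate(label_values)}


def encode_label(label: str, label_map: Dict[str, int],
                 multi_label: bool = False, sep: str = ",") -> Union[int, List[int]]:
    """Return a class index, or a multi-hot vector when multi_label is True."""
    if not multi_label:
        return label_map[label]
    multi_hot = [0] * len(label_map)
    # Assumed convention: several labels packed into one string, split on `sep`.
    for name in label.split(sep):
        multi_hot[label_map[name.strip()]] = 1
    return multi_hot


label_map = build_label_map(["sports", "politics", "tech"])
single = encode_label("tech", label_map)
multi = encode_label("sports, tech", label_map, multi_label=True)
```

With a single-label task the target is one index; with a multi-label task it is a vector with one slot per enumerated label value.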
dataset for sequence labeling
- class easytexminer.data.labeling_dataset.InputExample(text_a, text_b=None, label=None, guid=None)
A single training/test example for simple sequence classification.
- class easytexminer.data.labeling_dataset.LabelingFeatures(input_ids, input_mask, segment_ids, all_tokens, label_ids, tok_to_orig_index, seq_length=None, guid=None)
A single set of features of data for sequence labeling.
- easytexminer.data.labeling_dataset.bert_labeling_convert_example_to_feature(example, tokenizer, max_seq_length, label_map=None)
Convert an InputExample into an InputFeature for a sequence labeling task.
- Parameters
example (InputExample) -- an input example
tokenizer (BertTokenizer) -- the BERT tokenizer
max_seq_length (int) -- maximum sequence length; longer inputs are truncated
label_map (dict) -- a mapping from label value to label index; if None, the task is treated as "regression", otherwise as "classification"
- Returns
an input feature
- Return type
feature (InputFeatures)
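The `LabelingFeatures` fields `all_tokens`, `label_ids` and `tok_to_orig_index` suggest that word-level labels are aligned with subword tokens. The sketch below shows one common way to do that alignment. It is not the library's implementation: `toy_wordpiece` is a made-up stand-in for a real subword tokenizer, `IGNORE = -1` is an assumed marker for positions without a label, and `input_ids` is left out because it would need a real vocabulary.

```python
from typing import Dict, List

IGNORE = -1  # assumed marker for special tokens, continuation pieces and padding


def toy_wordpiece(word: str, size: int = 3) -> List[str]:
    """Hypothetical subword tokenizer: fixed-size chunks, '##' on continuations."""
    pieces = [word[i:i + size] for i in range(0, len(word), size)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def toy_labeling_feature(words: List[str], labels: List[str],
                         label_map: Dict[str, int], max_seq_length: int) -> Dict[str, object]:
    all_tokens, label_ids, tok_to_orig_index = ["[CLS]"], [IGNORE], [-1]
    for word_idx, (word, label) in enumerate(zip(words, labels)):
        for piece_idx, piece in enumerate(toy_wordpiece(word)):
            all_tokens.append(piece)
            tok_to_orig_index.append(word_idx)  # which original word this piece came from
            # Only the first piece of each word carries the word's label.
            label_ids.append(label_map[label] if piece_idx == 0 else IGNORE)

    # Truncate so that the closing [SEP] always fits.
    keep = max_seq_length - 1
    all_tokens = all_tokens[:keep] + ["[SEP]"]
    label_ids = label_ids[:keep] + [IGNORE]
    tok_to_orig_index = tok_to_orig_index[:keep] + [-1]

    seq_length = len(all_tokens)
    pad = max_seq_length - seq_length
    return {
        "all_tokens": all_tokens,
        "label_ids": label_ids + [IGNORE] * pad,
        "input_mask": [1] * seq_length + [0] * pad,
        "tok_to_orig_index": tok_to_orig_index,
        "seq_length": seq_length,
    }


feature = toy_labeling_feature(
    ["John", "lives", "in", "Berlin"],
    ["B-PER", "O", "O", "B-LOC"],
    {"O": 0, "B-PER": 1, "B-LOC": 2},
    max_seq_length=12,
)
```

Keeping `tok_to_orig_index` makes it possible to map per-token predictions back to the original words after inference.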