Commit ca24d84

Add support for skipping rows according to a callback and update an option's default value
1 parent 33f790b commit ca24d84

3 files changed: +71 -23 lines changed

lib/node_modules/@stdlib/utils/dsv/base/parse/README.md

Lines changed: 15 additions & 0 deletions
@@ -79,6 +79,8 @@ The constructor accepts the following `options`:
 
 - **ltrim**: `boolean` indicating whether to trim leading whitespace from field values. If `false`, the parser does not trim leading whitespace (e.g., `a, b, c` parses as `[ 'a', ' b', ' c' ]`). If `true`, the parser trims leading whitespace (e.g., `a, b, c` parses as `[ 'a', 'b', 'c' ]`). Default: `false`.
 
+- **maxRows**: maximum number of records to process (excluding skipped lines). By default, the maximum number of records is unlimited.
+
 - **newline**: character sequence separating rows. Default: `'\r\n'` (see [RFC 4180][rfc-4180]).
 
 - **onClose**: callback to be invoked upon closing the parser. If a parser has partially processed a record upon close, the callback is invoked with the following arguments:
@@ -92,6 +94,7 @@ The constructor accepts the following `options`:
     - **field**: field value.
     - **row**: row number (zero-based).
     - **col**: field (column) number (zero-based).
+    - **line**: line number (zero-based).
 
 - **onComment**: callback to be invoked upon processing a commented line. The callback is invoked with the following arguments:

@@ -107,6 +110,7 @@ The constructor accepts the following `options`:
     - **record**: an array-like object containing field values. If provided a `rowBuffer`, the `record` argument will be the **same** array-like object for each invocation.
     - **row**: row number (zero-based).
     - **ncols**: number of fields (columns).
+    - **line**: line number (zero-based).
 
     If a parser is closed **before** fully processing the last record, the callback is invoked with field data for all fields which have been parsed. Any remaining field data is provided to the `onClose` callback. For example, if a parser has processed two fields and closes while attempting to process a third field, the parser invokes the `onRow` callback with field data for the first two fields and invokes the `onClose` callback with the partially processed data for the third field.

@@ -129,6 +133,17 @@ The constructor accepts the following `options`:
 
 - **skip**: character sequence appearing at the beginning of a row which demarcates that the row content should be parsed as a skipped record. Default: `''`.
 
+- **skipBlankRows**: `boolean` flag indicating whether to skip over rows which are either empty or containing only whitespace. Default: `false`.
+
+- **skipRow**: callback whose return value indicates whether to skip over a row. The callback is invoked with the following arguments:
+
+    - **nrows**: number of processed rows (equivalent to the current row number).
+    - **line**: line number (zero-based).
+
+    If the callback returns a truthy value, the parser skips the row; otherwise, the parser attempts to process the row.
+
+    Note, however, that, even if the callback returns a falsy value, a row may still be skipped depending on the presence of a `skip` character sequence.
+
 - **strict**: `boolean` flag indicating whether to raise an exception upon encountering invalid DSV. When `false`, instead of throwing an `Error` or invoking the `onError` callback, the parser invokes an `onWarn` callback with an `Error` object specifying the encountered error. Default: `true`.
 
 - **trimComment**: `boolean` flag indicating whether to trim leading whitespace in commented lines. Default: `true`.
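
The `skipRow` contract documented above can be illustrated with a small, self-contained sketch. The `selectRows` helper below is hypothetical and only stands in for the parser's row loop; it assumes input that is already split into lines and that skipped lines do not count toward `nrows`, which matches the README wording but is not taken from the parser itself.

```javascript
// Hypothetical callback: skip the first two lines (e.g., a title line and a units line).
function skipRow( nrows, line ) {
    return ( line < 2 );
}

// Hypothetical stand-in for the parser's row loop (NOT the stdlib API).
// `nrows` counts processed rows; the loop index serves as the zero-based line number.
function selectRows( lines, clbk ) {
    var nrows = 0;
    var out = [];
    var i;
    for ( i = 0; i < lines.length; i++ ) {
        if ( clbk( nrows, i ) ) {
            continue; // truthy return value: skip the row
        }
        out.push( lines[ i ] );
        nrows += 1;
    }
    return out;
}

var rows = selectRows( [ 'Title', 'units: m', 'a,b,c', '1,2,3' ], skipRow );
// rows => [ 'a,b,c', '1,2,3' ]
```

Because the callback receives both counters, it can skip by physical position (`line`) or by how many records have already been accepted (`nrows`).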

lib/node_modules/@stdlib/utils/dsv/base/parse/lib/defaults.js

Lines changed: 9 additions & 0 deletions
@@ -66,6 +66,9 @@ function defaults() {
         // Flag indicating whether to trim leading whitespace from field values. If `false`, leading whitespace is not trimmed (e.g., `a, b, c` parses as `[ 'a', ' b', ' c' ]`). If `true`, leading whitespace is trimmed (e.g., `a, b, c` parses as `[ 'a', 'b', 'c' ]`).
         'ltrim': false,
 
+        // Maximum number of records to process.
+        'maxRows': 1e308,
+
         // Character sequence separating rows.
         'newline': '\r\n',

@@ -105,6 +108,12 @@ function defaults() {
         // Character sequence appearing at the beginning of a row which demarcates that the row content should be skipped.
         'skip': '',
 
+        // Flag indicating whether to skip over rows which are either empty or containing only whitespace.
+        'skipBlankRows': false,
+
+        // Callback whose return value indicates whether to skip over a row.
+        'skipRow': null,
+
         // Flag indicating whether to raise an exception upon encountering invalid DSV.
         'strict': true,

lib/node_modules/@stdlib/utils/dsv/base/parse/lib/main.js

Lines changed: 47 additions & 23 deletions
@@ -126,10 +126,7 @@ function Parser( options ) {
     this._row = 0;
     this._line = 0;
 
-    // Initialize the state flag:
-    this._state = INIT;
-
-    // Initialize flags indicating whether we're processing a commented line:
+    // Initialize flags indicating whether we're processing a commented/skipped line:
     this._commented = false;
     this._skipped = false;

@@ -142,11 +139,14 @@ function Parser( options ) {
     this._doublequote = ( options.doublequote === void 0 ) ? opts.doublequote : options.doublequote;
     this._escape = options.escape || opts.escape;
     this._ltrim = ( options.ltrim === void 0 ) ? opts.ltrim : options.ltrim;
+    this._maxRows = ( options.maxRows === void 0 ) ? opts.maxRows : options.maxRows;
     this._newline = options.newline || opts.newline;
     this._quote = options.quote || opts.quote;
     this._quoting = ( options.quoting === void 0 ) ? opts.quoting : options.quoting;
     this._rtrim = ( options.rtrim === void 0 ) ? opts.rtrim : options.rtrim;
     this._skip = options.skip || opts.skip;
+    this._skipBlankRows = ( options.skipBlankRows === void 0 ) ? opts.skipBlankRows : options.skipBlankRows;
+    this._skipRow = options.skipRow || opts.skipRow;
     this._strict = ( options.strict === void 0 ) ? opts.strict : options.strict;
     this._trimComment = ( options.trimComment === void 0 ) ? opts.trimComment : options.trimComment;
     this._whitespace = options.whitespace || opts.whitespace;
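
The hunk above resolves options in two ways: an explicit `=== void 0` check for booleans and numbers, and `||` for strings and callbacks. The sketch below is a minimal illustration of why the distinction matters for an option such as `maxRows`; the `resolve` helper and the trimmed-down `defaults` object are hypothetical, with default values copied from `defaults.js` in this commit.

```javascript
// Hypothetical subset of the defaults shown in `defaults.js`.
var defaults = {
    'ltrim': false,
    'maxRows': 1e308,
    'skipRow': null
};

// Hypothetical helper mirroring the two resolution patterns used in the constructor.
function resolve( options ) {
    return {
        // Explicit `undefined` check: falsy-but-valid values (`false`, `0`) are kept.
        'ltrim': ( options.ltrim === void 0 ) ? defaults.ltrim : options.ltrim,
        'maxRows': ( options.maxRows === void 0 ) ? defaults.maxRows : options.maxRows,

        // Logical OR: fine for callbacks, since any provided function is truthy.
        'skipRow': options.skipRow || defaults.skipRow
    };
}

var resolved = resolve( { 'maxRows': 0 } );
// resolved.maxRows => 0 (the user-provided value survives)

var naive = ( 0 || defaults.maxRows );
// naive => 1e308 (a plain `||` would silently discard the `0`)
```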
@@ -178,6 +178,13 @@ function Parser( options ) {
     this._skipLength = this._skip.length;
     this._skipLastIndex = this._skipLength - 1;
 
+    // Initialize the state flag:
+    if ( this._skipRow && this._skipRow( 0, 0 ) ) {
+        this._state = SKIP;
+        this._skipped = true;
+    } else {
+        this._state = INIT;
+    }
     // Initialize state processors...
     this._states = states( this ); // NOTE: this should come after all other initialization!

@@ -268,12 +275,15 @@ setReadOnly( Parser.prototype, '_reset', function reset() {
     this._qidx = -1;
     this._eidx = -1;
 
-    // Reset the parser state to attempting to parse the first field of the next record:
-    this._state = INIT;
-
-    // Reset flags for commented lines:
+    // Reset the parser state...
+    if ( this._skipRow && this._skipRow( this._row, this._line ) ) {
+        this._state = SKIP;
+        this._skipped = true;
+    } else {
+        this._state = INIT;
+        this._skipped = false;
+    }
     this._commented = false;
-    this._skipped = false;
 
     // Reset the buffer:
     this._cursor = -1;
@@ -419,7 +429,8 @@ setReadOnly( Parser.prototype, '_onField', function onField() {
     this._setField( v, this._col );
 
     // Invoke a callback for receiving field values:
-    this._onColumn( v, this._row, this._col );
+    this._onColumn( v, this._row, this._col, this._line );
+    debug( 'Field. Line: %d. Row: %d. Column: %d. Value: %s', this._line, this._row, this._col, v );
 
     // Increment the field counter to record that we've moved on to the next field:
     this._col += 1;
@@ -431,7 +442,6 @@ setReadOnly( Parser.prototype, '_onField', function onField() {
     this._qidx = -1;
     this._eidx = -1;
 
-    debug( 'New field. Line: %d. Field: %d. Value: %s', this._line+1, this._col, v );
     return this;
 });

@@ -445,28 +455,38 @@ setReadOnly( Parser.prototype, '_onField', function onField() {
 * @returns {Parser} parser instance
 */
 setReadOnly( Parser.prototype, '_onRecord', function onRecord() {
-    // FIXME: check for whether `onEmpty` should be invoked and/or handle blank lines
+    var v;
 
     // Extract the field value:
-    var v = this._getField( this._cidx, this._cursor );
+    v = this._getField( this._cidx, this._cursor );
+
+    // Check for a blank row (i.e., a row consisting only of whitespace):
+    if ( this._skipBlankRows ) {
+        // FIXME: check of whether this is col 0 and whether it consists of only whitespace
+    }
 
     // Insert the field value into the row buffer:
     this._setField( v, this._col );
 
     // Invoke a callback for receiving field values:
-    this._onColumn( v, this._row, this._col );
+    this._onColumn( v, this._row, this._col, this._line );
     this._col += 1;
 
     // Invoke a callback for receiving rows:
-    this._onRow( this._getRow(), this._row, this._col );
+    this._onRow( this._getRow(), this._row, this._col, this._line );
+    debug( 'Record. Line: %d. Fields: %d.', this._line, this._col );
+
+    // Increment row and line counters to indicate that we've moved on to the next row/line:
     this._row += 1;
     this._line += 1;
 
-    debug( 'New record. Line: %d. Fields: %d.', this._line, this._col );
-
     // Reset the parser:
     this._reset();
 
+    // Check whether we have processed a desired number of rows...
+    if ( this._row >= this._maxRows ) {
+        this._changeState( CLOSED );
+    }
     return this;
 });

@@ -489,7 +509,9 @@ setReadOnly( Parser.prototype, '_onCommentedRow', function onCommentedRow() {
             // FIXME: trim comment; this may be better done character by character
         }
         this._onComment( v, this._line );
-        debug( 'New comment. Line: %d. Value: %s', this._line, v );
+        debug( 'Comment. Line: %d. Value: %s', this._line, v );
+    } else {
+        debug( 'Comment. Line: %d.', this._line );
     }
     // Increment the counter for how many lines have been processed:
     this._line += 1;
@@ -516,7 +538,9 @@ setReadOnly( Parser.prototype, '_onSkippedRow', function onSkippedRow() {
     if ( this._onSkip ) {
         v = this._buffer.slice( 0, this._cursor+1 ).join( '' );
         this._onSkip( v, this._line );
-        debug( 'New skipped row. Line: %d. Value: %s', this._line, v );
+        debug( 'Skipped row. Line: %d. Value: %s', this._line, v );
+    } else {
+        debug( 'Skipped row. Line: %d.', this._line );
     }
     // Increment the counter for how many lines have been processed:
     this._line += 1;
@@ -572,13 +596,13 @@ setReadOnly( Parser.prototype, '_createException', function createException( nam
 
     switch ( name ) {
     case 'INVALID_CLOSING_QUOTE':
-        err = new Error( format( 'unexpected error. Encountered an invalid record. Field %d on line %d contains a closing quote which is not immediately followed by a delimiter or newline.', this._col+1, this._line+1 ) );
+        err = new Error( format( 'unexpected error. Encountered an invalid record. Field %d on line %d contains a closing quote which is not immediately followed by a delimiter or newline.', this._col, this._line ) );
         break;
     case 'INVALID_ESCAPE':
-        err = new Error( format( 'unexpected error. Encountered an invalid record. Field %d on line %d contains an escape sequence which is not immediately followed by a special character sequence.', this._col+1, this._line+1 ) );
+        err = new Error( format( 'unexpected error. Encountered an invalid record. Field %d on line %d contains an escape sequence which is not immediately followed by a special character sequence.', this._col, this._line ) );
         break;
     case 'INVALID_QUOTED_ESCAPE':
-        err = new Error( format( 'unexpected error. Encountered an invalid record. Field %d on line %d contains an escape sequence within a quoted field which is not immediately followed by a quote sequence.', this._col+1, this._line+1 ) );
+        err = new Error( format( 'unexpected error. Encountered an invalid record. Field %d on line %d contains an escape sequence within a quoted field which is not immediately followed by a quote sequence.', this._col, this._line ) );
         break;
     case 'CLOSED':
         err = new Error( 'invalid operation. Parser is unable to parse new chunks, as the parser has been closed. To parse new chunks, create a new parser instance.' );
@@ -795,7 +819,7 @@ setReadOnly( Parser.prototype, 'next', function next( chunk ) {
     states = this._states;
     for ( i = 0; i < chunk.length; i++ ) {
         states[ this._state ]( chunk[ i ] );
-        if ( this._state === ERROR ) {
+        if ( this._state === CLOSED || this._state === ERROR ) {
             return this;
         }
     }
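
Taken together, the `_onRecord` and `next` hunks suggest how `maxRows` takes effect: once the row counter reaches the limit, the parser transitions to a closed state, and the character loop in `next` returns early instead of consuming the rest of the chunk. The toy `MiniParser` below is a hypothetical sketch of that interaction only; its state labels, newline handling, and lack of field parsing are simplifying assumptions and do not reflect the real implementation.

```javascript
// Hypothetical state labels; the real parser defines its own state constants.
var INIT = 'init';
var CLOSED = 'closed';

// Toy parser: treats each '\n'-terminated line as one record (no field splitting).
function MiniParser( maxRows, onRow ) {
    this._maxRows = ( maxRows === void 0 ) ? 1e308 : maxRows;
    this._onRow = onRow;
    this._state = INIT;
    this._row = 0;
    this._buffer = '';
}

// Processes a chunk one character at a time, returning early once the parser has closed.
MiniParser.prototype.next = function next( chunk ) {
    var i;
    for ( i = 0; i < chunk.length; i++ ) {
        this._step( chunk[ i ] );
        if ( this._state === CLOSED ) {
            return this;
        }
    }
    return this;
};

// Accumulates characters until a newline, then emits a record and checks the row limit.
MiniParser.prototype._step = function step( ch ) {
    if ( ch !== '\n' ) {
        this._buffer += ch;
        return;
    }
    this._onRow( this._buffer, this._row );
    this._buffer = '';
    this._row += 1;
    if ( this._row >= this._maxRows ) {
        this._state = CLOSED;
    }
};

var records = [];
var parser = new MiniParser( 2, function onRow( record ) {
    records.push( record );
});
parser.next( 'a,b\nc,d\ne,f\n' );
// records => [ 'a,b', 'c,d' ]
```

Without the early return on the closed state, the loop would keep feeding characters to the state processors after the limit had been reached.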
