
04 Jan 2014, 22:57
Lance Ulmer (4 posts)

Has anyone else been having major issues with cheerio? I cannot get it to parse the RDF files properly. Some of the issues are with cheerio itself. I filed two problems on the GitHub page with examples here:

They are slowly being resolved, but I'm wondering how the other readers are even able to get past this section with the rdf-parser not working. Are you rolling back to older versions of cheerio? I started out using the exact version the book uses, and have since updated to the latest from the master branch on GitHub.

This is what I get when running the first command-line test of the parser against the master branch of cheerio:

// node --harmony -e \
// 'require("./lib/rdf-parser.js")("cache/epub/132/pg132.rdf", console.log)'

null { _id: '132',
  title: 'The Art of War',
  authors:
   { '0': 'Sunzi (6th cent. BC)',
     '1': 'Giles, Lionel',
     length: 2,
     prevObject: { '0': [Object], '1': [Object], length: 2, prevObject: [Object] } },
  subjects: { length: 0, prevObject: { length: 0, prevObject: [Object] } } }

07 Jan 2014, 22:55
Lance Ulmer (4 posts)

To answer myself: rolling back to cheerio 0.12.4 fixed the problem for me. It's a bummer the book didn't specify a version.

04 Feb 2014, 22:58
Steven Harris (4 posts)

Thanks! This was helpful. I hit the same issue.

15 Feb 2014, 23:28
Erik Olson (1 post)

I just installed cheerio 0.13.1. Per, the issue should be fixed with v0.13.1, but I’m getting the results that Lance reported.

Two things I changed to make cheerio v0.13.1 work:

1 - Modified the default parsing behavior by passing the following as a second argument to cheerio.load:

{ xmlMode: true }

2 - In the map callback, pushed the text into arrays. It seems like a hack, but map returns the desired text in addition to the originally selected elements.

Here’s the modified code:

'use strict';
var
	fs = require('fs'),
	cheerio = require('cheerio');

module.exports = function(filename, callback) {
	fs.readFile(filename, function(err, data) {
		if (err) {
			return callback(err);
		}
		var
			$ = cheerio.load(data.toString(), {
				xmlMode: true
			}),
			authorsList = [],
			subjectsList = [],
			// map() is used only for its side effect: each callback pushes one text into a list.
			collectAuthors = function(index, elem) {
				authorsList.push($(elem).text());
			},
			collectSubjects = function(index, elem) {
				subjectsList.push($(elem).text());
			};

		$('pgterms\\:agent pgterms\\:name').map(collectAuthors);
		$('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collectSubjects);
		callback(null, {
			_id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
			title: $('dcterms\\:title').text(),
			authors: authorsList,
			subjects: subjectsList
		});
	});
};

Which produces:

null { _id: '132',
  title: 'The Art of War',
  authors: [ 'Sunzi (6th cent. BC)', 'Giles, Lionel' ],
  subjects:
   [ 'Military art and science -- Early works to 1800',
     'War -- Early works to 1800' ] }
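
The selectors in the code above escape the colon in namespaced tag names ('pgterms\\:agent'): in a CSS selector an unescaped colon starts a pseudo-class, and the backslash is doubled because the selector is written inside a JavaScript string literal. By contrast, attr('rdf:about') takes a plain attribute name rather than a selector, so its colon is left alone. A minimal sketch, with a hypothetical nsSelector helper that adds the escapes to a plain selector string:

```javascript
// Hypothetical helper: escape every colon so that "pgterms:agent" is read as one
// tag name instead of "pgterms" followed by a ":agent" pseudo-class.
// It escapes ALL colons, so it only suits simple selectors like the ones above.
function nsSelector(selector) {
  return selector.replace(/:/g, '\\:');
}

const authorSelector = nsSelector('pgterms:agent pgterms:name');

// Written by hand, the backslash must itself be escaped in the string literal.
const handWritten = 'pgterms\\:agent pgterms\\:name';

console.log(authorSelector); // pgterms\:agent pgterms\:name
console.log(authorSelector === handWritten); // true
```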

28 Mar 2014, 08:49
Saraf (1 post)

There are two changes needed to get this working with cheerio v0.13.1:

a. Pass { xmlMode: true } to the cheerio.load function.

b. Convert the output of map(collect) to an array using toArray().

'use strict';
var
  fs = require('fs'),
  cheerio = require('cheerio');

module.exports = function(filename, callback) {
  fs.readFile(filename, function(err, data) {
    if (err) { return callback(err); }
    var
      $ = cheerio.load(data.toString(), {xmlMode: true}),
      // Returns each matched element's text; map() collects the return values.
      collect = function(index, elem) {
        return $(elem).text();
      };
    callback(null, {
      _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
      title: $('dcterms\\:title').text(),
      // toArray() turns the Cheerio object returned by map() into a plain array.
      authors: $('pgterms\\:agent pgterms\\:name').map(collect).toArray(),
      subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect).toArray()
    });
  });
};

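A related detail: collect takes (index, elem), the jQuery-style argument order used by cheerio's map(), which is the reverse of Array.prototype.map's (element, index). Passing such a callback to a native array's map() silently returns the indexes. A small plain-JavaScript sketch; collectStyle is a hypothetical stand-in for collect:

```javascript
const names = ['Sunzi (6th cent. BC)', 'Giles, Lionel'];

// Written in the (index, elem) style of cheerio's map(): the element is the SECOND argument.
function collectStyle(index, elem) {
  return elem;
}

// Array.prototype.map passes (element, index), so the `index` parameter receives
// each name and the function returns the numeric position instead.
const wrong = names.map(collectStyle);

// A small wrapper that swaps the arguments restores the intended result.
const right = names.map(function (elem, index) {
  return collectStyle(index, elem);
});

console.log(wrong); // [ 0, 1 ]
console.log(right); // [ 'Sunzi (6th cent. BC)', 'Giles, Lionel' ]
```
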
16 Apr 2014, 15:22
karnaf (1 post)

Another issue: the general sibling combinator (~) only looks forward, so not all subjects are found. For example, in this case:

  <rdf:Description rdf:nodeID="Neb3fdb95d6d0443b84aa52e258bd709a">
    <rdf:value>War -- Early works to 1800</rdf:value>
    <dcam:memberOf rdf:resource=""/>
  </rdf:Description>

The subject will not be collected, because the <rdf:value> element comes before the element carrying rdf:resource.
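
The difference can be modeled without cheerio. In this sketch the children of one rdf:Description are reduced to an array of tag names in document order, and 'dcam:memberOf' stands in for the element matched by [rdf\\:resource$="/LCSH"]. Both functions are hypothetical illustrations of the selector semantics (assuming a single anchor element), not cheerio code:

```javascript
// Hypothetical model: the children of one <rdf:Description>, in document order.
const valueFirst = ['rdf:value', 'dcam:memberOf']; // the case shown above
const valueLast = ['dcam:memberOf', 'rdf:value'];

// Models `anchor ~ target`: only targets that come AFTER the anchor match.
function followingSiblings(children, anchor, target) {
  const at = children.indexOf(anchor);
  if (at === -1) return [];
  return children.slice(at + 1).filter(function (tag) {
    return tag === target;
  });
}

// Models `$(anchor).siblings(target)`: targets on either side of the anchor match.
function allSiblings(children, anchor, target) {
  const at = children.indexOf(anchor);
  if (at === -1) return [];
  return children.filter(function (tag, i) {
    return i !== at && tag === target;
  });
}

console.log(followingSiblings(valueFirst, 'dcam:memberOf', 'rdf:value')); // []
console.log(allSiblings(valueFirst, 'dcam:memberOf', 'rdf:value')); // [ 'rdf:value' ]
```

With valueLast, both functions return [ 'rdf:value' ], which is why the forward-only selector works for some records and silently misses others.
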

If you want to get all subjects, you can use cheerio's siblings() function instead:

callback(null, {
  _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
  title: $('dcterms\\:title').text(),
  authors: $('pgterms\\:agent pgterms\\:name').map(collect).toArray(),
  subjects: $('[rdf\\:resource$="/LCSH"]').siblings('rdf\\:value').map(collect).toArray()
});

16 Apr 2014, 20:57
ge angel (1 post)

Thank you, karnaf. I'd been digging for two hours for a resolution, and using .siblings() did the trick.

19 Jun 2014, 10:06
git trac (1 post)

Package dependencies: "cheerio": "^0.17.0", "request": "^2.36.0"

I got this error message:

…/databases/node_modules/cheerio/node_modules/htmlparser2/lib/Parser.js:99
this._lowerCaseTagNames = "lowerCaseTags" in this._options ?
^
TypeError: Cannot use 'in' operator to search for 'lowerCaseTags' in [object Undefined]

I have tried all of the above solutions without success.

Any suggestions or remedies? Thank you.
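
For anyone puzzling over the message itself: the `in` operator throws a TypeError when its right-hand operand is not an object, so the failing line in Parser.js indicates that this._options was undefined when it ran. A minimal sketch, with hypothetical function names standing in for that line, of the failure and of a defensive default; it only illustrates what the message means and is not a patch for htmlparser2:

```javascript
// Hypothetical stand-in for the failing line: `"lowerCaseTags" in this._options`.
function hasLowerCaseOption(options) {
  return 'lowerCaseTags' in options;
}

// With undefined options, `in` throws a TypeError like the one reported.
let caught = null;
try {
  hasLowerCaseOption(undefined);
} catch (err) {
  caught = err;
}

// Defensive variant: fall back to an empty object when no options are given.
function hasLowerCaseOptionSafely(options) {
  return 'lowerCaseTags' in (options || {});
}

console.log(caught instanceof TypeError); // true
console.log(hasLowerCaseOptionSafely(undefined)); // false
```
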

08 Jul 2014, 00:36
Nick Pratley (1 post)

git trac, I’m using cheerio 0.17.0 and the above solutions work for me. Maybe post the exact text in the module you’re executing?

14 Jul 2014, 05:27
Michael Caudy (1 post)

Much thanks to @EricOlson, @Saraf and @karnaf for all of your suggestions, which when combined worked for me.

@git trac or anyone else still having problems, try this complete package.json config, with these specific versions for all modules. Sometimes specific combinations don't work together, but these definitely work for me.

    {
        "name": "book-tools",
        "version": "0.1.0",
        "description": "Tools for creating an ebook database.",
        "author": "Your Name <> (",
        "dependencies": {
            "async": "^0.9.0",
            "cheerio": "^0.17.0",
            "file": "^0.2.2",
            "request": "^2.37.0"
        }
    }

05 Sep 2014, 20:56
Jim R. Wilson (70 posts)

Thanks everyone for reporting (and fixing!) these issues. I'm sorry I didn't add version information when writing the book; I really should have. It didn't occur to me at the time that they'd make breaking changes to the cheerio API, but in retrospect I should have expected it.

Thanks again everyone for figuring this out!
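
The caret ranges in the package.json above limit how far npm can drift: under npm's semver rules, a caret range only allows changes that leave the left-most non-zero version component untouched, so "^0.17.0" accepts 0.17.x releases but not 0.18.0, while "^2.37.0" accepts any 2.x release from 2.37.0 up. A minimal sketch of that rule for plain x.y.z versions; satisfiesCaret and parseVersion are hypothetical helpers, not the node-semver API, and prerelease tags are ignored:

```javascript
// Hypothetical helper: "0.17.4" -> [0, 17, 4]. Assumes a plain x.y.z version.
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Sketch of npm's caret rule: version >= base, and every component up to and
// including the left-most non-zero one in base must stay the same.
function satisfiesCaret(version, range) {
  const v = parseVersion(version);
  const base = parseVersion(range.replace(/^\^/, ''));

  // Lower bound: compare component by component.
  for (let i = 0; i < 3; i++) {
    if (v[i] > base[i]) break;
    if (v[i] < base[i]) return false;
  }

  // Upper bound: number of leading components that are fixed.
  const fixed = base[0] !== 0 ? 1 : base[1] !== 0 ? 2 : 3;
  for (let i = 0; i < fixed; i++) {
    if (v[i] !== base[i]) return false;
  }
  return true;
}

console.log(satisfiesCaret('0.17.4', '^0.17.0')); // true
console.log(satisfiesCaret('0.18.0', '^0.17.0')); // false
console.log(satisfiesCaret('2.40.0', '^2.37.0')); // true
```

For a 0.x package such as cheerio, this is why a caret range behaves almost like a pin: only patch releases are picked up.
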

01 Sep 2015, 19:48
Vladimir Bauer (1 post)
.map(function(index, element)) produces a new Cheerio object, so to get an array you need to chain a call to get(), as the documentation shows.

@Saraf proposed using toArray(); I used get() instead:

$('pgterms\\:agent pgterms\\:name').map(collect).get()
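
The reason get() or toArray() is needed: the Cheerio object returned by map() is array-like (numeric keys plus a length) but is not an Array, which is why the first post printed prevObject next to the author names. Both methods return a plain array. A plain-JavaScript sketch of the same kind of conversion, using a hypothetical object literal in place of a real Cheerio object so it runs without cheerio installed:

```javascript
// Hypothetical object mimicking the shape printed for `authors` in the first post:
// numeric keys and a length, plus an extra prevObject property.
const cheerioLike = {
  0: 'Sunzi (6th cent. BC)',
  1: 'Giles, Lionel',
  length: 2,
  prevObject: { length: 0 }
};

// slice reads indices 0..length-1 and ignores other properties such as prevObject.
const authors = Array.prototype.slice.call(cheerioLike);

console.log(Array.isArray(cheerioLike)); // false
console.log(authors); // [ 'Sunzi (6th cent. BC)', 'Giles, Lionel' ]
```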