04 Jan 2014, 22:57

Lance Ulmer (4 posts)

Has anyone else been having major issues with cheerio? I cannot get it to parse the rdf files properly. Some of the issues are with cheerio itself. I filed two issues on the GitHub page, with examples, here:

https://github.com/MatthewMueller/cheerio/issues/354
https://github.com/MatthewMueller/cheerio/issues/355

They are slowly being resolved, but I’m wondering how the other readers are even able to make it past this section with the rdf-parser not working. Are you rolling back to older versions of cheerio? I started out using the exact one the book is using, and have since updated to the latest from the master branch on github.

This is what I get when running the first command-line test of the parser against the master branch of cheerio:

// node --harmony -e \
// 'require("./lib/rdf-parser.js")("cache/epub/132/pg132.rdf", console.log)'

null { _id: '132',
  title: 'The Art of War',
  authors:
   { '0': 'Sunzi (6th cent. BC)',
     '1': 'Giles, Lionel',
     length: 2,
     prevObject: { '0': [Object], '1': [Object], length: 2, prevObject: [Object] } },
  subjects: { length: 0, prevObject: { length: 0, prevObject: [Object] } } }
07 Jan 2014, 22:55

Lance Ulmer (4 posts)

To answer myself, rolling back to cheerio 0.12.4 fixed the problem for me. It’s a bummer the book didn’t pin a version.

04 Feb 2014, 22:58

Steven Harris (4 posts)

Thanks! This was helpful. I hit the same issue.

15 Feb 2014, 23:28

Erik Olson (1 post)

I just installed cheerio 0.13.1. Per https://github.com/MatthewMueller/cheerio/issues/354, the issue should be fixed with v0.13.1, but I’m getting the results that Lance reported.

Two things I changed to make cheerio v0.13.1 work:

1 - Modified the default parsing behavior by passing the following as the second argument to cheerio.load:

{ xmlMode: true }

2 - In the map callbacks, pushed the text into plain arrays. It seems like a hack, but map returns the desired text wrapped together with the originally selected set rather than as a plain array.

Here’s the modified code:

'use strict';
const
	fs = require('fs'),
	cheerio = require('cheerio');

module.exports = function(filename, callback) {
	fs.readFile(filename, function(err, data) {
		if (err) {
			return callback(err);
		}

		let
			$ = cheerio.load(data.toString(), {
				    xmlMode: true
				}),
			authorsList = [],
			subjectsList = [],
			collectAuthors = function(index, elem) {
				authorsList.push($(elem).text());
			},
			collectSubjects = function(index, elem) {
				subjectsList.push($(elem).text());
			};

		$('pgterms\\:agent pgterms\\:name').map(collectAuthors);
		$('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collectSubjects);
			
		callback(null, {
			_id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
			title: $('dcterms\\:title').text(),
			authors: authorsList,
			subjects: subjectsList
		});

	});
};

Which produces:

null { _id: '132',
  title: 'The Art of War',
  authors: [ 'Sunzi (6th cent. BC)', 'Giles, Lionel' ],
  subjects:
   [ 'Military art and science -- Early works to 1800',
     'War -- Early works to 1800' ] }

28 Mar 2014, 08:49

Saraf (1 post)

There are two changes needed to get this working with cheerio v0.13.1:

a. Pass { xmlMode: true } as the second argument to the cheerio.load function.

b. Convert the map(collect) output to a plain array using toArray().

'use strict';
const
  fs = require('fs'),
  cheerio = require('cheerio');

module.exports = function(filename, callback) {
  fs.readFile(filename, function(err, data){
    if (err) { return callback(err); }
    let
      $ = cheerio.load(data.toString(), {xmlMode: true}),
      collect = function(index, elem) {
        return $(elem).text();
      };
    
    callback(null, {
      _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
      title: $('dcterms\\:title').text(),
      authors: $('pgterms\\:agent pgterms\\:name').map(collect).toArray(),
      subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect).toArray()
    });
  });
};
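
For context on why toArray() helps: the first output in this thread shows authors as an array-like object (numeric keys plus length and prevObject) rather than a plain array. Below is a minimal plain-JavaScript sketch of that difference, with no cheerio involved. The wrapped object is a hand-built, hypothetical stand-in for what map() returned, not cheerio's real internals.

```javascript
'use strict';

// Hand-built stand-in for the array-like wrapper seen in the first output.
// (Hypothetical: a real cheerio object carries more than this.)
const wrapped = {
  0: 'Sunzi (6th cent. BC)',
  1: 'Giles, Lionel',
  length: 2,
  prevObject: {}
};

// Copy only the indexed entries (0 .. length - 1) into a real array.
// This is the kind of conversion a toArray() call is expected to perform.
const authors = Array.prototype.slice.call(wrapped);

console.log(Array.isArray(wrapped)); // false
console.log(Array.isArray(authors)); // true
console.log(authors);                // [ 'Sunzi (6th cent. BC)', 'Giles, Lionel' ]
```

The extra prevObject key is dropped because only the indexed entries are copied.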
16 Apr 2014, 15:22

karnaf (1 post)

Another issue: the sibling combinator (~) only looks forward, so not all subjects are found. For example, in this case:

  <rdf:Description rdf:nodeID="Neb3fdb95d6d0443b84aa52e258bd709a">
    <rdf:value>War -- Early works to 1800</rdf:value>
    <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
  </rdf:Description>

The subject will not be collected, because the <rdf:value> element comes before the element carrying the rdf:resource attribute.

If you wish to get all subjects, you can use cheerio’s siblings() function:

callback(null, {
  _id : $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
  title : $('dcterms\\:title').text(),
  authors : $('pgterms\\:agent pgterms\\:name').map(collect).toArray(),
  subjects : $('[rdf\\:resource$="/LCSH"]').siblings('rdf\\:value').map(collect).toArray()
});
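
To make the forward-only behavior concrete, here is a rough plain-JavaScript model of the two lookups, with no cheerio dependency. The children array is a hypothetical stand-in for the child elements of the <rdf:Description> above, in document order. followingSiblings and allSiblings are illustrative names, not cheerio functions.

```javascript
'use strict';

// Stand-in for the children of the <rdf:Description> element above,
// listed in document order (rdf:value first, dcam:memberOf second).
const children = ['rdf:value', 'dcam:memberOf'];

// Models the CSS general sibling combinator `A ~ B`:
// only siblings that come AFTER the anchor are candidates.
function followingSiblings(nodes, anchorIndex, name) {
  return nodes.slice(anchorIndex + 1).filter(n => n === name);
}

// Models a siblings()-style lookup:
// siblings on both sides of the anchor are candidates.
function allSiblings(nodes, anchorIndex, name) {
  return nodes.filter((n, i) => i !== anchorIndex && n === name);
}

const anchor = children.indexOf('dcam:memberOf');
const forwardOnly = followingSiblings(children, anchor, 'rdf:value');
const bothWays = allSiblings(children, anchor, 'rdf:value');

console.log(forwardOnly); // []
console.log(bothWays);    // [ 'rdf:value' ]
```

With the element order reversed (dcam:memberOf first), both lookups would find the value, which is why the original selector works for some subjects and not others.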
16 Apr 2014, 20:57

ge angel (1 post)

Thank you, karnaf. I’ve been digging for two hours for a resolution, and the use of .siblings did the trick.

19 Jun 2014, 10:06

git trac (1 post)

Package dependencies: "cheerio": "^0.17.0", "request": "^2.36.0"

I got this error message:

… /databases/node_modules/cheerio/node_modules/htmlparser2/lib/Parser.js:99
this._lowerCaseTagNames = "lowerCaseTags" in this._options ?
^
TypeError: Cannot use 'in' operator to search for 'lowerCaseTags' in [object Undefined]

I have tried all the above solutions without success.

Any suggestions or remedies? Thank you.

08 Jul 2014, 00:36

Nick Pratley (1 post)

git trac, I’m using cheerio 0.17.0 and the above solutions work for me. Maybe post the exact text in the module you’re executing?

14 Jul 2014, 05:27

Michael Caudy (1 post)

Many thanks to @ErikOlson, @Saraf and @karnaf for all of your suggestions, which worked for me when combined.

@git trac or anyone else still having problems, try using this complete package.json config, with these version ranges for all modules. Sometimes specific combinations don’t work together, but these definitely do work for me.

{
    "name": "book-tools",
    "version": "0.1.0",
    "description": "Tools for creating an ebook database.",
    "author": "Your Name <you@yoursite.com> (http://yoursite.com/path)",
    "dependencies": {
        "async": "^0.9.0",
        "cheerio": "^0.17.0",
        "file": "^0.2.2",
        "request": "^2.37.0"
    }
}
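
Note that a caret range such as "^0.17.0" is a range, not an exact pin. Under npm's semver rules, a caret on a 0.x version allows patch updates only (>=0.17.0 <0.18.0), while on 1.x and above it also allows minor updates. Here is a minimal plain-JavaScript sketch of that rule. satisfiesCaret and its helpers are hypothetical names written for illustration (this is not the semver package), and the sketch handles only plain X.Y.Z versions.

```javascript
'use strict';

// Split "X.Y.Z" into [X, Y, Z] as numbers.
function parse(version) {
  return version.split('.').map(Number);
}

// Compare two parsed versions: -1 if a < b, 1 if a > b, 0 if equal.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) { return a[i] < b[i] ? -1 : 1; }
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero part.
function caretUpperBound(base) {
  if (base[0] > 0) { return [base[0] + 1, 0, 0]; }
  if (base[1] > 0) { return [0, base[1] + 1, 0]; }
  return [0, 0, base[2] + 1];
}

// True when `version` falls inside the caret range `range` (e.g. "^0.17.0").
function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ''));
  const v = parse(version);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

console.log(satisfiesCaret('0.17.3', '^0.17.0')); // true
console.log(satisfiesCaret('0.18.0', '^0.17.0')); // false
console.log(satisfiesCaret('2.88.0', '^2.37.0')); // true
```

To pin an exact version instead, drop the caret, for example "cheerio": "0.17.0".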