27 Oct 2017, 19:08
Mike Yao (1 post)

There is an example in Chapter 6, Part II, “Inserting Elasticsearch Documents in Bulk”. Because the JSON file is large, it uses a stream, so the data is sent to Elasticsearch chunk by chunk:

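const fs = require('fs');
const request = require('request');

const req = request.post(options); // options holds the bulk endpoint URL and headers
const stream = fs.createReadStream(file);
stream.pipe(req); // stream the file contents into the request body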

And here is a quote from the Elasticsearch documentation:

If using the HTTP API, make sure that the client does not send HTTP chunks, as this will slow things down.

So my understanding is that the example should fail whenever a chunk contains a broken (partial) piece of JSON, but it works. I have gone through the API docs to try to understand why, and I still cannot find the reason.

Thank you so much! - Mike Yao

05 Feb 2018, 10:11
Jim R. Wilson (104 posts)

Hi Mike,

Thanks for taking the time to comment, and sorry for the slow response! I was working hard to get the book into publishable form.

Based on the Elasticsearch documentation you posted, I think it should work fine if you use chunked HTTP; they would just rather you didn’t, for speed reasons. Chunked transfer encoding sends the body as a series of size-prefixed pieces instead of declaring the total length up front in a Content-Length header. The chunk boundaries are only transport framing: the receiver strips them and sees one continuous body, so a JSON document that happens to be split across two chunks is not a problem. Elasticsearch would simply prefer to receive the whole body with a known length, which is why it recommends against chunking.
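
For what it’s worth, here is a rough sketch of how chunked framing wraps a body, and why a JSON line cut in half still arrives intact. The encodeChunked and decodeChunked helpers are hypothetical names written just for this illustration; they are not part of request, Node.js, or Elasticsearch, and they ignore chunk extensions and trailers.

```javascript
// Hypothetical helpers that illustrate HTTP/1.1 chunked framing.

// Frame each piece as <size in hex>\r\n<data>\r\n, then end with a zero-size chunk.
function encodeChunked(pieces) {
  const frames = pieces.map((piece) => {
    const data = Buffer.from(piece);
    return Buffer.concat([
      Buffer.from(data.length.toString(16) + '\r\n'),
      data,
      Buffer.from('\r\n'),
    ]);
  });
  frames.push(Buffer.from('0\r\n\r\n'));
  return Buffer.concat(frames);
}

// Strip the framing and return the original body bytes.
function decodeChunked(wire) {
  const parts = [];
  let offset = 0;
  while (true) {
    const lineEnd = wire.indexOf('\r\n', offset);
    if (lineEnd === -1) throw new Error('missing chunk size line');
    const size = parseInt(wire.toString('ascii', offset, lineEnd), 16);
    if (Number.isNaN(size)) throw new Error('invalid chunk size');
    if (size === 0) break; // the zero-size chunk ends the body
    const start = lineEnd + 2;
    parts.push(wire.slice(start, start + size));
    offset = start + size + 2; // skip the data and its trailing CRLF
  }
  return Buffer.concat(parts);
}

// One bulk action line, deliberately split in the middle of a JSON object.
const wire = encodeChunked(['{"index":{"_id"', ':"pg1"}}\n']);
const body = decodeChunked(wire).toString();
console.log(JSON.parse(body)); // parses, because the split was only framing
```

Neither half of that line is valid JSON on its own, but the parser never sees the halves, only the reassembled body.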

I’m not sure whether request.post() uses chunked encoding by default for a streamed body. Either way, it’ll work with Elasticsearch; one way may just be slower. Hope this helps!
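
If you would rather avoid chunking altogether, one option is to send an explicit Content-Length, since an HTTP client that knows the body size up front has no reason to fall back to chunked encoding. Below is a minimal sketch under that assumption. The bulkOptions helper is a hypothetical name, the URL and file name in the usage comment are placeholders, and I have not measured whether this makes any practical difference for Elasticsearch.

```javascript
const fs = require('fs');

// Hypothetical helper: build request options that carry an explicit
// Content-Length header taken from the size of the file on disk.
function bulkOptions(url, file) {
  return {
    url,
    headers: {
      'content-type': 'application/json',
      'content-length': fs.statSync(file).size, // body size in bytes
    },
  };
}

// Usage with the snippet from the question (assumes the `request` package
// and a local Elasticsearch instance; URL and file name are placeholders):
//   const req = request.post(bulkOptions('http://localhost:9200/_bulk', 'bulk.ldj'));
//   fs.createReadStream('bulk.ldj').pipe(req);
```

The file must not change between the statSync call and the end of the upload, otherwise the declared length and the streamed body will disagree.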
