
Language Filter with Advanced Search #116

@jduss4


While implementing chronam for the University of Nebraska-Lincoln's newspaper project, which includes several Czech-language papers, we discovered that selecting a language on the advanced search page does not filter results by language unless you also enter a keyword.

For example, the following query on the main Chronicling America site sets no search terms, only "Spanish" as the language, yet it returns nearly 9,500,000 results (presumably every OCR page in the collection). The same happens if you select French, German, or English without filling in any of andtext, ortext, phrasetext, etc.

http://chroniclingamerica.loc.gov/search/pages/results/?dateFilterType=yearRange&date1=1836&date2=1922&language=spa&ortext=&andtext=&phrasetext=&proxtext=&proxdistance=5&rows=20&searchType=advanced

This seems unintuitive to us, since users may well want to browse Czech pages rather than search for a specific phrase. A quick fix is to add q or fq=language:Czech to the Solr query. This hopefully should not interfere with keyword queries, which use the selected language to decide which OCR field to search (ocr_eng vs. ocr_cze, etc.).
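A minimal sketch of what the Solr request parameters could look like with such a filter, assuming a `language` field in the index as in fq=language:Czech above. The function name `build_solr_params` and the `*:*` fallback for an empty keyword box are illustrative assumptions, not chronam's actual code. Solr applies `fq` as a filter that narrows the result set without changing relevance scores, which is why it can sit alongside whatever `q` the keyword search builds.

```python
from urllib.parse import urlencode


def build_solr_params(keywords, language_name=None):
    """Hypothetical helper: build Solr params for a page search.

    keywords: the user's keyword string (may be empty).
    language_name: the language as stored in the index, e.g. "Czech".
    """
    # With no keywords, match every page so the filter alone decides the results.
    params = [("q", keywords if keywords.strip() else "*:*")]
    if language_name:
        # fq narrows the result set but does not affect scoring.
        params.append(("fq", f"language:{language_name}"))
    return params


params = build_solr_params("", "Czech")
print(urlencode(params))
```

Keeping the language in `fq` rather than folding it into `q` also lets Solr cache the filter independently of the keyword part of the query.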

I've added some quick code that takes the requested language code ("cze"), looks up the language name that the Solr results recognize ("Czech"), and appends a language filter query:
https://gist.github.com/jduss4/d4f71929fbcf946d1c64
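For readers who don't want to open the gist, here is a standalone sketch of the same idea. It is not the gist's actual code. The `LANGUAGE_NAMES` table is a hypothetical mapping: "cze" → "Czech" comes from this issue, while "spa" → "Spanish" assumes the index stores Spanish under that name. It reads the `language` parameter from the example URL above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from advanced-search form codes to the language
# names stored in the Solr index. "Spanish" is an assumed index value.
LANGUAGE_NAMES = {
    "cze": "Czech",
    "spa": "Spanish",
}


def language_fq(query_string):
    """Return Solr fq clauses for the requested language (empty if none)."""
    params = parse_qs(query_string)  # blank values such as andtext= are dropped
    code = params.get("language", [""])[0]
    name = LANGUAGE_NAMES.get(code)
    return [f"language:{name}"] if name else []


url = ("http://chroniclingamerica.loc.gov/search/pages/results/"
       "?dateFilterType=yearRange&date1=1836&date2=1922&language=spa"
       "&ortext=&andtext=&phrasetext=&proxtext=&proxdistance=5&rows=20"
       "&searchType=advanced")
print(language_fq(urlsplit(url).query))
```

An unknown or missing code yields no filter, so searches without a language selection behave as before.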

This is where we made the change in our project, at the corresponding location in the current chronam code:
https://github.com/CDRH/nebnews/blob/4ea06e3c3b4ed2e23e3119454493572d6c30604d/core/index.py#L453

Applying the language filter only to keyword searches may be intended behavior in chronam, but I wanted to open an issue because it surprised us and seems unintuitive. If it is intended, perhaps a change in the UI could make it clearer how the language dropdown affects search results.
