Wednesday, April 23. 2008
Solr: Using the dismax Query Handler and Still Limit a Specific Field

While working with the facets for our search results earlier today, I came across the need to limit a Solr search on one specific field in addition to our regular search string (which we run against several fields with different weights). The situation was something like this:
We do the searches against the AggregateSearchField and the AggregatePhoneticSearchField, where we weight the exact match higher than the phonetic matches. This ensures that the more specific matches are ranked higher than those that are merely similar. We do this for several different field groupings, but that's not relevant for this post, so let's just assume that these are the three relevant fields. We search against two of them, and use Lastname as a facet / navigator field to allow users to get more specific with their search.

However, while users should be allowed to narrow their search by selecting one of the facets, doing so should not change their regular search. And since the dismax handler will search through all the allowed fields for a given value, you cannot just append Lastname:facetValue to the search string and be done with it (dismax does not support fielded searches through the regular query).

After a bit of searching through our friends over at Google, I finally stumbled across the solution (which I of course should have seen on the Solr wiki): use the fq parameter. This allows you to submit a "Filter Query" in your request, which restricts the results of your existing query to the documents that also match the filter. This fits very neatly with keeping your original query and then appending one filter query for each facet limitation that gets set.

A small code example for Solrj (filterQueries is a HashMap<String, String> which contains the facets; filterQueries.put("Lastname", "Smith") will add a limitation on the field "Lastname" being "Smith"; you might want to escape double quotes in the facet values):
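A minimal sketch of that idea, using only the standard library to build the fq values. The class and method names are made up for illustration, and a LinkedHashMap stands in for the HashMap so the output order is predictable. With Solrj, each resulting string would be passed to SolrQuery.addFilterQuery.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns facet limitations into fq parameter values.
class FilterQueryBuilder {

    // Escape backslashes and double quotes so the value is safe inside a quoted phrase.
    static String escapeForPhrase(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build one fq value of the form Field:"value" per facet limitation.
    static List<String> buildFilterQueries(Map<String, String> filterQueries) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : filterQueries.entrySet()) {
            result.add(entry.getKey() + ":\"" + escapeForPhrase(entry.getValue()) + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> filterQueries = new LinkedHashMap<>();
        filterQueries.put("Lastname", "Smith");

        for (String fq : buildFilterQueries(filterQueries)) {
            // With Solrj this is where query.addFilterQuery(fq) would go.
            System.out.println("fq=" + fq);
        }
    }
}
```

The regular dismax query string stays untouched; only the fq values change as the user selects or clears facets.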
So now we can just parse the query string for valid facet limitations, and set the fields in the filterQueries HashMap accordingly. As we already have a list of facet fields to include, this is as simple as iterating over that list and checking for the parameters in the request variables.

A big thank you to Mike Klaas in the dismax and Plone thread indexed by nabble.com, which sent me in the right direction.

Saturday, April 19. 2008
Solr: Deleting Multiple Documents with One Request

One of the final steps in my current Solr adventure was to make it possible to remove a large number of documents from the index at the same time. As we're currently using Solr to store phone information, we may have to remove several thousand records in one large update. The examples on the Solr Wiki show how to remove one single document by posting a simple XML document, or how to remove documents by query. I would rather avoid beating our Solr server with 300k single delete requests, so I tried the obvious tactics of submitting several ids in one document, putting several <delete> elements in one document and so on, but nothing worked as I wanted it to.

After a bit of searching and stumbling around with Google, I finally found this very useful tip from Erik Hatcher. The trick is to simply rewrite the delete request as a delete by query, and then submit all the ids to be removed as a simple OR query. On our development machine, Solr removed 1000 documents in somewhere around 900ms. Needless to say, that's more than fast enough and solved our problem.

To sum it up, write a delete-by-query statement as:
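A rough sketch of how such a request body could be assembled. It assumes the unique key field is called id and that the ids are plain tokens that need no query escaping; the class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds one delete-by-query message for many ids.
class DeleteByQueryBuilder {

    // Minimal XML escaping for element text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Join all ids into a single OR query and wrap it in <delete><query>.
    static String buildDeleteXml(String idField, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        StringBuilder query = new StringBuilder();
        for (String id : ids) {
            if (query.length() > 0) {
                query.append(" OR ");
            }
            query.append(idField).append(':').append(id);
        }
        return "<delete><query>" + escapeXml(query.toString()) + "</query></delete>";
    }

    public static void main(String[] args) {
        // Prints: <delete><query>id:1 OR id:2 OR id:3</query></delete>
        System.out.println(buildDeleteXml("id", Arrays.asList("1", "2", "3")));
    }
}
```

The resulting document is posted to the update handler like any other update message, followed by a commit. For very large removals it is probably wise to send the ids in batches, since Solr limits the number of clauses in a boolean query (maxBooleanClauses in solrconfig.xml, 1024 in the stock configuration).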
Thanks intarwebs!

Thursday, April 17. 2008
Using Solrj - A short guide to getting started with Solrj

As Solrj (the Java interface for Solr) is slated for release together with Solr 1.3, it's time to take a closer look! Solrj is the preferred, easiest way of talking to a Solr server from Java (unless you're using Embedded Solr). This way you get everything in a neat little package, and can avoid parsing and working with XML directly. Everything is tucked neatly away under a few classes, and since the web generally lacks a good example of how to use Solrj, I'm going to share a small class I wrote for testing the data we were indexing at work.

As Solr 1.2 is currently the most recent version available at apache.org, you'll have to take a look at the Apache Solr Nightly Builds website and download the latest version. The documentation is also contained in the archive, so if you're going to do any serious Solrj development, that is the place to start.

Oh well, enough of that, let's cut to the chase. We start by creating a CommonsHttpSolrServer instance, which we provide with the URL of our Solr server as the only argument in the constructor. You may also provide your own parsers, but I'll leave that for those who need it. I don't. A typical servlet container setup serves Solr on port 8080 under the /solr path (the Jetty example bundled with Solr listens on port 8983 instead), so you'll have to adjust the URL to match your own setup. I've included the complete source file for download.
Download: SolrjTest.java

Solrj and JSTL EL: java.lang.NumberFormatException

While working with a view of a collection of documents returned from Solr using Solrj earlier today, I was attempting to write out the number of documents found in the search. In pure Java code you'd get this by calling .getNumFound() on the SolrDocumentList containing your documents, which would suggest that the value should also be available through EL in JSTL as ${solrDocumentList.numFound} (which in turn would call getNumFound() on the SolrDocumentList object). The code in question was as simple as:
Which resulted in this error message, which kind of came as a surprise:

java.lang.NumberFormatException: For input string: "numFound"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:447)
    at java.lang.Integer.parseInt(Integer.java:497)

After digging around a bit and reading the error message yet again, it suddenly hit me: solrDocumentList was being interpreted as, and cast to, a List. As such, EL expected an integer index into the List instead of my property name, and tried to parse "numFound" as a number.

I've not been working with JSTL for too long, so I thought a bit about how to solve this. One solution would be to do the calls in the Action and then just map them to separate variables in the template, but this wasn't really as pretty as it could be. Instead I wrote a simple wrapper around the SolrDocumentList, which is not a list in itself, but exposes all the elements through its getDocumentList method. That way we can access it in the template by calling ${solrDocumentList.documentList…}.

I've included the simple, simple wrapper here. It should be expanded with access to facet fields etc., but this should be a simple indicator of my suggested solution.
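The following is only a sketch of what such a wrapper might look like, written against a stand-in list class so that it compiles without Solrj on the classpath. FakeDocumentList mimics the relevant shape of SolrDocumentList (an ArrayList subclass with a getNumFound() getter); all names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SolrResultWrapperDemo {

    // Stand-in for SolrDocumentList: a List that also carries a numFound value.
    static class FakeDocumentList extends ArrayList<String> {
        private static final long serialVersionUID = 1L;
        private long numFound;

        long getNumFound() { return numFound; }
        void setNumFound(long numFound) { this.numFound = numFound; }
    }

    // The wrapper is deliberately NOT a List, so EL resolves properties via the getters.
    public static class ResultWrapper {
        private final FakeDocumentList documentList;

        public ResultWrapper(FakeDocumentList documentList) {
            this.documentList = documentList;
        }

        // Total number of hits, delegated to the wrapped list.
        public long getNumFound() { return documentList.getNumFound(); }

        // The wrapped documents, for iterating in the template.
        public List<String> getDocumentList() { return documentList; }
    }

    public static void main(String[] args) {
        FakeDocumentList docs = new FakeDocumentList();
        docs.add("doc-1");
        docs.add("doc-2");
        docs.setNumFound(42);

        ResultWrapper wrapper = new ResultWrapper(docs);
        System.out.println(wrapper.getNumFound());           // 42
        System.out.println(wrapper.getDocumentList().size()); // 2
    }
}
```

With the wrapper stored in a request attribute, a property lookup such as numFound goes through getNumFound() as a regular bean property, while the documents themselves remain available for iteration through documentList.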
Any comments and updates are of course as always welcome.
About me

I'm a PHP, Python and Java developer currently located in Fredrikstad, Norway, where I do consulting work and work as a technical lead for Derdubor A/S. I also write game reviews for Norway's best gaming website. My interests include web application development (PHP since 1998) and the demoscene (since 1997), and I have a weird fascination for interesting problems and digital maps.
